Synthetic Aperture Radar (SAR) data is rich in information and finds application in many domains, but its image quality is affected by several classes of artifacts that can limit its use.
Azimuth ambiguities are an important class of artifacts that degrade the performance of target detection/classification algorithms and interferometric applications. They arise from finite azimuth sampling combined with sidelobe backscattering contamination from adjacent radar pulses: because the SAR spectrum is not strictly band-limited, the signal band is contaminated by ambiguous energy folded in from adjacent spectral replicas.
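The folding mechanism can be illustrated with a few lines of NumPy: a Doppler component lying outside the band sampled at the PRF is aliased back in-band pulse-to-pulse. The parameters below are illustrative, not those of any particular sensor.

```python
import numpy as np

# Illustrative sketch: sampling a Doppler tone below its frequency folds
# (aliases) its energy back in-band, which is the mechanism behind azimuth
# ambiguities in SAR. PRF and tone frequency are assumed values.
prf = 1000.0          # assumed pulse repetition frequency [Hz]
f_tone = 1300.0       # Doppler frequency outside the sampled band [Hz]
n = 1024
t = np.arange(n) / prf
signal = np.exp(2j * np.pi * f_tone * t)   # complex tone sampled at the PRF

freqs = np.fft.fftfreq(n, d=1.0 / prf)
spectrum = np.abs(np.fft.fft(signal))
f_apparent = freqs[np.argmax(spectrum)]    # where the tone lands after sampling

# A 1300 Hz tone sampled at 1000 Hz aliases to 1300 - 1000 = 300 Hz.
print(round(f_apparent))
```

The out-of-band tone reappears at 300 Hz, i.e. inside the sampled band, exactly as ambiguous azimuth energy contaminates the main signal band.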
The first approach to reducing azimuth ambiguities is to design the SAR system accordingly, selecting the antenna size and the PRF to keep the ambiguities below an acceptable level. Unfortunately, that design choice may conflict with the requirements of modern micro-SAR platforms: new SAR satellite constellations carry smaller antennas than their predecessors, which restricts conventional suppression of these ambiguities.
Many techniques to detect and suppress azimuth ambiguities have been proposed, and ICEYE has implemented dedicated algorithms based on signal processing and image processing techniques to remove undesirable artifacts, thereby improving data quality.
The implemented algorithm for ambiguity detection is called Phase Variant Analysis (PVA), and it exploits the phase information of the complex data to detect the artifacts. The implemented algorithm for ambiguity suppression in the Single Look Complex (SLC) data is called Selective Doppler Frequency Suppression (SDFS), and it decouples the ambiguous energy from the main signal in the Doppler Spectrum.
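The decoupling step can be sketched in a few lines of NumPy. This is a toy illustration of Doppler-domain suppression in the spirit of SDFS, not a reproduction of ICEYE's algorithm; the detection mask here is assumed, standing in for the output of a detector such as PVA.

```python
import numpy as np

# Toy sketch of selective Doppler-domain suppression: transform an SLC
# azimuth line to the Doppler spectrum, attenuate bins flagged as ambiguous
# (a hypothetical detection mask), and transform back.
rng = np.random.default_rng(0)
azimuth_line = rng.standard_normal(512) + 1j * rng.standard_normal(512)

spectrum = np.fft.fft(azimuth_line)
ambiguous_bins = np.zeros(512, dtype=bool)
ambiguous_bins[100:120] = True          # assumed output of a detector (e.g. PVA)

suppressed = spectrum.copy()
suppressed[ambiguous_bins] = 0.0        # null out (or strongly attenuate) flagged bins
cleaned = np.fft.ifft(suppressed)

# Energy leaves only the flagged bins; the rest of the spectrum is untouched.
print(np.allclose(np.fft.fft(cleaned)[~ambiguous_bins],
                  spectrum[~ambiguous_bins]))
```

In practice a tapered attenuation profile would be preferred over a hard zero to limit ringing in the image domain.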
The availability of multi-domain data layers motivates the use of exploratory Machine Learning (ML) techniques for ambiguity detection and suppression. Furthermore, ML can potentially improve the scalability of this process when handling big EO data.
ICEYE is collaborating with ESA 𝛟-lab on the Artificial Intelligence for SAR at High Resolution (AI4SARHighRes) project to exploit some ML techniques that mitigate the problem of azimuth ambiguities.
The presence of artifacts in SAR images makes this an interesting problem from an ML perspective. Since artifacts are outliers, a well-designed convolutional neural network should be able to filter them from the SAR images. We aim to address this problem with ML in two steps. In the first step, we tackle azimuth ambiguity detection using a supervised ML approach trained on a large synthetic dataset. In the second step, we aim to achieve content-aware image filling/inpainting of the gaps left by detected artifacts, using state-of-the-art unsupervised/supervised ML approaches.
A preliminary unsupervised ML approach is being evaluated to process the complex SAR multi-physics data. We recognize the recent advances in visible-spectrum image inpainting achieved by modern ML techniques and attempt to transfer some of them to the Doppler spectrum domain. In the final stage of the azimuth ambiguity suppression tests, we will frame ambiguity suppression as an inpainting problem and compare weakly supervised ML models to our signal processing approach.
Radar data remains under-utilised because it is still too complex for many potential beneficiaries to analyze and use. The socio-economic impact of the Copernicus programme could be significantly higher if the data were easier to use. Various government agencies, universities, ICT, GIS and consulting companies would benefit from recent and systematic Sentinel-1 satellite imagery, but the pre-processing steps required to facilitate ease of use and integration with Sentinel-2 data are too complex and time-consuming. This is especially the case with interferometric coherence, which is a key variable for change detection. Numerous users prefer to focus on their data analytics, modelling or visualisation tasks instead of doing the pre-processing and image generation themselves. Therefore, there is a clear need for Sentinel-1 Analysis Ready Data (ARD).
The Sentinel-1 ARD API, Web Map Service (WMS) and Web Coverage Service (WCS) by KappaZeta (KZ) will satisfy this need. KZ's SAR expertise helps to take full advantage of the interferometric and polarimetric data content of Sentinel-1. For the end-user, this means calibrated and noise-corrected imagery products at the highest possible spatial resolution, using advanced speckle suppression methods. Users can browse imagery and compute parameters for their areas of interest that complement optical data and are available regardless of the weather.
Six Sentinel-1 ARD layers are accessible with one click or one API command.
• Time series of calibrated parcel-level statistics (1) of VH and VV backscatter and 6/12-day repeat-pass interferometric coherence, including parcels with small area and complex shape.
• Deep time stacks of calibrated high-resolution VH, VV backscatter (2) and coherence (3) raster datasets.
• Multi-polarisation SAR backscatter image for visual (4) use.
• AI-generated natural colour (RGB) images (5) based on Sentinel-1 and Sentinel-2 data.
• AI-generated NDVI raster images based on Sentinel-1 and Sentinel-2 data (6).
These services are enablers for AI model development, spatiotemporal analysis, and visual interpretation. All services that output raster images conform to both the WMS and WCS standards. Parcel-level statistics are made available as compressed CSV files or as JSON over the API. The Committee on Earth Observation Satellites (CEOS) Analysis Ready Data for Land (CARD4L) framework will be followed. The services can be integrated into fully automatic processing chains, information systems, and web map applications, and can also be used for ad-hoc model development, visual interpretation, or spatial analysis tasks. KZ Sentinel-1 ARD services can be accessed with general-purpose GIS desktop software as well as machine learning, data mining, and geospatial libraries. They have been used by VISTA Remote Sensing in Geosciences GmbH, The ICON Group Ltd., the Finnish Food Authority and others.
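Because the raster services follow the standard WMS interface, requesting imagery takes one line of client code. The endpoint URL and layer name below are placeholders for illustration, not the actual service values.

```python
from urllib.parse import urlencode

# Hypothetical example of requesting a Sentinel-1 ARD raster through a
# standards-conformant WMS 1.3.0 GetMap interface. The endpoint and layer
# name are placeholders, not KZ's actual published values.
base_url = "https://example.kappazeta.ee/wms"   # placeholder endpoint
params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "s1_vh_backscatter",              # placeholder layer name
    "CRS": "EPSG:3857",
    "BBOX": "2637000,8200000,2650000,8213000",  # area of interest [m]
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
    "TIME": "2021-06-01",
}
request_url = base_url + "?" + urlencode(params)
print(request_url.startswith(base_url))
```

Any WMS-capable client (QGIS, OpenLayers, a plain HTTP library) can consume such a request, which is what makes the services directly usable from general-purpose GIS tools.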
Processing Synthetic Aperture Radar (SAR) imagery is a time-consuming and computation-heavy activity due to the large amounts of data and the complex nature of the processing algorithms. With new satellites offering improved spatial resolution and coverage, and constellations growing to meet requirements for more timely acquisition of imagery, the data volume keeps increasing significantly. To improve the scalability of processing both temporally and geographically, novel methods for SAR processing need to be applied.
A set of SAR processing tools that utilize GPUs has been developed by CGI Estonia and consolidated into the ALUs Toolbox software package. The processing algorithms were selected with input from expert organizations in academia and industry, and are based on equivalent algorithms from the ESA Sentinels Application Platform (SNAP) toolbox. Particular care was taken to ensure that the results of the GPU processing conform to the results of SNAP processing in terms of quality, and the outcomes were tested in the Amazon Web Services environment. The initial selection of tools includes the generation of coherence and calibrated intensity products from Sentinel-1 SLC imagery. The ALUs software is built using modern C++ and utilizes CUDA for GPU processing.
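The coherence product mentioned above is conventionally computed with the windowed sample-coherence estimator. The sketch below is a plain NumPy illustration of that estimator, not the ALUs CUDA implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Standard windowed coherence estimator between two co-registered SLC
# images: |sum(s1 * conj(s2))| / sqrt(sum|s1|^2 * sum|s2|^2) over a
# win x win estimation window.
def coherence(s1, s2, win=5):
    num = s1 * np.conj(s2)
    p1 = np.abs(s1) ** 2
    p2 = np.abs(s2) ** 2
    # Box-filter the numerator and the two power terms over the window.
    box = lambda a: sliding_window_view(a, (win, win)).sum(axis=(-2, -1))
    return np.abs(box(num)) / np.sqrt(box(p1) * box(p2))

rng = np.random.default_rng(1)
s = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
print(np.allclose(coherence(s, s), 1.0))   # identical images -> coherence 1
```

In a GPU implementation the box filter and the complex multiply are embarrassingly parallel per output pixel, which is why this routine benefits so strongly from CUDA acceleration.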
The latest version of the ALUs Toolbox has been made publicly available on Bitbucket: https://bitbucket.org/cgi-ee-space/alus/src/. In the latest test, for a full Sentinel-1 swath landmass-only scene, the end-to-end processing time was 15.7 seconds for the coherence estimation routine and 5.8 seconds for the calibration routine. As a comparison, the coherence routine using SNAP 8 took around 90 seconds on the same images. Details of the processing routines, and of the environments where the results were achieved and compared, can be found on the aforementioned Bitbucket site. Processing speed is heavily affected by the choice of GPU and storage: significantly better performance can be achieved with GPUs that support FP64 (double-precision) calculations, and, as storage transfer significantly affects the overall end-to-end performance, a high-performance SSD is required to store the data.
The optimization tasks and other improvements are being addressed under an ongoing Estonian GSTP activity. Additionally, input has been gathered from organizations working with Machine Learning to support machine-learning use cases for Sentinel-1 and -2. Work is ongoing to add tools to the ALUs Toolbox to support production of analysis-ready optical data with GPU-enabled acceleration. Currently, GPU-accelerated resampling and stacking of optical data is in progress.
Checks by Monitoring (CbM) was introduced in 2018 as an alternative control method for area-based declarations under the European Union’s common agricultural policy (CAP). CbM prescribes the continuous use of Copernicus Sentinel high resolution data for 100% of a Member State’s territory, in contrast to the (2-5%) territorial sample of current on-the-spot checks. CbM also provides the basis for the Area Monitoring System (AMS) that is being introduced for the next CAP implementation (2023-2027). The requirement to integrate full-territory, continuous use of Sentinel-1 (S-1) and Sentinel-2 (S-2) implies the use of cloud-based compute solutions that are closely coupled to the Sentinel data archives, for which the Copernicus programme has set up the Data Access and Information Services (DIAS).
A typical use pattern in CbM is to extract time series statistics from the Sentinel data (both actual and archived) for all parcels in the annual declaration set. Reduction to parcel statistics serves the application of machine learning techniques and detection of time series markers that highlight expected manifestation of crop phenology transitions and soil preparation practices. The number of parcels varies between EU Member States from 60,000 (e.g. Malta) to 10 million (e.g. France). Extracted time series are analyzed in machine learning (ML) routines that are aimed at determining compliance between the declared agricultural practice and the time series characteristics (e.g. does the declared wheat parcel “appear” as a typical wheat case, was a grassland mown). The ML stage typically leads to a large reduction in cases that require further follow up, which may include detailed analysis of the Sentinel image data for the particular parcel (e.g. heterogeneity analysis, segmentation, combination with reference data) and may lead to the triggering of specific actions (e.g. field inspection, interaction with the declarant). Automation of tasks and fully transparent access to the image data and intermediate reductions are prerequisites in this workflow.
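The reduction step described above amounts to per-parcel zonal statistics over each ARD raster. The following is a minimal NumPy illustration of the principle, not the optimized parallelized backend code; parcel geometries are assumed to be already rasterized to a parcel-ID image.

```python
import numpy as np

# Minimal parcel-statistics reduction: given one ARD raster and a rasterized
# parcel-ID image of the same shape, reduce pixels to per-parcel mean and
# standard deviation via bincount (one pass, no Python loop over parcels).
def parcel_stats(raster, parcel_ids):
    flat_v = raster.ravel()
    flat_id = parcel_ids.ravel()
    counts = np.bincount(flat_id)
    sums = np.bincount(flat_id, weights=flat_v)
    sq_sums = np.bincount(flat_id, weights=flat_v ** 2)
    mean = sums / counts
    std = np.sqrt(sq_sums / counts - mean ** 2)
    return mean, std

# Tiny toy raster: parcel 0 covers three pixels, all with value 1.0.
raster = np.array([[1.0, 1.0, 3.0],
                   [1.0, 5.0, 3.0]])
parcels = np.array([[0, 0, 1],
                    [0, 2, 1]])
mean, std = parcel_stats(raster, parcels)
print(mean[0], std[0])
```

Repeating this reduction per acquisition date yields exactly the parcel time series that the downstream ML routines consume.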
Consistent time series extraction and analysis requires the availability of Analysis Ready Data (ARD). While this is currently provided by ESA for S-2 Level 2A (atmospherically corrected) data, this is not the case for S-1. In the CbM context, both calibrated geocoded backscattering coefficients (𝜸⁰, 𝜷⁰) and 6-day coherence are essential inputs to ML applications that require consistent time series over the entire agricultural season. We discuss how we deploy the SNAP s1tbx to generate these S-1 ARD outputs with the use of DIAS Processing-as-a-Service (PaaS) provision. We contrast the PaaS performance with a GPU-based equivalent processing chain (https://bitbucket.org/cgi-ee-space/alus/src/main/ ) that has recently become available and discuss how the greatly accelerated ARD generation will allow us to consider on-the-fly scenarios that no longer require separate storage of ARD. Furthermore, we integrate a “smart approach” to radiometric terrain flattening (RTF) that relies on the excellent orbit stability of the S-1 sensors. We discuss how ML results vary with and without the use of RTF.
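For reference, the ellipsoid-based relations between the radiometric quantities mentioned above can be sketched as below. This illustrates only the standard geometric conversions; the orbit-stability-based "smart" RTF approach discussed in the text is not reproduced here.

```python
import numpy as np

# Ellipsoid-based radiometric conversions between radar brightness (beta0),
# sigma0 and gamma0, with theta the local incidence angle in radians.
def beta0_to_sigma0(beta0, theta):
    return beta0 * np.sin(theta)

def beta0_to_gamma0(beta0, theta):
    # gamma0 = sigma0 / cos(theta) = beta0 * tan(theta) on the ellipsoid.
    # Radiometric terrain flattening replaces this geometric term with a
    # DEM-derived local illuminated-area ratio.
    return beta0 * np.tan(theta)

theta = np.deg2rad(35.0)     # assumed incidence angle for illustration
print(round(beta0_to_gamma0(1.0, theta), 3))
```

Over flat terrain the two formulations agree; over topography the DEM-derived area ratio is what removes the terrain-induced radiometric modulation that would otherwise bias the ML feature set.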
While we have benchmarked the CbM workflow on each of the 5 DIAS instances (CREODIAS, MUNDI, ONDA, SOBLOO, WEkEO) and encountered only minor differences in implementations, we focus on the use of CREODIAS, which runs on the CloudFerro cloud infrastructure that also services WEkEO and the German CODE-DE platform and is federated into the European Open Science Cloud. We have created fully functional backend and frontend modules that address the core processing requirements of the CbM workflow. This workflow has already been extensively tested with a range of EU Member State CbM users, with selected parcel sets that vary in size from 100,000 to 2.5 million features and for annual volumes of S-1 and S-2 ARD data coverage. The backend deploys optimized numerical Python code for the parcel time series statistics extraction from S-1 and S-2 ARD stacks, integrated in a parallelization environment that orchestrates multiple virtual machines (VMs) using Docker in an on-demand fashion. Parcel time series statistics are stored in a PostgreSQL/PostGIS spatial database in clustered, indexed tables to support fast retrieval queries. Flask server components provide access to both the database tables and arbitrary full-resolution sub-image selections from the DIAS S3 store via a RESTful interface. The RESTful queries are “consumed” in the frontend analytics, reporting and visualization tasks in support of thematic use cases. ML is currently primarily applied to S-1 ARD, as it provides a consistent calibrated feature set for training, testing and validation that does not depend on gap-filling and interpolation. Since we can apply ML to very large labelled datasets for which class labels have high plausibility, we can successfully use it in our reduction approach, which aims to detect outliers and anomalous cases that are flagged in the ARD time series.
All our code is released under the BSD 3-Clause open source license, is accessible at https://github.com/ec-jrc/cbm, and is explained in an LPS classroom event for “hands-on” use (please check the program for the schedule and details). Application cases are presented in other LPS contributions to thematic and use case sessions.
Interferometric SAR (InSAR) analyses are widely used to provide ground motion monitoring solutions with update frequencies ranging from annual to weekly, offering applications in several fields, from geohazards to civil engineering projects and infrastructure stability assessment. Thanks to the Sentinel-1 constellation, today it is possible to provide updates of ground motion measurements over very large areas after every new satellite acquisition, paving the way for new satellite monitoring solutions. New algorithmic and computational approaches are needed to exploit this new opportunity and create “Continuous Monitoring” applications.
As is well known, the main result of a multi-temporal InSAR analysis is a cloud of points, each associated with a position, a displacement time series, and several quality parameters. When the analysis is carried out over wide areas, the identification of millions of points, each with a displacement time series of hundreds of samples updated every 6 days (e.g. over Europe), can dramatically increase the data volume to be analyzed. Statistical approaches, both parametric and non-parametric, are currently adopted to filter out unreliable points and make InSAR data handier for users. These approaches are usually based on several assumptions about the data distribution and do not scale with data volume, making the analysis expensive in terms of cost and time.
Machine Learning (ML) approaches are data-driven alternatives that can extract useful information from training datasets. ML solutions can be successfully applied to design new quality-check procedures that significantly speed up the creation of InSAR deliverables and the identification of noisy measurement points. The identification of unreliable points can be framed as a supervised binary classification task, requiring experts to create labeled InSAR point cloud datasets that discriminate reliable from unreliable points.
In this work, a state-of-the-art method based on a Convolutional Graph Neural Network architecture - developed in the framework of the EC H2020 project “DeepCube” in cooperation with the Italian company TECNE - is proposed. Adopting a late-fusion approach, the model is designed to fuse information from InSAR analyses with information from other layers, such as Digital Elevation Models (DEM) and land cover maps. The network is trained on an expert-labeled dataset and tested over different areas. The model is then compared with alternative models, and an ablation study on the importance of each information layer is performed. Since the number of unreliable points is usually orders of magnitude smaller than the number of reliable points, the loss function and the training procedure are designed to take the strong dataset imbalance into account. A key requirement considered in this work is scalability: the model needs to be applicable to large-scale datasets, and different optimization techniques are considered to meet this constraint.
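The basic building block of such a network can be sketched with the generic graph-convolution propagation rule. This is a NumPy illustration of that rule, not the DeepCube model: the graph, features and weights below are arbitrary placeholders.

```python
import numpy as np

# Schematic graph-convolution step for InSAR point classification: each
# point aggregates features from its graph neighbours through the
# symmetrically normalized adjacency, H' = relu(D^-1/2 (A+I) D^-1/2 H W).
def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

# 3 InSAR points in a chain, each with 4 fused features (e.g. InSAR quality
# parameters, DEM height, land-cover encoding); random placeholder weights.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))
W = rng.standard_normal((4, 2))
out = gcn_layer(A, H, W)
print(out.shape)                              # -> (3, 2)
```

In a real pipeline the per-point outputs would feed a class-weighted binary loss, which is one common way to account for the strong reliable/unreliable imbalance described above.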
To our knowledge, what we present is one of the first applications of Graph Neural Network models for the analysis of InSAR datasets and the first attempt to create a Graph Neural Network model to combine multimodal InSAR data.