Introduction
This presentation describes the data, method, and results of using SAR data and Deep Learning semantic segmentation (pixel-wise classification) for automated sea ice monitoring. The project was performed at MDA, funded by the Canadian Space Agency, and in collaboration with the Canadian Ice Service (CIS). The goal was to investigate how Deep Learning algorithms could be used to automate and improve SAR-based mapping of sea ice and hence provide more powerful tools for monitoring the impact of climate change on Arctic maritime environments.
At CIS, image analysts produce ice charts (ice type, ice concentration) manually by examining SAR images and applying their contextual knowledge. Because this process is time-consuming, the ice charts are often limited to shipping routes and the areas around certain communities. A more extensive mapping, enabled by Deep Learning semantic segmentation, would benefit more communities and provide input to climate models. To facilitate this investigation, the archive of RADARSAT-2 imagery over sea ice, together with the corresponding SIGRID ice charts derived by CIS ice analysts, was used to train Deep Learning models to map sea ice.
As a semantic segmentation problem, the mapping of sea ice is challenging because the classification of a pixel depends on context and features at large spatial scales. To address this, an approach to semantic segmentation using multiple spatial scales was investigated.
Data Gathering and Preparation
CIS uses wide-swath dual-pol (HH-HV) ScanSAR data for ice chart construction; the 500 km swath provides the large spatial scale needed for ice features and context, while the 50 m pixel spacing provides spatial detail that can be important near inlets and communities. A large number of SAR images and corresponding ice chart data was obtained for the project and prepared for input to the Deep Learning algorithm. This pre-processing step made use of MDA's Deep Learning pipeline, which provided the following capabilities:
- Image Database to index imagery and metadata
- Image Labelling tool to work with large geospatial data: conversion and geometric transformation of ice chart data
- Exploitation Ready Product tool for pre-processing SAR imagery: conversion to gamma-zero, handling of blackfill, geometric transformation
- Dataset Creation tool to create image chips and rasterized label chips for input to the Deep Learning algorithm
The project used data from four regions in the Canadian Arctic, containing a variety of ice types:
- Middle Arctic Waterways (MID) - 2017 Jul to Oct
- Newfoundland (NFLD) - 2016 Dec to 2017 Jun
- Western Arctic (WA) - 2017 Jun to Nov
- Foxe Basin (FOXE) - 2017 May to 2018 Mar
There were about 30 to 40 ScanSAR image frames for each region.
A CIS ice chart contains detailed information about the concentration and type of the sea ice, which must be converted into labels for input to the Deep Learning algorithm. For the purposes of this project, we first considered classification into ice or water, where ice was defined as an ice concentration above 20%. This ice-water classification, at the spatial resolution of the SAR image, provides a detailed description of the ice edge. We also considered the estimation of ice concentration, by classifying the ice into categories corresponding to 20% steps of ice concentration.
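As a rough illustration of this label conversion (not the actual SIGRID decoding used in the project, and assuming the ice chart has already been rasterized to per-pixel concentration in percent), a short Python sketch:

```python
import numpy as np

def concentration_to_labels(conc_pct):
    """Sketch: derive ice/water and 5-class concentration labels from a
    rasterized ice-concentration chart given in percent (hypothetical input)."""
    conc = np.nan_to_num(conc_pct, nan=0.0)    # NaN pixels are zeroed out later via sample weights
    ice_water = (conc > 20).astype(np.uint8)   # 0 = water, 1 = ice (concentration above 20%)
    conc_class = np.minimum(conc // 20, 4).astype(np.uint8)  # classes for 0-20, 20-40, 40-60, 60-80, 80-100%
    return ice_water, conc_class
```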
The image and ice label data were broken into image chips for input to the Deep Learning model, where the chip size was 512 by 512 pixels, or about 25 km by 25 km, and the chips overlapped by 100 pixels. The image chips are 2-channel chips for the HH and HV polarizations. There were between about 8,500 and 21,000 chips per region. The data chips were organized by image acquisition. For the investigation of the Deep Learning model, the data was split into separate training, validation, and test sets. The split was made by image, so that all chips from an image were assigned to either training, validation, or test, in order to avoid validating on chips that may be similar to training data. The training data was also used to compute the image mean and standard deviation, for use in normalizing the input to the Deep Learning algorithm.
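A minimal sketch of the chipping and normalization steps, assuming the scene is already a 2-channel (HH, HV) array and that partial chips at the scene edge are simply dropped:

```python
import numpy as np

CHIP, OVERLAP = 512, 100
STRIDE = CHIP - OVERLAP  # a 412-pixel stride gives the 100-pixel overlap

def make_chips(scene, labels):
    """Tile a (rows, cols, 2) HH/HV scene and its label raster into
    512 x 512 chips with 100-pixel overlap (illustrative sketch)."""
    rows, cols = scene.shape[:2]
    for r in range(0, rows - CHIP + 1, STRIDE):
        for c in range(0, cols - CHIP + 1, STRIDE):
            yield scene[r:r + CHIP, c:c + CHIP, :], labels[r:r + CHIP, c:c + CHIP]

def normalize(chip, mean, std):
    """Standardize with per-channel mean/std computed from the training split only."""
    return (chip - mean) / std
```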
Deep Learning Algorithm
The Deep Learning model was implemented using the TensorFlow Estimator framework. Input to the model was provided by a generator that read image and label chips, normalized the image data, converted the label data to the appropriate class values, and computed a sample weight array. The sample weight array is set to zero over land, around the edges, and at NaN image values, and is used to compute the weighted loss function for training and the weighted metrics for evaluation.
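A sketch of how such a per-pixel weight array might be built; the edge width and the land-mask input are assumptions, not the project's exact settings:

```python
import numpy as np

def sample_weights(chip, land_mask, edge_px=16):
    """Per-pixel weights: zero over land, at NaN image values, and near the
    chip edges; one elsewhere (sketch; edge_px is an assumed value)."""
    w = np.ones(chip.shape[:2], dtype=np.float32)
    w[land_mask] = 0.0                         # ignore land pixels
    w[np.isnan(chip).any(axis=-1)] = 0.0       # ignore blackfill / NaN pixels
    w[:edge_px, :] = w[-edge_px:, :] = 0.0     # ignore the chip borders
    w[:, :edge_px] = w[:, -edge_px:] = 0.0
    return w
```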
The Deep Learning model was based on the DeepLab model for semantic segmentation, which uses dilated convolutions to provide large convolution kernels for context without decimating the image. The DeepLab model is also built from ResNet blocks, which improve training for large networks.
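For illustration, a dilated (atrous) residual block written with Keras layers conveys the idea of a large receptive field without down-sampling; this is a generic sketch, not the project's DeepLab implementation:

```python
import tensorflow as tf

def dilated_residual_block(x, filters, dilation):
    """ResNet-style block with dilated 3x3 convolutions: the dilation enlarges
    the receptive field while keeping the feature map at full resolution."""
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", dilation_rate=dilation)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", dilation_rate=dilation)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    if x.shape[-1] != filters:                 # 1x1 projection so the skip connection matches
        x = tf.keras.layers.Conv2D(filters, 1, padding="same")(x)
    return tf.keras.layers.ReLU()(tf.keras.layers.Add()([x, y]))
```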
For an input image chip, the output of the model is an array of predicted label values which has the same size as the input chip. During training, the predicted labels and the true labels are weighted by the sample weight array, and then used to compute the Cross Entropy loss function for optimization of the model weights. During validation, the predicted labels and the true labels are weighted and used to compute the accuracy and the mean intersection-over-union, which is a common metric for semantic segmentation.
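In the Estimator (TF 1.x) setting described above, the weighted loss and evaluation metrics could be wired up roughly as in the sketch below; this is not the project's exact configuration:

```python
import tensorflow as tf  # TF 1.x API, matching the Estimator framework

def weighted_loss_and_metrics(labels, logits, weights, num_classes):
    """Sketch: weighted cross-entropy for training, plus weighted accuracy and
    mean intersection-over-union for evaluation."""
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits, weights=weights)
    predictions = tf.argmax(logits, axis=-1)
    accuracy = tf.metrics.accuracy(labels=labels, predictions=predictions, weights=weights)
    mean_iou = tf.metrics.mean_iou(labels=labels, predictions=predictions,
                                   num_classes=num_classes, weights=weights)
    return loss, {"accuracy": accuracy, "mean_iou": mean_iou}
```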
In addition to the original Deep Learning model for semantic segmentation, a new approach was developed to apply Deep Learning at multiple spatial scales. This was done to overcome the problem of sea ice features that extend beyond the field of view provided by a single chip. In this approach the Deep Learning model takes multiple inputs: one is the original image chip, and the other is the result of classification on down-sampled data, which provides a larger context.
Results
The performance of the DeepLab model for ice-water classification was assessed. First, the data for each of the four regions was used separately to train and evaluate different models. The accuracy and mean intersection-over-union (mIOU) are:
- Mid-Arctic: accuracy = 0.9175, mIOU = 0.8322
- Newfoundland: accuracy = 0.9582, mIOU = 0.9194
- Western Arctic: accuracy = 0.9675, mIOU = 0.8869
- Foxe Basin: accuracy = 0.9334, mIOU = 0.8611
Then, the data was combined to train a single combined model, and this combined model was used to initialize the training for each region, in a process called fine-tuning. The fine-tuning provided the best results: compared to separate training, it improved accuracy on average by about 1% and mean intersection-over-union by about 2%. The figure shows an example of the ice-water classification in the Western Arctic region using the fine-tuning strategy, with the HH and HV images, the true labels, and the predictions.
The performance of the DeepLab model for ice concentration classification was also assessed. Each of the 5 classes represented a different ice concentration: 0-20% (water), 20-40%, 40-60%, 60-80%, and 80-100% (ice). Since most of the image pixels were ice or water, with relatively few pixels at intermediate concentrations, there was less data for training the intermediate concentration classes. The effect of this limitation can be seen in the results: whereas the mIOU values for the ice and water classes were above 0.75 for most of the regions, the mIOU values for the intermediate concentrations were typically below 0.2.
Finally, the approach of using multiple inputs at different spatial scales for semantic segmentation was investigated. This was done for the case of ice-water classification, using separate training for each region. The steps in this approach are as follows (a sketch of the chip-level fusion follows the list):
- down-sample the original image
- train a model using the down-sampled data
- form predictions on the down-sampled data
- for each original resolution chip, extract the prediction on the down-sampled data corresponding to the same area
- train a model using both the original resolution chip, and the down-sampled prediction
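As a sketch of the final fusion step, the coarse-scale prediction covering a chip's footprint can be up-sampled and stacked with the HH/HV channels as an extra input; the exact fusion used in the project may differ:

```python
import numpy as np

def multiscale_input(chip_hh_hv, coarse_pred, row0, col0, factor=4):
    """Stack a 512 x 512 HH/HV chip with the ice-water prediction from the
    4x down-sampled scene over the same area (illustrative sketch).
    (row0, col0) is the chip origin in full-resolution scene coordinates."""
    size = chip_hh_hv.shape[0]
    r, c, s = row0 // factor, col0 // factor, size // factor
    patch = coarse_pred[r:r + s, c:c + s].astype(np.float32)
    # Nearest-neighbour up-sampling of the coarse prediction back to chip resolution
    patch_full = np.kron(patch, np.ones((factor, factor), dtype=np.float32))
    return np.concatenate([chip_hh_hv, patch_full[..., np.newaxis]], axis=-1)
```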
The multi-scale approach was investigated with 4-times down-sampling, for the Mid-Arctic and Newfoundland regions. The results for ice-water classification are:
- Mid-Arctic: accuracy = 0.9320, mIOU = 0.8613
- Newfoundland: accuracy = 0.9602, mIOU = 0.9243
In particular, there were certain image frames for which the original performance was poor due to a lack of context within a single chip. This was especially true for some of the Newfoundland data. For these difficult images, the improvement in accuracy and mIOU using the multi-scale approach was over 7%.
Conclusion
The performance of the algorithm is very promising, indicating that Deep Learning semantic segmentation has considerable potential to aid in the automation and improvement of sea ice mapping.
In times of rising world population, increasing use of agricultural products as energy sources, and climate change, the area-wide monitoring of agricultural land is of considerable economic, ecological, and political significance. Crop type information is a crucial requirement for yield forecasts, agricultural water balance models, remote sensing based derivation of biophysical parameters, and precision farming. To allow for long enough forecast intervals that are meaningful for agricultural management purposes, knowledge about types of crops is needed as early as possible, i.e. several months before harvest. Thus, such early-season crop-type information is relevant for a variety of user groups such as public institutions (subsidy control, statistics) or private actors (farmers, agropharma companies, dependent industries).
The identification of crop types has been a long-standing research topic in remote sensing, progressing from mono-temporal Landsat scenes in the 1980s to today's multi-sensor satellite time series. However, crop types are most often identified in a late cultivation phase or retrospectively after harvest. Existing products are mainly static and not available in a timely manner, and therefore cannot be included in the decision-making and control processes of the users during the cultivation phase.
We are currently developing a web-based service for dynamic intra-season crop type classification using multi-sensor satellite time series data and machine learning. We make use of the dense time series offered by the Copernicus Sentinel satellite fleet and combine optical (Sentinel-2) and SAR (Sentinel-1) data, providing detailed information about the temporal development of the phenological state of the crop growing phases. This synergistic use of optical and radar sensors allows a multi-modal characterization of crops over time using passive optical reflectance spectra and SAR-based derivatives (i.e. backscatter intensities and structural parameters derived from polarimetric decompositions). The automated pipeline for data retrieval, pre-processing, and preparation, a prerequisite for applying machine learning algorithms, is based on open-source tools, with SNAP and Python libraries providing the main functionality.
The developed AI-based model uses the multi-modal remote sensing time series data stream to predict crop types early in their growing season. The model is based on previous work by Garnot et al. 2020, who leverage the attention mechanism originally introduced in the Transformer architecture to better exploit the crop-type information contained in the changing appearance of satellite images over time. The original model focuses on predicting crop type from Sentinel-2 acquisitions within a single Sentinel-2 tile, which leads to very similar acquisition time points for all parcels in the dataset. This is not the case when applying the model to larger regions or including other data sources, such as Sentinel-1 (polarimetric and backscatter), which have vastly different acquisition time points.
By implementing a modified form of positional encoding, we are able to train and predict on regions and data sources with differing acquisition time points, as we provide implicit information about the acquisition time point directly to the model. This removes the need for temporal data preprocessing (e.g. weekly/monthly averaging) and allows us to seamlessly fuse data from different sources (Sentinel-1 and Sentinel-2), leading to good prediction performance even in periods where no Sentinel-2 data is available due to cloud occlusion.
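One way to realize such a modified positional encoding is to evaluate the standard sinusoidal Transformer encoding at the acquisition date (e.g. day of year) of each observation rather than at its index in the sequence. The sketch below illustrates the idea only and is not the service's exact formulation:

```python
import numpy as np

def acquisition_time_encoding(days, d_model=128, max_period=10000.0):
    """Sinusoidal positional encoding evaluated at acquisition time (day of
    year) instead of sequence index, so Sentinel-1 and Sentinel-2 observations
    with different dates share one consistent temporal reference."""
    days = np.asarray(days, dtype=np.float32)[:, None]          # shape (T, 1)
    dims = np.arange(d_model // 2, dtype=np.float32)[None, :]   # shape (1, d_model/2)
    angles = days / np.power(max_period, 2.0 * dims / d_model)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (T, d_model)

# Example: irregular acquisition days from two sensors share the same encoding
encoding = acquisition_time_encoding([12, 17, 23, 47, 52])
```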
In order to improve the generalisation abilities of our model across regions and different years, we also study the effect of fusing satellite data with geolocalised temperature and precipitation measurements to account for the dependence of growth periods on these two parameters.
We will present insights into the developed dynamic crop type classification service based on a use case for the German federal states of Mecklenburg-Vorpommern and Brandenburg. For both states, official reference information from the municipalities on the cultivation of approx. 200,000 fields is used for training and testing the algorithm. Predicted crop types are winter wheat, winter barley, winter rye, maize, potatoes, rapeseed, and sugar beet. We will show model performance in different cultivation stages (from early season to late season) and with different remote sensing data streams, using Sentinel-1 or Sentinel-2 data separately or in conjunction. Moreover, the transferability of the approach will be evaluated by applying a model trained on one year to other years not included in the training phase.
Worldwide economic development and population growth have led to unprecedented urban area change in the 21st century. Many changes to urban areas occur in a short period of time, raising questions as to how these changes impact populations and the environment. As satellite data become available at higher spatial (3-10 m) and temporal (1-3 days) resolution, new opportunities arise to monitor changes in urban areas. In this study, we aim to detect and map changes in urban areas associated with anthropogenic processes (for example, construction) by combining high-resolution Sentinel-2 data with deep learning techniques. We used the Onera Satellite Change Detection (OSCD) dataset, which contains Sentinel-2 image pairs with changes labeled across 24 locations. We advanced the OSCD by implementing state-of-the-art algorithms for atmospheric correction and co-registration of Sentinel-2 images, as well as improving the deep learning model by introducing a modified loss function. A new test area, Washington D.C., was used to demonstrate the model's robustness. We show that the performance of the model varies significantly depending on the location: the F1 score ranges from 3.37% in West Saclay (France) to 74.16% in Rio (Brazil).
The developed model was applied to mapping and estimating the area of changes in several metropolitan areas: Washington D.C. and Baltimore in the US in 2018-2019, and Kyiv, Ukraine, in 2015-2020. A sample-based approach was adopted for estimating accuracies and areas of change. Stratified random sampling was employed, where strata were derived from the change detection maps. Since in our cases the area of no change would account for >99% of the total area, the corresponding stratum weight would strongly influence the derived uncertainties of the area estimates (see, for example, Eq. (10) in Olofsson et al. (2014)): the larger the stratum weight, the larger the uncertainty of the estimated area of change. Therefore, a spatial buffer of 20 pixels (at 10 m) was introduced to include areas of no change around areas of change. The main goal of introducing the buffer was to mitigate the effects of omission errors, which would otherwise lead to large uncertainties in the change area assessment (Olofsson et al., 2020). Overall, 500 samples were used for each location (DC, Baltimore, and Kyiv), with 100 samples allocated to the change stratum, 100 samples to the buffer stratum, and the remaining 300 samples to the no-change stratum. In terms of response design, a 10-m pixel was selected as the elementary sampling unit. The reference data source was the corresponding imagery available in Google Earth.
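For reference, the area estimate and its standard error follow the standard stratified estimator underlying Olofsson et al. (2014); a small sketch (the function and variable names are ours, not from the study):

```python
import numpy as np

def stratified_change_area(stratum_weights, change_counts, sample_sizes, total_area_km2):
    """Sketch of the stratified estimator for the area of change.
    stratum_weights : W_h, area proportion of each stratum (change, buffer, no-change)
    change_counts   : reference samples labelled 'change' in each stratum
    sample_sizes    : samples drawn per stratum (e.g. 100, 100, 300)"""
    W = np.asarray(stratum_weights, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    p = np.asarray(change_counts, dtype=float) / n       # per-stratum proportion of change
    p_hat = np.sum(W * p)                                # estimated overall proportion of change
    se = np.sqrt(np.sum(W**2 * p * (1.0 - p) / (n - 1.0)))
    return p_hat * total_area_km2, se * total_area_km2   # area estimate and its standard error
```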
We estimated that in just one year, 2018-2019, almost 1% of the total urban area in DC and Baltimore underwent change: 10.9±4.3 km2 (0.85% of the total area in DC) and 10.8±2.2 km2 (0.92% in Baltimore). Among the detected changes, active constructions (those that can be seen in the 2019 imagery) accounted for 78% and 86% in DC and Baltimore, respectively, while the rest represented completed constructions. Commercial buildings accounted for 52% and 46%, and residential buildings for 27% and 21%. It is worth noting that 8-9% of the detected changes in DC and Baltimore were due to the construction of new schools or the renovation of existing ones. This high share may result from the growing population density and the number of residential properties being built (as shown in this study), with overcrowded school buildings requiring renovation. Another type of change identified was the construction of parking lots next to commercial buildings, roads, and hospitals.
In Kyiv, the area of change between 2015 and 2020 was estimated at 17.0±2.8 km2, which constituted 2.1% of the total area. Active constructions accounted for 38%, while the rest were visibly completed constructions. Constructions with primary residential land use zoning accounted for 40%, while commercial buildings accounted for 15%.
This study highlights the importance of the overall framework for urban area change detection: from building models using benchmark datasets to actual mapping and change area estimation through a sample-based approach, which provides unbiased estimates of areas.
The ability to reconstruct 3D building models with a high Level of Detail (LoD), as described by the CityGML standard from the Open Geospatial Consortium, from optical satellite images is needed for applications such as urban growth monitoring, smart cities, autonomous transportation, and natural disaster monitoring. More and more Very High Resolution (VHR) data with worldwide coverage are available, and new VHR missions for Earth Observation are regularly launched as their cost decreases. One key strength of satellite imagery over aerial or LiDAR data is its high revisit frequency, which allows urban changes to be tracked and their 3D representations updated more readily. However, automatically extracting and rendering buildings with fine precision, notably regarding their contours, remains a challenge at these spatial resolutions (1 m to 50 cm), as does minimizing manual post-processing and enabling large-scale application.
This work presents an end-to-end automatic LoD1 building 3D reconstruction pipeline from multi-view satellite imagery. The proposed pipeline is composed of several steps: multi-view stereo processing with the French National Centre for Space Studies (CNES) tool CARS to extract the Digital Surface Model (DSM), followed by the generation of the Digital Terrain Model (DTM) from the DSM using an image-based implementation of the drape cloth algorithm. A building footprint extraction step is then carried out through semantic segmentation with a deep learning approach. These footprints are vectorized, regularized with a data-driven approach, and geocoded to form the LoD0 reconstruction model. Finally, the regularized shapes are extruded from the ground to the gutter height to produce the LoD1 reconstruction model. Particular attention is paid to the quality of the building contour restitution.
Our proposed pipeline is evaluated on Pleiades multi-view satellite images (50 cm GSD). The deep learning networks are trained on Toulouse, France, and the pipeline is evaluated on several other French urban areas which exhibit spectral, structural, and altimetric variations (such as Paris). Building footprint ground truth is extracted from the OpenStreetMap project. State-of-the-art results are achieved for building footprint extraction, and good generalization capacity is demonstrated without the need to retrain the machine learning models.
Human activity is the leading cause of wildfires; however, lightning can also contribute significantly. Lightning-ignited fires are unpredictable, and identifying the relationship between lightning and the conditions that lead to ignition is useful for fire control and prevention services, which currently assess this danger mainly based on forecasts of local lightning activity. In general, there is a relationship between fire ignition and fuel availability and dryness, but the environmental conditions needed for a lightning ignition were unknown. This research looked at developing a global artificial intelligence (machine learning) model that would be triggered by cloud-to-ground lightning activity and then examine the current weather and ground conditions to identify lightning flashes which could potentially cause fires, adding valuable information to existing regional early warning systems.
In this research we created three different machine learning models, called classifiers, to identify where lightning flashes could be a hazard. This model type assigns a label to a given set of inputs; for these research models, it assigned one of two labels: lightning-ignited fire hazard or no hazard. All models were based on the decision tree concept, a basic model in which a series of true/false questions is followed until a decision is reached. The models developed were a single decision tree, Random Forest, and AdaBoost, with the latter two methods combining many different decision trees to produce an answer. The models were built on environmental data gathered from assumed past lightning-ignited fire events, identified by combining active fire information with a cloud-to-ground lightning forecast product.
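A minimal sketch of the three classifier types with scikit-learn, using synthetic placeholder data in place of the environmental predictors; the features and hyper-parameters of the actual study are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for environmental conditions sampled at
# cloud-to-ground lightning flashes; y = 1 means lightning-ignited fire hazard.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=8),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "adaboost": AdaBoostClassifier(n_estimators=200),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # overall accuracy on held-out data
```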
Initial test data showed promising results of around 78% accuracy for the multiple-decision-tree methods, leading to an independent verification against 145 lightning-ignited fires in Western Australia in 2016. This showed that in a minimum of 71% of the cases the models correctly predicted the occurrence of lightning-caused fire. Given the success of the models, further research is planned, with the current models to be used in an operational context to enhance information for fire management.