Description:
Super resolution technologies are increasingly being employed to increase the spatial resolution of EO data.
There is however little consensus on how to best evaluate the outputs of super resolution algorithms.
Qualitative, spectral or radiometric characteristics are commonly compared to the original source data, but such comparisons do not provide a common basis for assessing the value of a super resolution algorithm.
This forum will discuss the following questions:
- What are the commonly applied metrics to assess the output of SR algorithms?
- How should SR algorithms ideally be assessed in terms of the super resolved data?
- Is there an opportunity for a consensus for performance metrics to be used?
Spatio-temporal fusion in the frame of Sentinel-HR
CNES is currently conducting a phase 0 study for a mission called Sentinel-HR, which aims at providing a higher spatial resolution complement to Sentinel-2 or its next generation, Sentinel-2 NG. The mission would acquire imagery at 2 meters resolution every 20 days, in the four 10 meters bands of Sentinel-2 (B2, B3, B4 and B8A) and with the same characteristics as Sentinel-2: nadir viewing angle, always-on instrument, global coverage. The rationale behind these specifications is that while some applications require both frequent revisit and high resolution, it would be very expensive to acquire 2 meters images at the frequency of Sentinel-2 (5 days) or Sentinel-2 NG (2 or 3 days). On the other hand, changes are mostly driven by radiometry and phenology, while the geometric structure of the landscape (e.g. roads, crop limits, buildings) remains more stable. It therefore makes sense to acquire high resolution details on a less frequent basis.
Yet, because of cloud occurrences, the effective revisit of Sentinel-HR would be considerably longer than 20 days in some locations, and some users are also interested in having both the high resolution and the very frequent revisit. It would therefore make sense to merge information from the Sentinel-2 and Sentinel-HR time-series in order to obtain a high-resolution, high-revisit synthetic time-series, an operation referred to as spatio-temporal fusion in the literature. In this work, which was part of the phase 0 study at CNES, we compared several methods in order to achieve the best high-resolution, high-revisit time-series on a large representative dataset.
Data
In order to simulate joint time-series of high resolution acquisitions every 20 days and intermediate medium resolution acquisitions with corresponding high resolution references, we leveraged the synergy between Sentinel-2 and Venμs on the full L2A archive distributed by Theia (www.theia.cnes.fr). The French and Israeli micro-satellite Venμs provides constant viewing angle observations of a selection of sites every 2 days, with spectral bands close to the Sentinel-2 bands and a spatial resolution of 5 meters [Dedieu et al., 2018]. We can therefore look for an existing Venμs image within 2 days of every Sentinel-2 image we select. Both L2A products come from the MAJA processing chain [Lonjou et al., 2016], which ensures good data consistency and quality. To correct for differences in spectral sensitivities and residual directional effects, a linear regression is performed so as to bring Sentinel-2 surface reflectances closer to the Venμs ones. A spatial registration is also performed. Sentinel-2 images are then down-sampled to 25 m (or 12.5 m) to achieve a resolution ratio of 5 with respect to Venμs, similar to the ratio that would occur between Sentinel-HR and Sentinel-2. It should be noted that images are selected without any filtering on cloud cover, as would be the case for real data of the mission. Cloud masks estimated by MAJA are therefore used during prediction and evaluation. Using this process, we generated joint guide and target time-series over 7 different Venμs sites, spanning 3 months to a full year depending on the site.
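The two preparation steps described above (cross-sensor linear regression, then block-averaging to the coarser grid) can be sketched as follows; the function name and the use of a global per-band regression are our assumptions, not the actual processing code:

```python
import numpy as np

def harmonize_and_downsample(s2_band, venus_band, factor=5):
    """Align a Sentinel-2 band to Venus-like reflectances with a linear
    regression fitted on co-registered pixels, then block-average it by
    `factor` to emulate the Sentinel-HR / Sentinel-2 resolution ratio.
    Sketch only: the real chain also handles registration and cloud masks."""
    # Fit venus ~ a * s2 + b over all valid pixels of the pair.
    a, b = np.polyfit(s2_band.ravel(), venus_band.ravel(), deg=1)
    aligned = a * s2_band + b
    # Block-average to the coarser grid (e.g. resolution ratio of 5).
    h, w = aligned.shape
    aligned = aligned[:h - h % factor, :w - w % factor]
    return aligned.reshape(h // factor, factor,
                           w // factor, factor).mean(axis=(1, 3))
```

Block-averaging is used here rather than decimation so that the simulated coarse pixel integrates all underlying fine pixels, as a real coarser sensor would.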
Methods
Since the seminal work on STARFM [Gao et al., 2006], a large amount of literature has been devoted to spatio-temporal fusion [Belgiu and Stein, 2019]. However, with the notable exception of [Kwan et al., 2018], most work on spatio-temporal fusion focuses on the same Landsat-8 and MODIS dataset, or uses synthetic data obtained by aggregating the high resolution data to the lower resolution. Such datasets are not representative of the Sentinel-HR and Sentinel-2 (NG) fusion problem in terms of resolution ratio and data quality, and methods developed for their fusion might therefore not be optimal in our case. In order to take a step back, we included in this survey several methods that do not come from the spatio-temporal fusion domain. First, we included naive methods, such as temporal interpolation of the Sentinel-HR images or spatial interpolation of the Sentinel-2 images. We also included CARN, a method from the Single-Image Super-Resolution field [Anwar et al., 2020]. Last, we included our own machine learning based method, which performs data-driven interpolation similar to [Lutio et al., 2019] by producing weights for linear interpolation of the high-resolution series with a Multi-Layer Perceptron network.
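The naive temporal-interpolation baseline mentioned above can be written in a few lines; the data-driven variant keeps the same linear form but lets a network predict the weight per pixel. This is our illustrative sketch, not the authors' code:

```python
import numpy as np

def temporal_interpolation(guide_before, guide_after, t_before, t_after, t):
    """Naive baseline: linearly interpolate between the two high-resolution
    guides bracketing the target date t. In the data-driven method described
    in the text, the scalar weight below is replaced by a per-pixel weight
    predicted by an MLP from low-resolution context (our reading)."""
    w = (t - t_before) / (t_after - t_before)  # 0 at first guide, 1 at second
    return (1.0 - w) * guide_before + w * guide_after
```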
Preliminary results
Our benchmark works as follows. Given the joint time-series of high-resolution dates (called guides) and low-resolution dates (called targets), we use each candidate method to make a high resolution prediction for each target date, and compute a set of metrics with respect to the high resolution reference available at that date. The computed metrics include all the traditional image quality metrics computed on each spectral band, plus the same metrics computed on the NDVI spectral index. Additionally, the 90th and 99th percentiles of the absolute error are also computed. We generated our first results over the ARM Venμs site, an agricultural landscape in North America. Among the different metrics, we focused on the 90th percentile of the absolute NDVI error as a proxy for radiometric precision, and on the structural error of the red band as a proxy for the accuracy of high resolution details. We observed that temporal interpolation is the most accurate in terms of spatial details but provides the worst radiometric performance. As expected, simple bicubic zoom is the worst method in terms of geometric accuracy but shows more stable radiometric performance than temporal interpolation. All other methods perform comparably, and better, on radiometric accuracy. On the geometric accuracy side, we observed that the CARN algorithm improves on bicubic zoom (as expected) but remains worse than the historical STARFM algorithm. Our data-driven interpolation algorithm seems to perform better than STARFM on this case.
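The radiometric proxy used above (90th percentile of the absolute NDVI error over cloud-free pixels) is straightforward to reproduce; this is our reimplementation of the metric as we understand it, not the benchmark code itself:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index; small epsilon avoids 0/0."""
    return (nir - red) / (nir + red + 1e-9)

def ndvi_p90_error(pred_red, pred_nir, ref_red, ref_nir, clear_mask):
    """90th percentile of the absolute NDVI error, restricted to pixels
    flagged clear by the cloud mask (MAJA masks in the text)."""
    err = np.abs(ndvi(pred_red, pred_nir) - ndvi(ref_red, ref_nir))
    return np.percentile(err[clear_mask], 90)
```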
Perspectives
With this work, we aim at giving insight on how Sentinel-HR and Sentinel-2 (NG) could be combined to form a high resolution, high revisit time-series. We plan to include more complex methods from the spatio-temporal fusion domain and perform a more detailed metric analysis, by stratifying pixels according to their gradient strength or distance to the last clear pixel, on different kinds of landscapes. Our final goal is to provide guidance on the trade-off between method cost and complexity and expected accuracy gains. We also intend to publish our dataset under a free and open data licence in order to encourage research on spatio-temporal fusion for this range of resolutions.
References
[Anwar et al., 2020] Anwar, S., Khan, S., and Barnes, N. (2020). A deep journey into super-resolution: A survey. ACM Computing Surveys (CSUR), 53(3):1–34.
[Belgiu and Stein, 2019] Belgiu, M. and Stein, A. (2019). Spatiotemporal image fusion in remote sensing. Remote Sensing, 11(7):818.
[Dedieu et al., 2018] Dedieu, G., Hagolle, O., Karnieli, A., Ferrier, P., Crébassol, P., Gamet, P., Desjardins, C., Yakov, M., Cohen, M., and Hayun, E. (2018). Venμs: Performances and first results after 11 months in orbit. In IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pages 7756–7759.
[Gao et al., 2006] Gao, F., Masek, J., Schwaller, M., and Hall, F. (2006). On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance. IEEE Transactions on Geoscience and Remote Sensing, 44(8):2207–2218.
[Kwan et al., 2018] Kwan, C., Zhu, X., Gao, F., Chou, B., Perez, D., Li, J., Shen, Y., Koperski, K., and Marchisio, G. (2018). Assessment of spatiotemporal fusion algorithms for planet and worldview images. Sensors, 18(4):1051.
[Lonjou et al., 2016] Lonjou, V., Desjardins, C., Hagolle, O., Petrucci, B., Tremas, T., Dejus, M., Makarau, A., and Auer, S. (2016). Maccs-atcor joint algorithm (maja). In Remote Sensing of Clouds and the Atmosphere XXI, volume 10001, page 1000107. International Society for Optics and Photonics.
[Lutio et al., 2019] Lutio, R. d., D’aronco, S., Wegner, J. D., and Schindler, K. (2019). Guided super-resolution as pixel-to-pixel transformation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8829–8837.
AI4EO is an ESA-supported platform for the organization of Earth Observation-related artificial intelligence challenges. It provides an end-to-end environment for anyone who wants to learn, participate or compete in the development of novel machine learning methods. In September 2021 the "AI4Sentinel2 Challenge" ended; it focused on mapping cultivated land at 2.5-meter spatial resolution from the native 10-meter per pixel resolution Sentinel-2 time-series. Its goal was to develop super-resolution technologies in order to identify small or narrow agricultural areas. The challenge attracted one hundred teams, demonstrating that the topic is of high interest, which is why the AI4EO board decided to re-classify the challenge as a permanently open one [1], allowing teams to continuously improve their methods using well-prepared label and validation data as well as services and an end-to-end example Jupyter Notebook. These materials can be used outside of the AI4EO platform and can serve as a standard dataset on which to fine-tune super-resolution methods.
The materials include input data (Sentinel-2 L2A time-series at 10 m resolution for all twelve bands, for the period from March 1st to September 1st 2019 over the area of the Republic of Slovenia) and labels (100 tiles of 5 by 5 km with a cultivated-land mask at 2.5-meter resolution). An additional 25 tiles with labels are available for validation of the results. In addition to the data themselves, there is also an end-to-end Jupyter Notebook available, using the open-source eo-learn [2] library to load and manipulate the data, allowing machine-learning experts to focus on the actual methodology rather than on typical remote sensing problems. There is even a Euro Data Cube [3]-powered compute environment available for those who don't have their own resources.
We will present the overall challenge of extracting super-resolved results from the Sentinel-2 time-series, the standard label dataset, the further opportunities of the permanently open challenge, as well as our experience on how to prepare such a standard dataset to prevent over-fitting.
References:
[1] https://platform.ai4eo.eu/enhanced-sentinel2-agriculture-permanent
[2] https://github.com/sentinel-hub/eo-learn
[3] https://eurodatacube.com/
CO3D is an Earth observation mission by the Centre National d'Études Spatiales (CNES) aiming at providing a worldwide accurate Digital Surface Model (DSM). For this purpose, 3D photogrammetric reconstruction from pairs of optical satellite images will be employed. CO3D, the French acronym for Constellation Optique 3D, will be composed of no fewer than four optical satellites that will provide at least two simultaneous acquisitions of the same scene. In this way, temporal differences will be minimized, allowing more accurate stereo matching as well as automatic production of DSMs on a global scale. Such detailed 3D information in DSM format is strategic for growing downstream applications, from 3D city mapping to damage assessment after catastrophic events to the incoming smart city market. With such premises, it goes without saying that the DSM generation pipeline is key, and CNES, in collaboration with CS Group, is developing two open source tools for this purpose: CARS, a multiview stereo pipeline that generates the corresponding DSM from stereo pairs; and Pandora, independent yet integrated in CARS, which is in charge of the stereo matching step from rectified images.
However, DSMs produced with current technology suffer from poor quality in urban areas. Indeed, even with very high resolution images, there is too little information at the scale of human-made objects for the stereo matching step to be accurate enough, generating in turn disparity maps that poorly reproduce objects with well-defined shapes such as buildings. These errors are inherited by the DSM, which returns an unsatisfactory 3D reconstruction.
To address this issue, one solution may be to artificially increase image resolution beyond the sensor limits. A denser sampling should lead to more precise disparity identification. One could simply use an interpolation technique (bicubic upsampling being the most common), but this does not introduce any spectral structure that might be useful for the matching algorithm. On the other hand, super resolution (SR) algorithms are designed to recover high frequencies from low ones, introducing significant information in a scene characterized by strong and frequent discontinuities such as a city. State-of-the-art methods relying on Deep Neural Networks have shown remarkable results in this sense. The assumption is that such spectral information can enhance the stereo matching step, increasing the confidence and the accuracy of the disparity estimation. The aim of this work is therefore to assess the contribution of single image SR Deep Learning techniques to stereo matching and DSM generation in an urban context with satellite data, highlighting potential advantages and limitations that can emerge when introducing this technology in a multiview stereo pipeline such as CARS-Pandora. Few similar experiences have been found in the literature, generally leaving room for improvement regarding both super resolution model training and the assessment of the resulting DSM. The proposed contributions are: a SR training dataset generation procedure that addresses a specific sensor model (i.e. Pléiades), realistically simulating most of the steps of its space and ground segment image pipeline; and a local analysis of the consequences of deep learning SR on stereo matching for remote sensing data, specifically targeting disparity estimation via similarity measures of homologous neighbourhoods.
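The interpolation baseline against which SR is compared can be sketched as below; we use scipy's cubic spline interpolation as a stand-in for bicubic upsampling, and the function name is ours. An SR network would simply replace the `zoom` calls:

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_pair(left, right, factor=2):
    """Upsample both images of a rectified stereo pair before matching.
    order=3 selects cubic spline interpolation, standing in for the bicubic
    upsampling mentioned in the text; note that it adds samples but no new
    high-frequency content, which is precisely what SR aims to provide."""
    return zoom(left, factor, order=3), zoom(right, factor, order=3)
```

One practical consequence worth noting: after upsampling by `factor`, disparities measured on the enlarged pair are scaled by the same factor and must be divided back before triangulation.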
Keywords : Super resolution, Digital Surface Model, Stereo-Matching, Deep Neural Networks, Satellite Image Simulation, CO3D
The small-scale variations (on the order of 10 km) of sea-ice thickness have a strong influence on heat flux, and thus must be determined in order to predict air-sea interactions. Current satellite products provide sea-ice thickness at a low resolution (on the order of 90 km) and other quantities (e.g., sea-ice concentration and sea-ice deformation) at higher resolution. But within the pixel of a satellite sea-ice product, the distribution of actual sea-ice thickness can vary greatly and have a huge impact on the heat flux computation.
In contrast, the neXtSIM model gives realistic representations of sea-ice thickness at very high resolution. In this work, we use neXtSIM model outputs to train a super-resolution neural network that reconstructs high-resolution sea-ice thickness (at 12 km) from observable features: low-resolution sea-ice thickness (90 km), sea-ice concentration (12 km), and sea-ice deformation (60 km or 10 km).
The neural network model is first trained and validated on neXtSIM data. Two configurations are considered: (i) high-resolution deformation input data, as derived from SAR data, and (ii) low-resolution deformation input data, as derived from passive microwave data, which are less accurate than those derived from SAR but have better coverage.
The two configurations are then tested on real satellite data from January 2021. The high-resolution sea-ice thicknesses estimated by the neural network are validated against CryoSat-2 data, which provide sea-ice thickness estimates at very high resolution but with very sparse coverage. The results show that the neural network is able to reconstruct the small-scale variations of sea-ice thickness over the whole domain, and that the distribution of sea-ice thickness values is better represented than in the low-resolution product, especially for thin ice, which has the largest impact on heat fluxes.
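To make the feature-to-thickness setup concrete, here is a deliberately simple linear stand-in for the neural network: per-pixel features (upsampled low-resolution thickness, concentration, deformation) are mapped to high-resolution thickness by least squares. This is purely illustrative; the actual work uses a trained neural network, and all names here are ours:

```python
import numpy as np

def fit_linear_readout(features, target):
    """Fit a linear map from per-pixel features to high-resolution thickness.
    A placeholder for the super-resolution network: same inputs and output,
    but no capacity to learn nonlinear small-scale structure."""
    X = np.column_stack([features, np.ones(len(features))])  # bias column
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs

def predict_thickness(features, coeffs):
    """Apply the fitted linear readout to new per-pixel features."""
    X = np.column_stack([features, np.ones(len(features))])
    return X @ coeffs
```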
High resolution air quality measurements: from city to neighbourhood level, combining Sentinel-5P, CAMS models and in-situ measurements
Abstract
Air quality has significantly degraded over the last century, and air pollution is now considered one of the leading causes of death worldwide [1]. Human activities and the growing population in urban areas [2] impose significant environmental pressure on surrounding ecosystems, including the degradation of air quality. Understanding air quality drivers and their dynamics is key to developing efficient risk mitigation strategies. The World Health Organisation has recently updated its air quality guidelines, which underline the adverse effects of air pollution, such as mortality due to major air pollutants, especially those related to the transportation sector (NO2 and SO2).
The satellite imagery obtained through the European Copernicus programme (i.e. through Sentinel-5P) and the Copernicus Atmosphere Monitoring Service's (CAMS) European air quality forecast [3] allow us to map the concentration of pollutants like NO2 and PM2.5 near the surface of the Earth with a resolution of 0.1 degrees (around 10 km at the equator). This is a valuable source of information that allows efficient monitoring of air quality at city level.
This paper focuses on downscaling Sentinel-5P observations and derived data from their original resolution to a 1 km x 1 km resolution. The objective is to move from city level to neighbourhood level in urbanized areas, to support the identification of air quality drivers and link them to human activities.
Two approaches are proposed to downscale level 2 NO2 observations from Sentinel-5P.
The first approach is a statistical model based on a probability map built from land cover information. The probability map is closely related to the physical nature of the underlying land cover: for instance, since NO2 is directly linked to fossil fuel combustion, in urban areas the probability map attributes a higher coefficient to roads than to buildings. The objective of this statistical model is to stay closely tied to the underlying physics of the area of interest. The observations from CAMS are then projected on the probabilistic model, and the output is compared to in-situ validation data from the open data available in major cities (e.g. Airparif data for Paris).
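The core of such a probability-map projection can be sketched as a weight-preserving redistribution: each coarse NO2 value is spread over its fine sub-pixels in proportion to land-cover weights, while the cell mean stays equal to the coarse observation. This is our reading of the scheme, with hypothetical names; the weight values themselves would come from the land cover analysis:

```python
import numpy as np

def downscale_with_weights(coarse_value, weights):
    """Redistribute one coarse-cell NO2 value over its fine sub-pixels
    according to land-cover weights (e.g. roads weighted higher than
    buildings). Normalizing by the mean weight keeps the cell average
    equal to the coarse observation, so mass is conserved."""
    w = weights / weights.mean()
    return coarse_value * w
```

Applied cell by cell over a 0.1-degree grid with 1 km sub-pixels, this yields a fine field whose block averages reproduce the CAMS input exactly.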
The second approach explores the use of in-situ data to train a machine learning model. The first, physically based statistical model is used as one of the inputs to the machine learning model, which is further augmented with meteorological data (temperature, precipitation and wind) and topographic data (i.e. a digital elevation model). This approach allows the identification of the major drivers of NO2 concentrations in a given area, and subsequently allows training a high resolution model to downscale the satellite measurements.
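A minimal sketch of the driver-identification step, under the assumption that one simply ranks candidate drivers by the magnitude of their standardized regression coefficients (the paper's actual machine learning model is not specified here):

```python
import numpy as np

def driver_coefficients(X, y):
    """Standardize candidate driver features (columns of X: e.g. statistical
    model output, temperature, wind, elevation) and the NO2 target y, then
    fit a least-squares model; coefficient magnitudes indicate relative
    driver importance. A linear sketch, not the actual ML model."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    coeffs, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return coeffs
```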
Finally, the generalization of the developed approach is explored by applying the same methodology to different geographical locations. The performance of the developed models is assessed by comparison with available in-situ validation measurements.
[1] https://www.who.int/news/item/22-09-2021-new-who-global-air-quality-guidelines-aim-to-save-millions-of-lives-from-air-pollution
[2] https://www.nature.com/articles/s41598-020-74524-9
[3] https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-europe-air-quality-forecasts?tab=overview
We want to demonstrate the potential of feature level data fusion for enhancing the spatiotemporal resolution of Sentinel products, using the water vapour product as an example.
Water vapour is the gaseous state of water. It is an important constituent of the Earth's atmosphere, its share varying between 0.01% and 4.24%. More than 99% of water vapour is located in the troposphere. It is also an important greenhouse gas, as it absorbs thermal infrared radiation.
Currently, there is an official water vapour product generated by the Sentinel-3 mission (Integrated Water Vapour Column, IWV). The spatial resolution of this data product is approximately 1.2 km x 1.2 km on ground for the nominal morning acquisitions. The revisit time of the OLCI (Ocean and Land Colour Instrument) instrument on board Sentinel-3 is less than 2 days, as the pair Sentinel-3A and Sentinel-3B is now in orbit.
For Sentinel-5P, a pre-operational water vapour processor (Total Column Water Vapour, TCWV) has already been implemented in the framework of the ESA S5P PAL project. The TROPOMI instrument on board S5P provides early afternoon acquisitions with daily global coverage and a spatial resolution of 5.5 x 3.5 km.
As Sentinel-5P revisits more frequently but has a lower spatial resolution than the Sentinel-3 mission, in this work we combine the S5P TCWV data with S3 IWV data in order to create a data product with increased spatiotemporal resolution. This is achieved by machine learning methods for data fusion. The main challenges are the large gaps in both data sets compared to the envisaged resulting information density, and the high spatiotemporal dynamics of the water vapour field. To overcome these obstacles and to interpolate and integrate the two heterogeneous data streams into a standardised four-dimensional grid, feature level data fusion based on supervised learning is applied.
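The first step of integrating the two streams into a common grid can be sketched as cell averaging of sparse swath samples; grid definition and function names are our assumptions, and the remaining gaps (NaN cells) are what the learned fusion model then has to fill:

```python
import numpy as np

def grid_average(lat, lon, values, lat_edges, lon_edges):
    """Bin sparse swath samples (e.g. S3 IWV or S5P TCWV retrievals) onto a
    common lat/lon grid by averaging all samples falling in each cell.
    Cells with no samples are left as NaN for the fusion stage."""
    iy = np.digitize(lat, lat_edges) - 1
    ix = np.digitize(lon, lon_edges) - 1
    ny, nx = len(lat_edges) - 1, len(lon_edges) - 1
    total = np.zeros((ny, nx))
    count = np.zeros((ny, nx))
    ok = (iy >= 0) & (iy < ny) & (ix >= 0) & (ix < nx)
    np.add.at(total, (iy[ok], ix[ok]), values[ok])
    np.add.at(count, (iy[ok], ix[ok]), 1)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```

Doing this per acquisition time yields the time axis of the four-dimensional grid mentioned in the text (time, latitude, longitude, and data stream).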
The new combined product will provide more detailed and consistent water vapour information and can be used for various applications in atmospheric monitoring, ecohydrology and climate science.