Studies of land-cover and land-use change (LCLUC) on a global scale became possible when the first satellite of the Landsat series was launched 50 years ago. Since then, land-change science has developed rapidly to answer questions about where changes are occurring, their extent and time scale, their causes, their consequences for ecosystems and human societies, their feedbacks with climate change, and what changes are expected in the future. LCLUC studies use a combination of space observations, in situ measurements, process studies, and numerical modeling. To get the most out of current remote sensing capabilities, researchers strive to utilize data sources that differ in spatial and temporal resolution and in electromagnetic range. Fusing observations from optical sensors with radar data helps fill cloud-induced gaps in the optical record. The goal is to develop multi-sensor, multi-spectral methods that increase spatiotemporal coverage and advance the virtual constellation paradigm for moderate spatial resolution (10-60 m) land imaging systems with continental to global coverage. The use of very high (meter) resolution commercial satellite data is also accelerating as more data become available and accessible. Socioeconomic research likewise plays an important role in land-change science and includes analyses of how changes in human behavior at various levels affect land use. Studies of the resultant impacts of land-use change on society, and of how the social and economic aspects of land-use systems adapt to climate change, are becoming increasingly important as the climate crisis draws growing attention.
The NASA LCLUC Program is developing interdisciplinary approaches that combine aspects of the physical, social, and economic sciences, with a high level of societal relevance, using remote sensing tools, methods, and data. The Program aims to develop the capability for annual satellite-based inventories of land cover and land use to characterize and monitor changes at the Earth’s surface and to improve our understanding of LCLUC as an essential component of the Earth System. The Program currently focuses on detecting and quantifying rapid LCLUC in hotspot areas and examining its impact on the environment and interactions with climate and society. This talk will summarize the Program’s achievements over the 25 years since its inception, with an emphasis on the most recent findings. It will describe the synergistic use of multi-source land imaging data, including data from instruments on the International Space Station. The examples will cover various land-cover and land-use sectors: forests, grasslands, agriculture, urban areas, and wetlands.
The existing CCI Medium Resolution land cover (MRLC) product delineates 22 primary and 15 secondary land cover classes at 300-meter resolution with global coverage and an annual time step extending from 1992 to the present. Previously, translation of the land cover classes into the plant functional types (PFTs) used by Earth system and land surface models required the use of the CCI global cross-walking table that defines, for each land cover class, an invariant PFT fractional composition for every pixel of the class regardless of geographic location.
Here, we present a new time series data product that circumvents the need for a cross-walking table. We use a quantitative, globally consistent method that fuses the 300-meter MRLC product with a suite of existing high-resolution datasets to develop spatially explicit annual maps of PFT fractional composition at 300 meters. The new PFT product exhibits intraclass spatial variability in PFT fractional cover at the 300-meter pixel level and is complementary to the MRLC maps, since the derived PFT fractions maintain consistency with the original land cover class legend. This was made possible by ingesting several key 30-meter global layers (urban extent, open water, tree cover, and tree height) while controlling their mutual compatibility against the MRLC maps.
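The core of the fraction-mapping step can be illustrated with a simple block average: the share of each 300-meter pixel covered by a 30-meter binary layer is the mean of the underlying high-resolution cells. The sketch below is a minimal illustration only, not the project's actual fusion algorithm (which additionally enforces consistency with the MRLC legend); the `binary_to_fraction` helper and the toy 10×10 aggregation factor are hypothetical.

```python
import numpy as np

def binary_to_fraction(binary_map, factor=10):
    # Block-average a high-resolution binary mask (e.g., 30 m tree cover)
    # into percent fractional cover on a coarser grid (e.g., 300 m).
    h, w = binary_map.shape
    assert h % factor == 0 and w % factor == 0
    blocks = binary_map.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3)) * 100.0  # percent cover per coarse pixel

rng = np.random.default_rng(0)
mask = (rng.random((20, 20)) > 0.5).astype(float)  # toy binary layer
frac = binary_to_fraction(mask, factor=10)         # 2x2 grid of percent cover
```

The block mean preserves the total cover exactly, which is what makes the aggregated fractions consistent with the high-resolution source.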
This dataset is a significant step forward towards ready-to-use PFT descriptions for climate modeling at the pixel level. For each of the 29 years, 14 new maps are produced (one for each of 14 PFTs: bare soil, surface water, permanent snow and ice, built, managed grasses, natural grasses, and trees and shrubs each split into broadleaved evergreen, broadleaved deciduous, needleleaved evergreen, and needleleaved deciduous), with data values at 300-meter resolution indicating the percentage cover (0–100%) of the PFT in the given year.
Based on land surface model simulations (ORCHIDEE and JULES models), we find significant differences in simulated carbon, water, and energy fluxes in some regions using the new PFT data product relative to the global cross-walking table applied to the MRLC maps. We additionally provide an updated user tool to assist in creating model-ready products to meet individual user needs (e.g., re-mapping, re-projection, PFT conversion, and spatial sub-setting).
Driven by advances in satellite data acquisition and processing capabilities and by continued interest in monitoring the Earth’s surface for a variety of needs, global land cover (GLC) mapping efforts have progressed rapidly. Several GLC maps have been produced with increased temporal resolution, including annual updates, and with increased spatial resolution, e.g., 10 m. However, the validation of GLC maps has not kept pace with map production. Most GLC maps are validated using statistically rigorous accuracy assessment methods following internationally promoted guidelines (CEOS Stage-3). Still, updates (e.g., annual or per epoch) of GLC maps often lack rigorous accuracy assessments. Considering that validation datasets are collected through human interpretation, which is costly and time-consuming, they should be designed to be easily adjustable for timely validation of new releases of land cover products and to be suitable for assessing multiple maps.
Aiming towards operational land cover validation, this study presents a framework for validating annual global land cover maps that uses efficient means of updating validation datasets, allowing timely map validation in line with the CEOS Stage-4 validation guidelines (Figure 1) (Tsendbazar et al. 2021). The framework comprises regular updates of a validation dataset and continuous map validation. For the regular updates, we propose a partial revision of the validation dataset based on random and targeted rechecking (of areas with a high probability of change), followed by additional validation data collection. For continuous map validation, we propose an accuracy assessment of each map release, including an assessment of the stability of map accuracy, targeting users who require multi-temporal maps.
This validation framework was applied to the Copernicus Global Land Service GLC product (CGLS-LC100), which includes annual GLC maps from 2015 to 2019. We developed a multi-purpose global validation dataset suitable for validating maps at 10-100 m resolution for the reference year 2015 (Tsendbazar et al. 2018). As part of the operational validation, this dataset was updated to 2019 through a partial revision consisting of random and targeted revisions. The BFAST time series algorithm was used to target sample locations likely to have changed during the update period. Additional sample sites were collected to increase the sampling intensity in land cover change areas.
Through this updating mechanism, we validated the annual GLC maps of the CGLS-LC100 product for 2015–2019. We further assessed the stability in class accuracy over this period. Implementation of this operational validation framework in the context of the Copernicus Global Land Service GLC product will be presented.
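The accuracy and stability assessment in such a framework reduces to standard confusion-matrix statistics computed per map release. The sketch below is illustrative only: the `class_accuracies` helper is hypothetical and the matrices are toy numbers, not CGLS-LC100 results.

```python
import numpy as np

def class_accuracies(cm):
    # User's/producer's accuracy per class and overall accuracy from a
    # confusion matrix (rows = mapped class, columns = reference class).
    cm = np.asarray(cm, dtype=float)
    users = np.diag(cm) / cm.sum(axis=1)       # commission side
    producers = np.diag(cm) / cm.sum(axis=0)   # omission side
    overall = np.trace(cm) / cm.sum()
    return users, producers, overall

# Stability: spread of overall accuracy across annual map releases.
cms = {2015: [[80, 5], [10, 55]], 2019: [[78, 7], [9, 56]]}
oa_by_year = {y: class_accuracies(c)[2] for y, c in cms.items()}
stability = max(oa_by_year.values()) - min(oa_by_year.values())
```

A small spread across releases indicates that the annual maps can be used together in multi-temporal analyses without class-accuracy artifacts.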
Since this is a multi-purpose validation dataset that allows validating maps at 10-100 m resolution, it was further updated to the year 2020 to validate ESA’s WorldCover 2020 GLC map, which is based on Sentinel-1 and Sentinel-2 data at 10 m resolution. The approach and results of validating the WorldCover 2020 GLC map will also be included in this presentation.
As more operational land cover monitoring efforts emerge, we emphasize the importance of validating updated maps and recommend advancing current validation practices towards operational map validation, so that long-term land cover maps and their uncertainty information are well understood and properly used.
Index Terms— land cover validation, operational monitoring, dataset update, global land cover
Related literature
Tsendbazar, N., Herold, M., Li, L., Tarko, A., de Bruin, S., Masiliunas, D., Lesiv, M., Fritz, S., Buchhorn, M., Smets, B., Van De Kerchove, R., & Duerauer, M. (2021). Towards operational validation of annual global land cover maps. Remote Sensing of Environment, 266, 112686
Tsendbazar, N.E., Herold, M., de Bruin, S., Lesiv, M., Fritz, S., Van De Kerchove, R., Buchhorn, M., Duerauer, M., Szantoi, Z., & Pekel, J.F. (2018). Developing and applying a multi-purpose land cover validation dataset for Africa. Remote Sensing of Environment, 219, 298-309
1. Introduction
Land cover is one of the main essential climate variables (ECVs), as it is strongly coupled with climate change. In this context, within the framework of ESA’s Climate Change Initiative (CCI) [1], the High Resolution Land Cover (HRLC) project aims to study the role of spatial resolution in mapping land cover and land-cover changes to support climate modelling research [2]. Land cover and its changes are indeed both cause and consequence of human-induced and natural climate change, as demonstrated by the previous phase of the CCI program, which focused on the generation of Medium Resolution (MR) land cover maps at global scale. Unlike the MR land cover CCI, which provided annual land cover maps at 300 m resolution over the period 1992-2020 [3], the HRLC project produces regional maps at a spatial resolution of 10 m/30 m. Moving from 300 m to 30 m requires the definition of new data analysis methods, reframing the perspective of the MR project from both the theoretical and the operational viewpoints. Although HR potentially enables a detailed analysis of spatial patterns in land cover, it introduces many challenges with respect to the MR case, and limitations in the available data make the development of products at very large scale very challenging.
This contribution presents the architecture and methodologies of the full processing chains developed to process Earth Observation (EO) data and generate the HRLC products. The primary products of the project consist of: (i) HR land-cover maps at subcontinental scale at 10 m, serving as a static reference input to the climate models (generated for 2019 only); (ii) a long-term record of regional HR land-cover maps at 30 m, produced every 5 years over the period 1990-2015 in the regions identified for the historical analysis; and (iii) yearly change information at 30 m, consistent with the historical HR land-cover maps.
2. Methodology
The proposed architecture was designed around the observation that the temporal availability of HR data in past and current archives is much lower than that of MR data and varies strongly across the years. Unlike the MR case (e.g., the SPOT-Vegetation archive), no daily acquisitions are available, and only in recent years has a reasonably dense temporal sampling become possible thanks to the Sentinel and Landsat-8 missions. Before then, the number of images available per year in the archives drops dramatically (with Landsat Thematic Mapper, ASAR, and ERS-1/2 being the most relevant data sources), making the development of HRLC products far more challenging. This scenario led to a complex process for producing historical time series of products. Moreover, it required a shift in the processing paradigm, from analyzing many MR images per year to analyzing a few images of high spatial resolution (for some areas and years, a single image or none is available).
To produce the land-cover maps, two multisensor (optical and SAR) processing chains have been designed and implemented: one exploits Sentinel-1 (S1) and Sentinel-2 (S2) images to generate maps at 10 m resolution (used in the project for the 2019 products) (Figure 1), while the other generates historical maps every 5 years back to 1990 by exploiting Landsat (Enhanced) Thematic Mapper images and ASAR and ERS-1/2 data (Figure 2). Both architectures share two pre-processing branches (one for optical and one for SAR data) and a fusion module for the final map production. The main differences between the two chains lie in the pre-processing techniques (which account for the large differences in data quality and availability between Sentinel and previous missions) and in the classification paradigm. The S1/S2 architecture independently classifies the time series of images acquired in the target year (2019 in the project) and generates the land-cover products by fusing the classification results obtained on the two branches (optical and SAR) using consensus theory and Markov Random Field approaches [4]. The historical processing chain takes the classification results generated with S1/S2 data as its baseline and exploits the cascade classification paradigm [5] to properly model the temporal correlation between images when producing the historical land-cover maps. This mitigates the well-known problem of error propagation in multitemporal classification, which is extremely critical when multitemporal data are classified independently. The cascade classification paradigm is robust and theoretically well founded, being based on Bayesian decision theory.
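The decision-level fusion of the two branches can be sketched as a simple consensus rule over per-pixel class posteriors. The snippet below is a hedged illustration under strong simplifications: it uses a plain linear opinion pool with a hypothetical branch weight and omits the Markov Random Field spatial regularization described above.

```python
import numpy as np

def consensus_fusion(p_opt, p_sar, w_opt=0.6):
    # Linear opinion pool over per-pixel class posteriors from the optical
    # and SAR branches; w_opt is a hypothetical weight, not a tuned value
    # from the project, and MRF spatial smoothing is omitted.
    fused = w_opt * p_opt + (1.0 - w_opt) * p_sar
    return fused / fused.sum(axis=-1, keepdims=True)

p_opt = np.array([[0.7, 0.2, 0.1]])  # toy posteriors for one pixel
p_sar = np.array([[0.4, 0.5, 0.1]])
fused = consensus_fusion(p_opt, p_sar)
label = int(fused.argmax(axis=-1)[0])  # fused class decision
```

Even this toy rule shows the intended behavior: where the branches disagree, the fused posterior reflects both sources rather than discarding one.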
The optical and SAR branches include “shallow” machine learning techniques (Support Vector Machines, Random Forest) and specific SAR detectors focused on built-up and water-related classes [6]. Both architectures output, for each pixel in the map, an uncertainty measure for the classification as well as an indication of the second most likely class, for a better representation of the complex real conditions on the ground. This information is crucial input to the climate modelling task when using the generated products. Specific methodologies have also been devised to support the definition of the training sets used for the 2019 and historical image classifications [7].
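Extracting the second alternative class and a per-pixel uncertainty from the classifier posteriors can be sketched as below. The margin-based uncertainty definition here is illustrative (one minus the gap between the two top posteriors); the project's exact uncertainty measure may differ, and `top2_with_uncertainty` is a hypothetical helper.

```python
import numpy as np

def top2_with_uncertainty(probs):
    # Best class, second alternative, and an illustrative uncertainty
    # score: 1 - (margin between the two highest class posteriors).
    order = np.argsort(probs, axis=-1)
    best, second = order[..., -1], order[..., -2]
    p_best = np.take_along_axis(probs, best[..., None], axis=-1)[..., 0]
    p_second = np.take_along_axis(probs, second[..., None], axis=-1)[..., 0]
    return best, second, 1.0 - (p_best - p_second)

probs = np.array([[0.55, 0.35, 0.10]])  # toy posteriors for one pixel
best, second, unc = top2_with_uncertainty(probs)
```

A pixel whose two top classes are nearly tied gets an uncertainty close to one, flagging mixed or ambiguous ground conditions to downstream climate models.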
To produce land-cover change maps every year, a third architecture has been defined (Figure 3). It is driven by the cascade classification output and aims to identify the timing of changes on a yearly basis, localizing in time the changes that occurred between the 5-year maps. The change detection products have been generated using optical data acquired by Landsat (Enhanced) Thematic Mapper. The main challenge is the very uneven distribution of the data available across areas and years. This was addressed with an architecture based on a feature extraction module, a time series regularization module (based on a “shallow” neural network), and an abrupt change detection module [8]. Each pixel of the change detection products has associated reliability information in terms of the probability of change.
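The final step of such a chain, locating the year of an abrupt change in a regularized index series, can be sketched as follows. This is a deliberately simplified stand-in for the module in [8]: the neural-network regularization is omitted, and the jump-magnitude threshold is a hypothetical value.

```python
import numpy as np

def locate_abrupt_change(series, threshold=0.3):
    # Return the index of the year just after the largest year-to-year
    # jump in a regularized index series, or None if the jump does not
    # exceed a (hypothetical) magnitude threshold.
    diffs = np.abs(np.diff(series))
    k = int(diffs.argmax())
    return k + 1 if diffs[k] > threshold else None

years = np.arange(1990, 1996)
ndvi = np.array([0.71, 0.69, 0.70, 0.25, 0.27, 0.26])  # toy abrupt drop
idx = locate_abrupt_change(ndvi)
break_year = int(years[idx]) if idx is not None else None
```

The jump magnitude relative to the threshold could also serve as a crude stand-in for the per-pixel probability of change attached to the products.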
3. Product generation and conclusion
The processing chain has been developed using Docker containers and with the requirement of being able to process large volumes of optical and SAR images. The processors have been fully integrated into Python-based pipelines that automatically retrieve the products needed for a specific task and perform the processing. The production was run on Amazon Web Services (AWS) cloud computing, although the processing chain is flexible and can also run on DIAS and other cloud infrastructures.
The Climate User Group involved in the project defined three large regions of particular interest for studying climate/LC feedbacks on three continents, spanning tropical, semi-arid, and boreal climates and complex surface-atmosphere interactions that have a significant impact not only on regional climate but also on large-scale climate structures. The three regions are the Amazon basin, the Sahel band in Africa, and the northern high latitudes of Siberia.
Given the complexity of the task and of the land-cover class legend (which also includes seasonal classes), the products generated over the three areas achieve satisfactory accuracies (see the “ESA CCI High Resolution Land Cover Products” presentation).
References
[1] ESA – European Space Agency: ESA Climate Change Initiative description, EOP-SEP/TN/0030-09/SP, Technical Note – 30 September 2009, 15 pp., 2009.
[2] L. Bruzzone et al., “CCI Essential Climate Variables: High Resolution Land Cover,” ESA Living Planet Symposium, Milan, Italy, 2019.
[3] P. Defourny et al (2017). Land Cover CCI Product User Guide Version 2.0. [online] Available at: http://maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf
[4] D. Tuia, M. Volpi, and G. Moser, “Decision fusion with multiple spatial supports by conditional random fields,” IEEE Trans. Geosci. Remote Sens., vol. 56, 2018.
[5] L. Bruzzone and R. Cossu, “A multiple-cascade-classifier system for a robust and partially unsupervised updating of land-cover maps,” IEEE Trans. Geosci. Remote Sens., vol. 40, 2002.
[6] T. Sorriso, D. Marzi, and P. Gamba, “A General Land Cover Classification Framework for Sentinel-1 SAR Data,” in Proc. 2021 IEEE 6th Int. Forum on Research and Technology for Society and Industry (RTSI), Naples, Italy, 2021.
[7] C. Paris, L. Orlandi, L. Bruzzone, “A Strategy for an Interactive Training Set Definition based on Active Self-Paced Learning,” IEEE Geosci. Remote Sens. Letters, Vol. 17, 2021.
[8] Y.T. Solano-Correa, K. Meshkini, F. Bovolo, L. Bruzzone, “A land cover-driven approach for fitting satellite image time series in a change detection context,” SPIE Conf. on Image and Signal Processing for Remote Sensing XXVI, 2020.
List of the other HRLC team members: C. Domingo (CREAF), L. Pesquer (CREAF), C. Lamarche (UCLouvain), P. Defourny (UCLouvain), L. Agrimano (Planetek), A. Amodio (Planetek), M. A. Brovelli (PoliMI), G. Bratic (PoliMI), M. Corsi (eGeos), C. Ottlé (LSCE-IPSL), P. Peylin (LSCE-IPSL), R. San Martin (LSCE-IPSL), V. Bastrikov (LSCE-IPSL), P. Pistillo (eGeos), M. Riffler (GeoVille), F. Ronci (eGeos), D. Kolitzus (GeoVille), Th. Castin (UCLouvain).
With the rise of distributed constellations (Landsat, Sentinel, VIIRS, MODIS, Planet, Airbus, Maxar), there has been a general push to make Earth observation data interoperable, leading to the notion of harmonized data products. Moreover, there is a strong incentive to combine these data sources into virtual constellations to achieve high revisit rates for the resulting sensor fusion products. Distributed constellations of small commercial satellites can deliver data that are high in information density and radiometrically accurate when paired with traditional missions. Today, the need for high-cadence time series to quantitatively leverage bio-optical models of vegetation and measure the impact of human activity is driven by the urgency of measuring the environmental dimensions of “sustainable development.”
Under the sponsorship of the European Union’s Horizon 2020 programme, we are exploiting the notion of a virtual constellation to update and maintain the CORINE Land Cover (CLC) product, the flagship of the Copernicus Land Monitoring Service (CLMS). In our approach, we fuse global daily imagery from the PlanetScope contributing mission with Landsat 8, Sentinel-2, VIIRS, and MODIS to create cloud-free, harmonized, 3 m resolution, near-daily time series covering three full years starting from the latest CLC release: 2018, 2019, and 2020. We sample half a million data cubes across the entire territory of the EU, drawing from each country proportionally to its surface area and performing stratified sampling with respect to the 44 CLC land cover classes. We label the data cubes based on the land cover classes present at each location in the 2018 reference year and use this corpus to train machine learning models that learn to recognize the “pulse” of land cover types and detect changes on short time scales in subsequent years.
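The area-proportional part of such a sampling design can be sketched with largest-remainder rounding, which keeps the total sample count exact. The sketch is hypothetical: the country areas are illustrative numbers, and the real design described above additionally stratifies within each country over the 44 CLC classes.

```python
def allocate_samples(area_km2, total=500_000):
    # Allocate sample cubes to countries proportionally to surface area,
    # preserving the exact total via largest-remainder rounding.
    total_area = sum(area_km2.values())
    raw = {c: total * a / total_area for c, a in area_km2.items()}
    alloc = {c: int(v) for c, v in raw.items()}
    leftover = total - sum(alloc.values())
    # hand out the remaining samples by largest fractional remainder
    for c in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[c] += 1
    return alloc

# Hypothetical country areas in km^2, not an official EU breakdown.
alloc = allocate_samples({"FR": 643_801, "DE": 357_022, "NL": 41_543}, total=1000)
```

Largest-remainder rounding avoids the off-by-a-few drift that naive rounding introduces when many strata are allocated at once.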
The challenge is to develop novel AI architectures that can properly exploit the unprecedentedly high spatiotemporal resolution of these data streams and provide new insights into land cover dynamics. Exploratory data analyses show that 3 m daily time series of spectral indices such as NDVI are powerful indicators of biodiversity and excellent discriminators of land cover types. For instance, simple clustering of these fine temporal signatures can segment tree species at the crown level based on intraspecific and interspecific variations in leaf phenology measured in early spring and throughout the fall, leading to better assessments of forest composition. The same holds for the phenometrics of agricultural crops and wild vegetation in general when captured at this scale. The high temporal cadence also improves our understanding of land use and of challenging habitats such as wetlands, grasslands, and pastures.
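The kind of temporal signature referred to above can be made concrete with a toy NDVI series: two pixels with different phenology separate cleanly on seasonal amplitude alone. This is a minimal sketch with synthetic reflectances, not the project's feature pipeline; `ndvi` and the seasonal curves are illustrative assumptions.

```python
import numpy as np

def ndvi(nir, red):
    # Normalized difference vegetation index from NIR and red reflectance
    # (small epsilon guards against division by zero).
    return (nir - red) / (nir + red + 1e-9)

t = np.linspace(0.0, 1.0, 50)                       # one toy growing season
pix_a = ndvi(0.4 + 0.3 * np.sin(np.pi * t), 0.1)    # strong spring green-up
pix_b = ndvi(0.35 + 0.05 * np.sin(np.pi * t), 0.1)  # flat evergreen-like signature
amp_a, amp_b = np.ptp(pix_a), np.ptp(pix_b)         # seasonal NDVI amplitude
```

In practice the full daily signatures (not just their amplitude) feed the clustering, but even this single phenometric already discriminates the two cover types.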
Our baseline models are supervised classification models that employ Convolutional Neural Networks (CNNs) for spatial encoding of class distributions and Recurrent Neural Networks (RNNs) for modelling their temporal evolution. Of particular interest is the development of weakly supervised or fully unsupervised spatiotemporal deep learning models that learn to disentangle structural change from phenology based on multi-year observations in the absence of labels. These constitute our more advanced models and rest on methodologies for self-supervised image representation learning. We compare and evaluate the potential impact of all these architectures for more continuous updates of the CLC product, showing results drawn from large regions of interest in Europe.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004356, Copernicus evolution: Research activities in support of the evolution of the Copernicus services.