The Digital Twin of the Earth (DTE) will help to advance the monitoring, forecasting and visualisation of natural and human activity globally. High-resolution models will track the health of the planet based on a wide range of data, perform simulations of Earth’s interconnected system with human behaviour and thus support the field of sustainable development, reinforcing our efforts to create a better environment.
As one of ESA’s DTE Precursors, our project has supported ESA in defining the wider DTE concept by establishing the scientific and technical basis to realise elements of a DTE in the food systems vertical. The project, run by CGI in close collaboration with Oxford University Innovation, IIASA and Trillium, has focused on developing a Food Systems Digital Twin by linking models end to end, from forcing meteorology through crop modelling to price impact assessment. This allows for testing policy options linking climate impacts, food production and sustainability, for example by assessing impacts and potential supply vulnerabilities from large-scale crop re-zoning designed to enhance biodiversity. Our use case has involved prominent use of AI processing, challenges of model integration at different scales and ingestion of socio-economic as well as physical measurements, thus testing a number of DTE concepts. The end-to-end chain provides decision support outputs with innovation at each stage and has been tested in consultation with potential stakeholders.
The purpose of our use case has been to demonstrate the value of the DTE concept to the scientific community, by integrating the outputs of novel algorithms. We used a machine learning based extreme precipitation model to feed a Global Gridded Crop Model, and after regional downscaling integrated the result into cropland land use and pricing models. The potential benefits of these links include improvements in routine monitoring with regular seasonal assessments, contributions to short term policy responses to crop shortages due to extremes and support to long term policy development to apply appropriate incentives for land rezoning. Architecture and integration considerations for the DTE within the demonstration help to define the next development priorities as part of the roadmap.
The Digital Twin of the Earth as a whole aims to encompass the full information supply chain, from data acquisition and data management through data fusion and information extraction to decision support. This is the scope that ESA and the European Commission aim to reach with the use of Digital Twins, and the intended impact is not only to be far-reaching but also to provide a trusted source for sustainability-focused decision making.
In our Climate Impact Explorer we have built a prototype Digital Twin Earth system that brings together advanced Land Surface Modelling (JULES) of African soil moisture, processed using High Performance Computing infrastructure (on JASMIN - https://jasmin.ac.uk) and optimised via data assimilation (LAVENDAR) using state-of-the-art Earth Observation data (soil moisture and solar-induced fluorescence). We have developed a Machine Learning emulator on top of this system to enable fast exploration of a wider climate space, driven by ISIMIP-based climate scenarios, without costly model simulations. We condense these complex soil moisture outputs into key drought metrics relevant to our stakeholders and make them available via an interactive data portal and Jupyter Notebook environment hosted on JASMIN’s cloud. This system enables decision makers without expert technical knowledge to generate and visualise decision relevant drought information relating to regionalised impacts of climate change.
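As a minimal illustration of how a soil moisture time series might be condensed into drought metrics, the sketch below computes two simple indicators for a single grid cell. The metric definitions, the function names `longest_dry_spell` and `fraction_below`, the threshold and the sample values are all assumptions made for illustration; the abstract does not specify which metrics the Climate Impact Explorer actually uses.

```python
def longest_dry_spell(soil_moisture, threshold):
    """Length of the longest run of consecutive time steps below the threshold."""
    longest = current = 0
    for value in soil_moisture:
        if value < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def fraction_below(soil_moisture, threshold):
    """Fraction of time steps with soil moisture below the threshold."""
    values = list(soil_moisture)
    if not values:
        raise ValueError("soil moisture series is empty")
    return sum(v < threshold for v in values) / len(values)


# Hypothetical volumetric soil moisture values for one grid cell (illustrative only).
series = [0.31, 0.28, 0.19, 0.17, 0.22, 0.15, 0.14, 0.12, 0.30, 0.18]
threshold = 0.20  # assumed drought threshold, not taken from the project

print(longest_dry_spell(series, threshold))  # 3
print(fraction_below(series, threshold))     # 0.6
```

Indicators of this kind reduce a long model output to one number per location, which is the form that can be mapped or compared across climate scenarios.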
This work has been carried out as part of the ESA Digital Twin Earth Precursors activity. With the short timescale for the project (12 months) and emphasis on innovation and pioneering new applications it was essential to have the required computing resources easily at our disposal. The combination of an established HPC environment alongside cloud computing resources and large storage capacity on JASMIN meant that it was possible to commence development activities from the outset. This was assisted in large part by the Cluster-as-a-Service system available on JASMIN’s Cloud. This provides a web user interface to rapidly deploy a shrink-wrapped environment from pre-prepared templates - in this case templates for the deployment of Pangeo (Jupyter Notebook service with Dask) and an Identity Service to manage and authenticate users to the platform.
Data from the HPC system were output as regular netCDF files, one per time step, onto traditional POSIX storage. Using object storage, it was possible to make outputs readily available to the cloud environment in analysis-ready form. This entailed serialisation of the data in Zarr format and rechunking to suit time series-based data queries, the predominant access pattern for analysis. This was critical in enabling the creation of responsive interactive map-based web user interfaces.
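The benefit of rechunking for time series queries can be seen by counting how many chunks a query has to read under different layouts. The sketch below uses only the Python standard library and does not call any Zarr API. The grid size, chunk shapes and the helper name `chunks_touched` are hypothetical and chosen purely for illustration.

```python
def chunks_touched(shape, chunks, selection):
    """Count the chunks overlapped by a rectangular selection.

    shape and chunks hold one size per dimension; selection holds one
    (start, stop) pair per dimension, with stop exclusive.
    """
    total = 1
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        if not (0 <= start < stop <= size):
            raise ValueError("selection out of bounds")
        first = start // chunk
        last = (stop - 1) // chunk
        total *= last - first + 1
    return total


# Hypothetical array: 3650 daily time steps on a 400 x 400 grid (time, y, x).
shape = (3650, 400, 400)
per_step = (1, 400, 400)     # mirrors the one-file-per-time-step layout
time_major = (3650, 20, 20)  # long in time, small in space

point_series = ((0, 3650), (123, 124), (200, 201))  # full series at one pixel
single_map = ((10, 11), (0, 400), (0, 400))         # full map at one time step

print(chunks_touched(shape, per_step, point_series))    # 3650
print(chunks_touched(shape, time_major, point_series))  # 1
print(chunks_touched(shape, per_step, single_map))      # 1
print(chunks_touched(shape, time_major, single_map))    # 400
```

Under these assumed layouts, a point time series needs 3650 reads in the per-time-step layout but a single read after rechunking, at the cost of more reads for a full map. Chunk shapes are therefore chosen to favour the predominant access pattern.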
The ICT provisioned via JASMIN effectively provided an incubator environment for the Digital Twin demonstrator and, in this respect, mirrors the essential elements identified in the DestinE high-level architectural blueprint: an open core platform providing HPC, data sources and cloud computing capability.
The Forest Digital Twin Earth (Forest DTE) will be a data-driven, physically coherent approach to Earth system science, which will make use of existing Earth Observation (EO) capabilities and physically-based modelling to create a digital replica of the world’s forests.
A precursor of the system was created by a consortium funded by the European Space Agency following the Destination Earth (DestinE) initiative. The consortium included VTT Technical Research Centre of Finland, Department of Forest Sciences of the University of Helsinki, Simosol OY, Unique GmbH, Cloudferro Sp z o.o., and Romanian National Institute of Forest Research (INCDS). DestinE, a part of the European Green Deal, aims to develop a high-precision digital model of the Earth to monitor and simulate natural phenomena and related human activities. Forest DTE, a specialized digital twin, would provide detailed information on the functioning, climate effects and carbon exchanges related to the forests, which cover approximately one-third of the planet's land surface. Satellite-based estimates of forest structure and above-ground biomass offer the only means to obtain homogeneous and extensive information on the state of the world's forests. Other information sources, such as field plot data, soil maps, climate predictions and forest management scenarios, are needed to understand the ongoing biological and anthropogenic processes, and to predict the state of the forests in the future.
Before the precursor implementation, we identified the needs of forest sector users regarding the specific data products and the spatial and temporal resolutions of the Forest DTE. In 2020, the most important forest variables were related to the carbon stocks of the forests, the necessary time scales ranged from seasonal or yearly changes to several forest successions (i.e., several hundreds of years), and the required spatial resolutions generally corresponded to that of high-resolution optical imagery, with a need for spatial aggregation to forest management or administrative units. The Forest DTE Precursor was implemented on the Forestry Thematic Exploitation platform, hosted on the CREODIAS cloud, for selected and rather limited test areas in Europe. However, the existing computational facilities were found to be sufficient also for large-scale implementation of a forest digital twin at the spatial and temporal scales required by the users.
Based on the experience obtained during the precursor implementation, the following limitations related to user needs must be addressed when implementing the full Forest DTE: 1) Availability of homogeneous (i.e. containing the minimum set of required variables and conforming to the basic quality standards) forestry field data in a computer-readable format; 2) Validation capabilities of the full Forest DTE chain at the spatial resolution of the products; 3) Improved determination of species or plant functional type and their proportions in mixed stands; 4) Integration with other components of the Digital Twin of the Earth, for which the relevant technical tools and protocols need to be developed. These forest-related limitations need to be overcome within a decade to reach the goal of DestinE of having a functional Digital Twin of the Earth in ten years.
Although field-measured forest data are scarce for many regions of the world, national forest inventories exist in many countries and can provide data, although sometimes at irregular intervals. In the forthcoming decades, robust machine learning-based tools need to be created to make the best use of these data, match the field measurements with satellite observations, and allow reliable estimation of key forest variables from Earth observation data. The next step in Forest DTE requires modeling of forest functioning at the spatial scale of very high resolution satellite sensors. In principle, tools for this exist, but in contrast to forest variable estimation, direct validation of the forest productivity models at this spatial resolution, and at the temporal scales required by the users, is still a scientific challenge.
The validation calls for the use of different data sets of carbon and other fluxes, computed at very different spatial and temporal resolutions, e.g. from atmospheric model inversion. A validation of a forest DTE, or any DTE in general, implies a full unification of top-down and bottom-up approaches: the fluxes computed from detailed measurements and simulations of the terrestrial and aquatic ecosystems need to match the global atmospheric simulations. New data and methods may need to be included for this. For example, satellite-borne measurements of chlorophyll fluorescence will make it possible to create a global map of photosynthesis. Future hyperspectral constellations will allow a more detailed mapping of overstory species and, potentially, plant stress. In order to make a functional digital twin of the Earth, all these very diverse data sources will need to be integrated on a single platform.
When finally functioning as a part of the Digital Twin of the Earth, Forest DTE will act as a spatially explicit simulation tool, which can be initialized using a snapshot of data, including EO imagery, environmental data, and field measurements of forestry parameters. It will give users tailored access to high-quality information, services, models, scenarios, forecasts and visualizations as required by DestinE.
Ice sheets are a key component of the Earth system, impacting on global sea level, ocean circulation and bio-geochemical processes. Significant quantities of liquid water are being produced and transported at the ice sheet surface, base, and beneath its floating sections, creating complex feedbacks between the ice sheet, the bed underneath, the atmosphere, and ocean systems. The future evolution of ice sheets, and their response to a warming ocean and atmosphere, is a key uncertainty in projecting future sea level.
The Digital Twin Antarctica is part of a larger initiative by ESA and the EC to create a dynamic, digital replica of our planet which accurately mimics Earth’s behaviour. Based on Earth observation and in-situ data, artificial intelligence, and numerical simulations, Digital Twin Antarctica (DTA) aims at generating an advanced dynamic reconstruction of Antarctica’s hydrology, and of its interaction with ocean and atmosphere. The objectives of the reconstructions are to combine state-of-the-art observations of the past and current state, AI, and simulation of the past, present and future state of the Earth system in and around Antarctica. DTA will help visualise and forecast the state of the Antarctic Ice Sheet and of interconnected systems, helping to support European environmental policies.
Here we present a series of demonstrators to highlight the potential of a Digital Twin of Antarctica in addressing processes related to surface and basal melting, and the interaction of the ice sheet with its sub-glacial environment and the fringing Southern Ocean. We will also introduce a visualisation system that allows users to navigate and interact with such a complex environment dynamically. Finally, we will present a vision for expansion to a fully functional DTA: what its overall aim should be, and which impacts and scenarios should be addressed. We will focus in particular on requirements regarding the data lake, the orchestration of a large ecosystem of functionalities, and the visual environment allowing seamless interaction with the system.
Digital Twin of the Ocean: Ocean2 - Open Pilot for a European Operational Service
What is a DIGITAL TWIN of the OCEAN
A Digital Twin of the Ocean provides a central information hub for informed decision making. It runs highly accurate models of the ocean to monitor and predict environmental change, human impact and vulnerability, and relies on openly accessible and interoperable dataspaces.
Such an information system consists of one or more digital replicas of the state and temporal evolution of the oceanic system constrained by the available observations and the laws of physics, making it imperative to integrate a set of models or software that pairs the digital world with physical assets and to feed this set with information from sensors.
IMAGE 1 - DTO circular workflow
Ocean 2, the European Digital Twin of the Ocean platform pilot, aims to deliver a holistic and cost-effective solution for the integration of all European assets related to seas and oceans with state-of-the-art Artificial intelligence and HPC resources into a digital, consistent, high-resolution, multi-dimensional and near real-time representation of the ocean. This will result in a new European shared capacity to access, manipulate, analyse and visualise marine information. The knowledge generated by this DTO platform will empower scientists, citizens, governments, and industries to collectively share the responsibility to monitor, preserve and enhance marine and coastal habitats, while fostering the assimilation of sustainable measures, ideals, and actions by the blue economy (tourism, fishing, aquaculture, transport, renewable energy, etc.), contributing to a healthy and productive ocean.
Construction of an open DTO service platform
To properly address the construction of a digital twin, breakthroughs are needed in various aspects of the digital twin information system, including information completeness and quality, information access and intervention as well as the underlying supporting infrastructure, tools, and services.
The operational pilot of DTO will encompass the production of a new quality of information, one that incorporates human systems in the prediction problem and that leverages advances in information theory and digital technologies. Ensembles of simulations combining models from different disciplines, informed by spatial correlations determined from high-resolution observations and by data-driven learning of unknown processes and missing constraints will enable this DTO to reduce uncertainty in the estimation and forecasting of ocean states, changes, and impacts.
Enhancing information quality requires a step change in computational complexity. This means adequate infrastructure, including support for very high computing throughputs, concurrency, and extreme-scale hardware. However, it is vital to hide this complexity so that users can run and configure complex workflows and access the information in ways that do not require expert intervention. In addition, the underlying models and data need to be sound both scientifically and for their intended applications.
This will require a multi-layered software framework where tasks like simulations, observational data ingestion, post-processing, and so on are treated as objects that are executed on federated computing infrastructures and feed data into virtual data repositories with standardized metadata, from which a heavily machine-learning-based toolkit extracts information that users can manipulate as needed. The result should be the on-demand provision of conveniently accessible modelling and simulation products, data and processes, that is, Modelling and Simulation as a Service (MSaaS).
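The idea of treating workflow steps as objects can be sketched in a few lines. In the example below, each task declares the tasks it depends on, and a small executor runs them in dependency order using the standard library's `graphlib` module (Python 3.9 or later). The `Task` class, the `execute` function and the three toy tasks are hypothetical illustrations, not part of any DTO software, and everything runs locally where a real framework would dispatch to federated infrastructure.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Task:
    """A workflow step: a name, a callable, and the names of the tasks it needs."""
    name: str
    run: Callable[[Dict[str, object]], object]
    needs: List[str] = field(default_factory=list)


def execute(tasks):
    """Run tasks in dependency order, passing each one the results it needs."""
    by_name = {task.name: task for task in tasks}
    graph = {task.name: set(task.needs) for task in tasks}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        inputs = {dep: results[dep] for dep in task.needs}
        results[name] = task.run(inputs)
    return results


# Toy stand-ins for ingestion, simulation and post-processing (values are made up).
tasks = [
    Task("postprocess",
         lambda inp: round(sum(inp["simulate"]) / len(inp["simulate"]), 2),
         needs=["simulate"]),
    Task("ingest", lambda inp: [14.0, 15.0, 16.0]),
    Task("simulate", lambda inp: [x + 0.5 for x in inp["ingest"]], needs=["ingest"]),
]

results = execute(tasks)
print(results["postprocess"])  # 15.5
```

Because the order is derived from the declared dependencies, tasks can be listed in any order, and a user only needs to describe what each step requires.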
Underlying architecture
The multi-layered framework enabling this digital twin ocean pilot operational service comprises three major interrelated structural elements:
• A DTO data access layer that mixes results and tools from ongoing projects and existing infrastructures with new developments targeting data ingestion and data harmonising into a Data lake for subsequent use in the DTO engine;
• A DTO engine comprising a set of modelling capabilities, including on-demand modelling and what-if scenario modelling that fill the observational gaps in space and time in a physically consistent way, and observation-driven learning of unknown processes and missing constraints, which will enable us to reduce uncertainty in the estimation and forecasting;
• A DTO interactive service layer supplying tools, libraries, and interfaces to simplify the running and configuration of workflows and the access to the information, including its analysis and visualisation.
IMAGE 2 - DTO Architecture