Session C5.02 Scalable platform architectures enabling big EO data analytics in cloud-based platforms
Michał Bylicki, Monika Krzyżanowska, CloudFerro
Efficient and scalable computing cloud access to Big Data EO repositories – CREODIAS, WEkEO, CODE-DE and other platforms
The Copernicus program provides users with a tremendous amount of EO data, available as a public good to anyone at no cost. As the volume of data grows, so must the capabilities to process it and to use the information derived from it. The sheer size of the Copernicus archive makes it difficult to process the data in a local environment. To foster the usability of EO data, the DIAS platforms (Data and Information Access Services/EO platforms) were launched.
EO platforms are an essential component that enhances the usability of EO data, making it possible to turn data into information. The platforms provide users with computing power that enables fast and efficient processing, and allow users to access, process and analyse Copernicus products in a cloud directly connected to the EO archive. The availability of data, storage and processing in a single place makes it easier to develop and distribute scalable digital services based on EO data.
We have built and operate several EO platforms based on a hybrid cloud in an Infrastructure-as-a-Service (IaaS) model. Our services are based on open-source components, such as OpenStack for cloud orchestration, Ceph for storage management, Prometheus and Grafana for monitoring, Kubernetes (K8s) for processing, and others. Within this session, we will present the challenges of building cloud platforms with direct access to big data EO repositories. Using the examples of CREODIAS, WEkEO, CODE-DE and other platforms built and operated by CloudFerro, we will present the main building blocks and architecture of entirely European cloud platforms based on open-source software. We will show how we managed to create a new service: the EO platform as a service. We will also discuss how such services can become part of the emerging European federated clouds and data ecosystem.
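As an illustration of the IaaS access model described above, the following minimal sketch provisions a processing VM on an OpenStack-based cloud with the openstacksdk library. The cloud profile name, image, flavor and network names are placeholders, not CloudFerro's actual configuration.

```python
# Minimal IaaS sketch (all names are illustrative placeholders):
# provision a worker VM on an OpenStack-based EO cloud via openstacksdk.
import openstack

# Credentials come from a clouds.yaml entry; "creodias" is an assumed profile name.
conn = openstack.connect(cloud="creodias")

server = conn.create_server(
    name="eo-worker-01",
    image="Ubuntu 22.04",        # image name as listed in the target cloud
    flavor="eo1.large",          # flavor name is cloud-specific
    network="project-network",   # project network to attach the VM to
    wait=True,
)
print(server.status, server.id)
```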
A federation and multi-platform approach allows users and operators to benefit from the synergy between different platforms, such as a shared multi-PB data repository available from various platforms and elastic cloud components for additional processing needs. At the same time, it generates technical challenges that we need to address. We will present several such challenges and outline our approach to facing them, focusing on (but not limited to) three main topics:
• Interoperability of the environments:
how to support easy migration between different clouds and provide direct access to the repositories available from different computing clouds, with multi-cloud processing enabled and dynamically adaptable usage and payment models.
• Data access and dissemination:
the multi-petabyte, scalable EO data repositories should provide fast, instantaneous access to many heterogeneous users and algorithms with different needs. At the same time, the data needs to be discoverable and ready for analysis in the context of constantly evolving data offers (new collections, commercial data, products), and users should be supported in data dissemination.
• Cloud efficiency:
how to optimize data access, storage, and processing costs and energy usage both for individual users and for the entire ecosystem.
We will present our approach to these challenges from the EO platform provider's point of view. As an EO platform provider active in a fast-growing and developing market, we need to constantly add and improve services, as required by our customers, in order to compete with the biggest players. It is crucial to provide more advanced services without, at the same time, competing with our own customers. We will discuss the perspectives and responsibilities of EO platform providers, complement the presentation with a short discussion of the applicability of cloud computing versus HPC for EO processing, and conclude with our vision of the emerging federated cloud and data ‘green’ ecosystem.
CLEOS (Cloud Earth Observation Services) is e-GEOS' satellite data and information platform, offering access to a wide variety of geospatial datasets and enabling the development, testing and large-scale deployment of geospatial data processing pipelines. CLEOS adopts a multi-cloud approach and fosters interoperability through standards and practices such as OpenEO for APIs and STAC for the metadata catalogue.
High-Level Architecture.
The CLEOS architecture is modular and is based on five main components:
• The Marketplace, where Customers can access available products and services through a user-friendly UI;
• The API Layer, which exposes purchasing, processing and data access capabilities through a RESTful web interface, also enabling machine-to-machine integration with external systems;
• The Processing Platform, which orchestrates the scalable execution of Data Processing Pipelines for both Optical and SAR data, using e-GEOS proprietary algorithms and Third-Party tools. It includes the AI Factory for the management and development of AI-based applications;
• The Data Layer, which holds the entire metadata catalogue of the federated data sources connected to the infrastructure, together with a pool of locally available assets and resources, including EO and non-EO data;
• The Help Desk, used by e-GEOS Customer Service to provide the necessary support to CLEOS Customers.
CLEOS accesses multiple satellite and non-satellite data sources through a set of data collectors adapted to the interfaces offered by each data/contents provider (e.g. satellite missions Ground Segments). CLEOS also has a multi-cloud orchestration service that allows the deployment of the whole CLEOS platform or of single processing jobs in different commercial cloud infrastructures, including all Copernicus DIAS.
Design principles and innovations.
CLEOS Platform design adheres to technical design principles widely shared in the geospatial community.
1. Multi-Cloud & Data Locality. Space Earth Observation data are notably very large datasets (e.g. the Sentinel-2 satellites acquire about 10 TB of data daily). The “data gravity” associated with these huge data archives requires a shift in the processing paradigm, which means bringing the processing close to the data, to minimize network congestion and increase the throughput of the overall system. This is called the Data Locality principle. However, space and geospatial data are not located in a single infrastructure, since today there are several endpoints offering access to the same datasets (e.g. Sentinel-2 data can be accessed on AWS, Google Cloud and the five Copernicus DIASes). Additionally, the selection of the infrastructure in which to access and/or process data is driven by multiple considerations: not only price/performance, but also constraints from Customers (including the option to process data in a local infrastructure for certain workloads).
This scenario had the following impact on CLEOS design:
• The Data Catalogue needs to register multiple endpoints where the same resource can be accessed. CLEOS has adopted the SpatioTemporal Asset Catalog (STAC) metadata structure, as it allows an extensible definition of spatial assets and resources and standardizes the indexing and discovery process (see the sketch after this list);
• The Processing Platform must have the flexibility to steer processing requests towards different cloud platforms, including on-premises infrastructures, also allowing hybrid cloud workloads. CLEOS has adopted the Max-ICS platform, which exploits open-source frameworks such as Mesos, Marathon, Puppet and Terraform to manage Infrastructure as a Service (IaaS) resources in multiple cloud infrastructures, abstracting the complexity of the platform control and service orchestration layers.
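As a sketch of the multi-endpoint cataloguing described in the first bullet, the snippet below builds a STAC Item whose single logical band asset is registered at two hypothetical storage locations on different clouds; identifiers, geometries and URLs are illustrative only, not CLEOS catalogue entries.

```python
# Illustrative STAC Item with the same asset available at two endpoints.
from datetime import datetime
import pystac

item = pystac.Item(
    id="S2B_MSIL2A_20220101_EXAMPLE",
    geometry={"type": "Polygon", "coordinates": [[[11.0, 46.0], [12.0, 46.0],
                                                  [12.0, 47.0], [11.0, 47.0],
                                                  [11.0, 46.0]]]},
    bbox=[11.0, 46.0, 12.0, 47.0],
    datetime=datetime(2022, 1, 1, 10, 13, 39),
    properties={},
)

# Same logical asset, two physical copies on different infrastructures.
item.add_asset("B04-cloud-a", pystac.Asset(
    href="s3://example-bucket/S2B_MSIL2A_20220101_EXAMPLE/B04.tif",
    media_type=pystac.MediaType.COG, roles=["data"]))
item.add_asset("B04-cloud-b", pystac.Asset(
    href="https://example-dias.example.org/S2B_MSIL2A_20220101_EXAMPLE/B04.tif",
    media_type=pystac.MediaType.COG, roles=["data"]))
```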
2. Elasticity & Scalability. Geospatial data processing use cases can define a great variety of workload types, from large batch processes in the case of multi-temporal analysis over large areas, to synchronous data analytics requests on newly acquired data, or even stream processing. The simultaneous management of such heterogeneous workloads requires a strong optimization of resource usage and deployment. This scenario requires CLEOS to be able to scale the available worker nodes up and down elastically, according to the active workload. The CLEOS infrastructure is based on microservices, fragmenting complex workflows into elementary steps that can be executed by independent nodes and easily orchestrated. Therefore, the CLEOS infrastructure can dynamically scale up those microservices to fulfil the increasing number of processing jobs, ultimately deploying new on-demand virtual hosts by exploiting the elastic resource allocation made available by almost all cloud providers.
3. Microservices and Data Processing Pipelines. In CLEOS, all the processing services are made available through nodes and pipelines. Nodes (microservices) and pipelines (sets of nodes linked to each other to perform a workflow) are the two main components upon which a collaborative ecosystem is built, in which CLEOS developers can reuse available standalone blocks to create new services with a modular, LEGO-like logic. The definition of Data Processing Pipelines is central in modern platforms, as it allows the design and implementation of a workflow that is activated once data reach the first node of the pipeline and flow through it to produce a result that can be used as the input of another pipeline or delivered to the user.
4. Platform Federation & Interoperability. Today, a platform cannot behave as a standalone system, but it needs to be interoperable and federated with other platforms on the two sides of the market:
• Supply: the platform needs to be able to access several heterogeneous suppliers of data and services;
• Demand: the platform needs to offer its data and services to heterogeneous customers that, more and more often, will be other platforms and not humans.
To achieve this objective, the API layer plays a central role. In particular, CLEOS has also adopted the OpenEO standard to manage the end-to-end process of searching, configuring, buying, monitoring and accessing data and services. This choice was made to enable OpenEO clients to connect to CLEOS with minimal effort. Through this API definition, it is possible to unify access to the different service backends, abstracting the proprietary implementations made by each vendor with their own internal interfaces.
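To illustrate the client-side effect of adopting the OpenEO standard, the sketch below uses the openeo Python client against a hypothetical backend URL and collection identifier (neither is a documented CLEOS endpoint); the backend, not the client, decides where the processing actually runs.

```python
# Hedged OpenEO client sketch; URL and collection id are placeholders.
import openeo

connection = openeo.connect("https://openeo.example-platform.org").authenticate_oidc()

cube = connection.load_collection(
    "SENTINEL2_L2A",                       # collection id exposed by the backend
    spatial_extent={"west": 11.0, "south": 46.0, "east": 11.2, "north": 46.2},
    temporal_extent=["2022-01-01", "2022-01-31"],
    bands=["B04", "B08"],
)

# NDVI is computed server-side, close to the data; only the result is downloaded.
ndvi = cube.ndvi()
ndvi.download("ndvi_january.tiff")
```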
CLEOS Data Layer
The CLEOS Data Layer is a set of modules dedicated to the storage and cataloguing (Technical Catalogue) of available resources, be they data, metadata or capabilities available through the Processing Platform. The CLEOS storage relies on the Object Storage services available in different cloud infrastructures, since most data collections are available at different endpoints. Therefore, the Technical Catalogue in the Data Layer is of paramount importance, insofar as it allows the Marketplace and the Processing Platform to have a unique point of reference about which resources/services are available and how/where they can be accessed. These catalogues follow the STAC (SpatioTemporal Asset Catalog) specification. The aim of STAC is to define a common language to describe a range of geospatial information, so it can more easily be indexed and discovered. CLEOS users and developers benefit from this adoption, as it spares them from writing new code each time a new dataset or service becomes available. The Data Layer also offers methods and interfaces (APIs following the OpenEO standard) to access available data in multiple ways and for multiple purposes (View, Download, Subset, …).
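A minimal discovery sketch against a STAC API of this kind is shown below, using the pystac-client library; the catalogue URL, collection name and filter values are assumptions for illustration.

```python
# STAC discovery sketch; endpoint and collection id are hypothetical.
from pystac_client import Client

catalog = Client.open("https://stac.example-platform.org")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[11.0, 46.0, 11.2, 46.2],
    datetime="2022-01-01/2022-01-31",
    query={"eo:cloud_cover": {"lt": 20}},
)
for item in search.items():
    # Each item can expose several asset hrefs (possibly on different clouds).
    print(item.id, {key: asset.href for key, asset in item.assets.items()})
```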
CLEOS Processing Platform, Developer Portal & AI Factory
The CLEOS Processing Platform is the module responsible for the execution of all processing tasks, from the simple retrieval and delivery of a product up to the orchestration of complex and long batch processing jobs involving thousands of processing nodes. The Processing Platform operates and deploys the Processing Pipelines created in the Developer Portal. The Developer Portal is the environment where e-GEOS and external developers can build new Processing Pipelines re-using available modules and building blocks, taking advantage of an Interactive Development Environment (IDE) and of the CLEOS API to streamline data access and processing operations.
The Processing Platform is able to:
• manage the provision of the necessary infrastructure resources in a dynamic way, with elastic scaling up and down in multiple cloud and on-premises infrastructures;
• manage the processing pipelines using a data-driven, message-based approach in which requests are queued and progressively handled, enlarging or reducing the pool of available worker nodes according to demand (a minimal sketch of this pattern is given below);
• manage DevOps through a Continuous Integration/Continuous Delivery (CI/CD) pipeline;
• monitor resource usage, node by node, pipeline by pipeline and infrastructure-wide;
• centralize all microservice logs so that they can be accessed, filtered and analysed cost-effectively.
While processing jobs flow as a stream thanks to the event-based backend architecture, the CLEOS Processing Platform also offers the capability to retrieve and control the overall status of a request, managing customer notifications and updates as well.
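The message-based pattern mentioned in the pipeline bullet can be sketched as a queue-consuming worker; the snippet below uses pika/RabbitMQ purely as an example broker, and the queue name, host and handler are hypothetical rather than CLEOS internals.

```python
# Illustrative message-driven worker: consume queued processing requests,
# run one pipeline step per message, acknowledge when done.
import json
import pika

def run_pipeline_step(request: dict) -> None:
    # Placeholder for the actual node logic (e.g. calibrate, co-register, classify).
    print("processing scene", request.get("scene_id"))

def on_message(channel, method, properties, body):
    run_pipeline_step(json.loads(body))
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="broker.internal"))
channel = connection.channel()
channel.queue_declare(queue="processing-requests", durable=True)
channel.basic_qos(prefetch_count=1)   # at most one in-flight job per worker
channel.basic_consume(queue="processing-requests", on_message_callback=on_message)
channel.start_consuming()
```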
Finally, the AI Factory is the platform section dedicated to the development and management of AI models and corpora, where AI developers and users work together to develop, test and scale new AI-based applications.
The AI Factory allows:
• to access a large set of pre-defined AI models or to import custom ones;
• to import training corpora or to build new ones using a simple and intuitive interface;
• to train and re-train models, benchmark performance metrics and manage model versioning;
• to directly include trained AI models into processing pipelines via their corresponding inference nodes (a minimal sketch follows).
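As a generic illustration of the last point, an inference node can be a thin wrapper that loads a versioned trained model and exposes a callable usable inside a pipeline; the ONNX format, file name and tile shape below are assumptions, not the AI Factory's actual interface.

```python
# Generic inference-node sketch (model format and I/O are assumptions).
import numpy as np
import onnxruntime as ort

class InferenceNode:
    def __init__(self, model_path: str):
        # A versioned, pre-trained model exported from the model registry.
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name

    def __call__(self, chip: np.ndarray) -> np.ndarray:
        # chip: image tile prepared by the upstream node, e.g. shape (1, bands, H, W).
        return self.session.run(None, {self.input_name: chip.astype(np.float32)})[0]

# Chained into a pipeline: the output of the preprocessing node feeds the model.
# node = InferenceNode("landcover_v3.onnx")
# prediction = node(preprocessed_chip)
```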
Capturing video from satellites is one of the most exciting innovations to hit the remote sensing world in recent times. High-resolution, full-colour EO video is enabling fundamental and disruptive changes for the Geospatial Intelligence and Earth Observation industries. When EO video analytics are combined with complementary geospatial information sources, the results can generate powerful information for end users.
EO based video delivers a new dimension of temporal information that captures instantaneous motion on and above the Earth’s surface. This allows determination of direction, speed and rate of change of moving targets on or above the ground. This is a novel observational capability that can potentially change the way we view our planet by enabling new types of analyses and could drastically improve situational awareness, critical decision-making and improved forecasting.
Earth-i and CGI are currently developing the EO Video Analytics and Exploitation Platform (VANTAGE), funded by the European Space Agency, to promote and enable the widespread exploitation of EO video. CGI provides unparalleled expertise in the development and deployment of Exploitation Platforms, whilst Earth-i has a proven track record of delivering powerful analysis of EO imagery and video, using advanced analytics and Artificial Intelligence.
The ultimate aim of VANTAGE is for users to be able to upload their own data and algorithms and use the platform for processing and interrogating that data in conjunction with EO video, and export the results back to their own working environments. VANTAGE is being built using a cloud-based EO platform architecture that enables scalable processing and analytics, tailored for large-volume satellite video data. The platform provides an archive catalogue of over 280 full-colour high-resolution EO videos for the users to interrogate, and a library of predefined analytics tools that are tailored to extract value from EO video, alongside Jupyter Hub integration for collaboration and interactive development of customised user workflows.
To showcase the types of capabilities that the platform can offer, Earth-i and CGI have been developing a number of use cases, including topics such as deforestation, urban sprawl and construction monitoring. Example workflows are being deployed, showing potential users from research, commercial business or public sector organisations how to extract unique information from EO video. A coding challenge was held in November 2021 and another is planned for February 2022, the results of which will be presented.
In this presentation, Earth-i will provide an overview of VANTAGE and its leading-edge architecture and functionality, and we will demonstrate some of the key functionality, showing how users can extract cloud-free data from an EO video, identify and track moving objects in the videos, or construct a 3D model from the video data. We will show how to use EO video data in combination with traditional satellite data acquisitions to derive higher-level information products.
In recent years, advances have been made to make Earth Observation (EO) data available to users, e.g. in Marine Safety, shortly after data capture. However, this requires considerable infrastructure that is usually not available to the wider community. Near real-time (NRT) exploitation of EO data requires reducing the time from sensing to actionable information, including downlinking, processing and delivery of imagery and value-added data. Full utilization of the NRT potential must therefore include optimal exploitation of satellite capabilities, ground station availability, a flexible and scalable processing environment, and reducing the necessary computations by intelligent data selection early in the pipeline (for example based on a user’s area of interest).
The ESA funded EOPORT project implements a cloud native subscription-based NRT exploitation platform for EO data, demonstrating the concept with live Sentinel-1 data. Today, users obtaining Sentinel-1 data on the Copernicus Open Access Hub or the national ground segments are used to getting data within 3 hours (NRT-3h) or 24 hours (NRT-24h). However, over Europe Sentinel-1 is often operated in passthrough mode (NRT-Direct), in which data is downlinked to a ground station directly following sensor data capture.
Exploitation of passthrough data has traditionally been limited to ground station providers. EOPORT offers this capability by providing a low-cost entry point to exploiting true NRT data in a public cloud environment. The capabilities complement and can be federated with existing platform initiatives based on published EO products, such as the Thematic Exploitation Platforms (TEPs) and the Data and Information Access Services (DIASes).
The authors will demonstrate the state-of-the-art architecture, scalable processing and pilot achievements, demonstrating downlinking of passthrough Sentinel-1 data at KSAT’s ground station in Tromsø, making raw data chunks available within seconds on T-Systems’ Open Telekom Cloud, and producing Level-1 data with Kongsberg Defence & Aerospace’s NRTSAR processor within just a few minutes after sensing. The highly scalable implementation on a Kubernetes cluster ensures that any processing stage is started and run in parallel as soon as sufficient measurement and auxiliary data are available.
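The "start a stage as soon as its input chunk is available" pattern can be sketched with the Kubernetes Python client as below; this is not EOPORT's actual code, and the container image, namespace and chunk URI are placeholders.

```python
# Sketch: launch a short-lived Kubernetes Job for each newly arrived data chunk.
from kubernetes import client, config

def launch_chunk_job(chunk_uri: str, job_suffix: str) -> None:
    config.load_incluster_config()   # assumes the launcher runs inside the cluster
    container = client.V1Container(
        name="level1-stage",
        image="registry.example.org/level1-processor:latest",  # placeholder image
        args=["--input", chunk_uri],
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=f"level1-{job_suffix}"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")),
            backoff_limit=1,
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="processing", body=job)
```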
There are many web-based platforms offering access to a wealth of satellite earth observation (EO) data. Increasingly, these are collocated with cloud computing resources and applications for exploiting the data. Users are beginning to appreciate the advantages of processing close to the data, some maintaining accounts on multiple platforms. The ‘Exploitation Platform’ concept derives from the need to process an ever-growing volume of data. Rather than download the data, the exploitation platform offers a cloud environment with hosted EO data and associated compute and tools that facilitate the analysis and processing close-to-the-data.
Users benefit from the data proximity, scalability and performance of the cloud infrastructure, and avoid the need to maintain their own hardware. Infrastructure providers gain an increased cloud user base. Data hosted in the cloud infrastructure reaches a wider audience.
In order to fully exploit the potential of these complementary resources we anticipate the need to encourage interoperation amongst the platforms, such that users of one platform may consume the services of another directly platform-to-platform. This leads to an open network of resources, facilitating easier access and more efficient exploitation of the rapidly growing body of EO and other data.
Thus, the goal of the Common Architecture is to define and agree a re-usable exploitation platform architecture using open interfaces to encourage interoperation and federation within this Network of Resources. Interoperability through open standards is a key guiding force for the Common Architecture:
• Platform developers are more likely to invest their efforts in standard implementations that have wide usage
• Off the shelf solutions are more likely to be found for standards-based solutions
The system architecture is designed to meet a set of defined use cases for various levels of user, from expert application developers to consumers. The main system functionalities are organised into high-level domain areas: 'User Management', 'Processing & Chaining' and 'Resource Management'.
We are developing an open source Reference Implementation, to validate and refine the architecture, and to provide an implementation to the community.
Our solution comprises an OGC API Processes engine that uses Kubernetes to provide an auto-scaling compute solution for ad-hoc analytics, systematic and bulk processing - supported by a hosted processor development environment as an assist to expert users.
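For illustration, an OGC API Processes execution request typically looks like the hedged sketch below; the base URL, process identifier and input names are placeholders rather than the Reference Implementation's actual deployment.

```python
# Hedged OGC API Processes sketch: asynchronous execution of a process.
import requests

base = "https://processing.example-platform.org/ogc-api"   # hypothetical endpoint
execute_request = {
    "inputs": {
        "aoi": {"bbox": [11.0, 46.0, 11.2, 46.2]},
        "start": "2022-01-01",
        "end": "2022-01-31",
    },
}

# The server replies 201 Created with a Location header pointing at a job
# status resource that can be polled until the result is ready.
resp = requests.post(
    f"{base}/processes/ndvi/execution",
    json=execute_request,
    headers={"Prefer": "respond-async"},
    timeout=30,
)
resp.raise_for_status()
status_url = resp.headers["Location"]
print(requests.get(status_url, timeout=30).json()["status"])
```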
Data and applications are discovered through a resource catalogue offering STAC, OGC CSW & API Records, and OpenSearch interfaces. A user-centred workspace provides catalogue and data access interfaces through which the user exploits their own added value products.
For platforms to successfully interoperate, they must federate user access in order for requests between services to respect the user's authorisation scope and to account for their actions. Access to resources is secured by means of our identity and access management framework, which uses the OpenID Connect and User-Managed Access standards to control all access attempts in accordance with configured policy rules.
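A minimal machine-to-machine sketch of this federation pattern is an OpenID Connect client-credentials flow followed by a bearer-token call to a protected service; the token endpoint, client identifiers and API URL below are placeholders.

```python
# OIDC client-credentials sketch; all endpoints and credentials are placeholders.
import requests

token_endpoint = "https://auth.example-platform.org/oidc/token"
token = requests.post(
    token_endpoint,
    data={
        "grant_type": "client_credentials",
        "client_id": "partner-platform",
        "client_secret": "***",
    },
    timeout=30,
).json()["access_token"]

# The bearer token accompanies every platform-to-platform request, so the
# receiving platform can apply its policy rules to this client's authorisation scope.
resp = requests.get(
    "https://catalogue.example-platform.org/collections",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
print(resp.status_code)
```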
This presentation will highlight the generalised architecture, standards, best practice and open source software components available.
Academic technology transfer (TT) is an important source of innovation and economic development. TT success depends on the quick and easy adoption of a scientific innovation by an external partner. At FERN.Lab, a Helmholtz innovation lab for TT, we use serverless Micro-Service Architectures (MSA) as the main structural style for the successful development and transfer of Earth Observation (EO) technology.
MSA allows us to quickly bootstrap from a research prototype to a product that fits the requirements of different customers, ranging from public governmental sectors to private companies, in a matter of months. The adopted architecture is easy to integrate into existing workflows, suitable for agile project management, understandable and reusable to facilitate collaborations, and modular to take advantage of new technologies. The latter allows us to integrate various geospatial processing services which contribute to the overall functionality.
At FERN.Lab we believe that TT activities can only have a large impact if innovation is continuously integrated into the product development cycles. An MSA allows us to isolate the work of a scientist and abstracts the scientist from the complexity of an infrastructure such as a cloud. Such isolation, combined with the computing environments offered by commercial cloud providers (Google, Amazon, Microsoft), makes it possible to leverage the cloud's main features without requiring deep knowledge of distributed computing from either users or developers, and thus lets the scientist focus solely on the core of the technology.
The ecosystem of computation environments is rich: Amazon provides AWS Lambda and Fargate, Google offers App Engine and Cloud Run, etc. Besides compute environments for micro-services, cloud providers such as Amazon and Google also have large archives of remote sensing data, such as Sentinel-2 and Landsat, which provide easy and efficient access to terabytes of data. Via standard catalog specifications, such as the SpatioTemporal Asset Catalog (STAC), users easily identify which blocks need to be downloaded or, in the case of Cloud Optimized GeoTIFFs (COGs), streamed for processing. Due to this easy and efficient access to data, micro-services can run in stateless containers, which opens the door for serverless micro-services.
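The COG streaming mentioned above can be sketched with rasterio: only the window covering the area of interest is read over HTTP instead of downloading the whole scene. The asset URL and bounds are placeholders.

```python
# Stream a small window from a Cloud Optimized GeoTIFF (URL and bounds are placeholders).
import rasterio
from rasterio.windows import from_bounds

cog_url = "https://storage.example.org/sentinel2/B04.tif"

with rasterio.open(cog_url) as src:
    # Read only the pixels covering the area of interest (coordinates in the raster's CRS).
    window = from_bounds(680000, 5120000, 690000, 5130000, transform=src.transform)
    band = src.read(1, window=window)

print(band.shape)
```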
We present exemplary uses of MSA for habitat classification, land cover prediction, and land surface temperature homogenization. Our work shows the benefits and challenges of accessing and processing large EO datasets from different satellite sensors (Sentinel-1 and -2, Landsat-7, -8 and -9, and ECOSTRESS) using different compute environments on Google Cloud. For each of the use cases, the scientist's algorithm is containerized and registered to be deployed either as a micro-service or as a function. For easy portability and interoperability, the algorithm's interface is a Web API.
The algorithm needs to be stateless, and its memory and storage footprint should not exceed the resources of the computation node. Although the resources available for event-driven computation have increased considerably (for example, Google Cloud Run recently doubled the available memory to 16 GB of main memory), it is not always possible to deploy a service. A straightforward solution is often data partitioning, but in some cases this is still not enough. For these situations, the algorithm is instead registered as a function in OpenFaaS. With OpenFaaS on Kubernetes, the resource limits are easily overcome, at the cost of a more complex setup and higher processing costs.
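For illustration, an OpenFaaS function body in the classic Python template is a single handler that receives the request payload and returns a string; the payload fields below are assumptions chosen for this sketch.

```python
# handler.py - minimal OpenFaaS-style function body (classic python3 template).
import json

def handle(req: str) -> str:
    """Entry point invoked by the OpenFaaS watchdog with the raw request body."""
    task = json.loads(req)
    # Placeholder for the containerized science algorithm; here we only echo
    # back the tile identifier that would be processed.
    result = {"tile": task.get("tile_id"), "status": "processed"}
    return json.dumps(result)
```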
Once a service or function is registered in a compute environment, its deployment is triggered by calls from a queue system. Such calls are tasks containing a set of parameters defined either by a user via a UI or by a workflow from an external partner. With different levels of integration and involvement of external users and developers from four companies and one NGO, we show how MSA helped us to develop and transfer efficient, scalable and cost-efficient products.