Day 2 – Cloud-Native EO Data#
Cloud technology is driving the future of Earth observation (EO) and Earth system data. Geospatial data continues to grow in size and significance, which makes managing and analyzing it effectively increasingly resource-intensive.
Cloud-native geospatial formats are designed for seamless, on-demand access—enabling users to search, filter, and process data directly in the cloud without needing to download it locally. This shift is powered by a thriving ecosystem of open-source tools that support scalable, remote analysis of geospatial information.
Cloud Object Storage#
In a cloud-native approach, geospatial data is best hosted in object storage systems that are accessible over the web—ideally through publicly accessible URLs. This model promotes open, scalable, and resilient data sharing. Major commercial providers of cloud object storage include:
Beyond the commercial space, many institutions and organizations also deploy private cloud infrastructure using S3-compatible storage solutions such as:
These systems allow users to implement cloud-native storage principles within local or consortium-based environments, combining the flexibility of object storage with data sovereignty and control.
Leveraging existing cloud storage infrastructure:
Reduces the burden on data providers to build and maintain their own hosting environments or custom APIs.
Enables them to focus on curating high-quality datasets while ensuring those datasets are reliably accessible and easily shared.
Helps mitigate the risks of hardware failures and data loss, offering a durable and cost-effective alternative to traditional local storage.
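As a rough illustration of what a publicly accessible URL looks like in practice, the sketch below builds the HTTPS address of an object in an S3 bucket using the virtual-hosted style. The bucket name, region, and key are hypothetical placeholders, and other providers and S3-compatible systems use their own endpoint patterns.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Return the virtual-hosted-style HTTPS URL of an object in a public S3 bucket."""
    # Percent-encode the key but keep "/" so the folder-like prefix stays readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket, region, and key, used only for illustration.
url = s3_object_url("example-eo-bucket", "eu-central-1", "sentinel-2/scene 01/B04.tif")
print(url)
```

Any HTTP client can read from such a URL as long as the object allows public reads.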
Traditional vs. Cloud-Native Geospatial Formats#
| Category | Traditional Geospatial Formats | Cloud-Native Geospatial Formats |
|---|---|---|
| Raster Data | GeoTIFF, IMG, HDF, NetCDF | Cloud Optimized GeoTIFF (COG), Zarr |
| Vector Data | Shapefile, GeoJSON, KML | FlatGeobuf, Parquet (with GeoParquet), GeoArrow |
| Storage Requirements | Typically stored locally | Optimized for cloud object storage (e.g., S3) |
| Access Pattern | Requires full download before access | Supports partial reads and streaming access |
| Metadata Handling | Embedded or separate sidecar files | Designed for embedded metadata and efficient discovery |
| Performance | Slower in distributed or remote environments | Tuned for high-performance access in cloud workflows |
| Compatibility | Widely supported in legacy desktop tools | Increasing support in modern cloud-based tools |
| Scalability | Limited by local machine capabilities | Scalable with cloud compute and distributed frameworks |
| Examples of Usage | QGIS, ArcGIS Desktop | STAC, Dask, Xarray, Rasterio, GeoPandas (cloud configs) |
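The "partial reads" entry in the comparison above can be made concrete with a little arithmetic. Formats such as COG store the raster as internal tiles, so a reader only needs the tiles that overlap the requested window. The sketch below counts those tiles; the image size, the 512-pixel tile size, and the window are assumptions chosen for illustration, not values from any particular dataset.

```python
import math


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the internal tiles overlapping a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def total_tiles(image_width, image_height, tile_size=512):
    """Number of tiles needed to cover the whole image (edge tiles count as full tiles)."""
    return math.ceil(image_width / tile_size) * math.ceil(image_height / tile_size)


# Hypothetical 10980 x 10980 pixel image and a 1000 x 1000 pixel window.
needed = tiles_for_window(col_off=2000, row_off=3000, width=1000, height=1000)
all_tiles = total_tiles(10980, 10980)
print(f"{len(needed)} of {all_tiles} tiles overlap the window")
```

With a traditional, untiled layout the same window would typically require fetching the whole file first.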
Cloud Data Sharing and Discovery#
Cloud-native formats make geospatial data easier to share. They are best stored in cloud object storage, where users can read only the parts of a dataset they need over HTTP instead of downloading the entire dataset.
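Reading part of a remote file over HTTP is usually done with a Range request. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes the server supports range requests, which object storage services generally do.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_range(url: str, offset: int, length: int) -> bytes:
    """Fetch only part of a remote file instead of downloading all of it."""
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the range.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()


print(range_header(0, 16384))
```

A call such as `read_range("https://example.com/data/scene.tif", 0, 16384)` (a placeholder URL) would fetch only the first 16 KiB of the file, which is roughly how cloud-aware readers inspect a file's header before requesting individual chunks.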
One of the main efforts in cloud-native geospatial data sharing is the SpatioTemporal Asset Catalog (STAC). It is a widely used standard for describing geospatial assets, such as georeferenced images and vector files, so that they can be indexed and discovered. Some examples are given below:
There are three main specifications in the STAC standard:
STAC Catalog: a JSON file that provides a structure to organize and browse STAC Items.
STAC Collection: an extension of the STAC Catalog that adds information about the Items in that collection, such as the spatial and temporal extents, license, and providers.
STAC Item: the core atomic unit, representing a single spatiotemporal asset as a GeoJSON Feature plus datetime and links.
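To make the Item description concrete, here is a minimal sketch of a STAC Item written as a Python dictionary. The identifiers, coordinates, and URLs are invented for illustration, and `missing_item_fields` is a hypothetical helper that only checks for the top-level fields, not full conformance with the specification.

```python
import json

# A hypothetical Item: a GeoJSON Feature with a datetime, links, and assets.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-20240101",
    "collection": "example-collection",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[10.0, 45.0], [11.0, 45.0], [11.0, 46.0], [10.0, 46.0], [10.0, 45.0]]
        ],
    },
    "bbox": [10.0, 45.0, 11.0, 46.0],
    "properties": {"datetime": "2024-01-01T10:00:00Z"},
    "links": [
        {"rel": "parent", "href": "../collection.json"},
        {"rel": "collection", "href": "../collection.json"},
    ],
    "assets": {
        "B04": {
            "href": "https://example.com/data/example-scene-20240101_B04.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
}

REQUIRED_FIELDS = ("type", "stac_version", "id", "geometry", "properties", "links", "assets")


def missing_item_fields(doc: dict) -> list:
    """Return the required top-level Item fields that are absent from `doc`."""
    return [name for name in REQUIRED_FIELDS if name not in doc]


print(missing_item_fields(item))
print(json.dumps(item, indent=2)[:80])
```

The `assets` entries point to the actual data files, while everything else is metadata that helps a client decide whether the data is relevant.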
Implementing these three specifications yields a static STAC catalog, which cannot be queried dynamically. A static STAC catalog is a set of interlinked JSON files, often stored in a cloud storage service, and any HTTP server can expose it as plain files. Because the files link to one another, the catalog can be crawled by search engines, which makes the data discoverable.
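The idea of a static catalog as linked JSON files can be sketched with the standard library alone. The example below writes a tiny catalog, collection, and item to a temporary directory and then crawls it by following `child` and `item` links. All ids and file names are hypothetical, and a real catalog would normally be read over HTTP rather than from local disk.

```python
import json
import tempfile
from pathlib import Path


def write_json(path: Path, doc: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc, indent=2))


def build_static_catalog(root: Path) -> Path:
    """Write a small catalog -> collection -> item tree and return the root catalog path."""
    write_json(root / "catalog.json", {
        "type": "Catalog", "stac_version": "1.0.0", "id": "example-catalog",
        "description": "Hypothetical root catalog",
        "links": [{"rel": "child", "href": "./example-collection/collection.json"}],
    })
    write_json(root / "example-collection" / "collection.json", {
        "type": "Collection", "stac_version": "1.0.0", "id": "example-collection",
        "description": "Hypothetical collection", "license": "CC-BY-4.0",
        "extent": {
            "spatial": {"bbox": [[10.0, 45.0, 11.0, 46.0]]},
            "temporal": {"interval": [["2024-01-01T00:00:00Z", None]]},
        },
        "links": [{"rel": "item", "href": "./scene-1.json"}],
    })
    write_json(root / "example-collection" / "scene-1.json", {
        "type": "Feature", "stac_version": "1.0.0", "id": "scene-1",
        "geometry": None, "properties": {"datetime": "2024-01-01T10:00:00Z"},
        "links": [], "assets": {},
    })
    return root / "catalog.json"


def crawl(path: Path) -> list:
    """Follow "child" and "item" links recursively and return the ids of all items found."""
    doc = json.loads(path.read_text())
    if doc["type"] == "Feature":
        return [doc["id"]]
    found = []
    for link in doc["links"]:
        if link["rel"] in ("child", "item"):
            # Relative hrefs are resolved against the file that contains the link.
            found.extend(crawl((path.parent / link["href"]).resolve()))
    return found


with tempfile.TemporaryDirectory() as tmp:
    item_ids = crawl(build_static_catalog(Path(tmp)))
print(item_ids)
```

Note that finding anything requires walking the links from the root: there is no way to ask the files a question, which is the limitation a dynamic catalog addresses.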
A dynamic STAC catalog, by contrast, is implemented in software as an HTTP-based API that returns the same JSON structures for Items, Catalogs, and Collections. Like a static catalog, its purpose is to make data discoverable. If it also implements the STAC API specification, the indexed data becomes searchable.
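As a sketch of what "searchable" means, the snippet below assembles a request body for the item search endpoint of a STAC API and shows how it could be sent with the standard library. The collection name, bounding box, and dates are hypothetical, and the `search` function is not called here because it needs the root URL of a real API.

```python
import json
import urllib.request


def build_search_body(collections, bbox, start, end, limit=10):
    """Assemble a JSON body for the item search endpoint of a STAC API."""
    return {
        "collections": list(collections),
        "bbox": list(bbox),            # [west, south, east, north]
        "datetime": f"{start}/{end}",  # closed interval of RFC 3339 timestamps
        "limit": limit,
    }


def search(api_root: str, body: dict) -> dict:
    """POST the body to <api_root>/search and return the parsed GeoJSON FeatureCollection."""
    request = urllib.request.Request(
        f"{api_root.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_search_body(
    collections=["example-collection"],
    bbox=[10.0, 45.0, 11.0, 46.0],
    start="2024-01-01T00:00:00Z",
    end="2024-01-31T23:59:59Z",
)
print(json.dumps(body, indent=2))
```

The response is a GeoJSON FeatureCollection whose features are STAC Items, so the same client code can handle Items from static and dynamic catalogs. In practice, a client library such as pystac-client wraps this request and handles paging.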
Important open-source tools associated with STAC include:
STAC Browser: a web-based tool for browsing and searching STAC catalogs.
STAC API: a RESTful API for querying and accessing STAC items and collections.
pgSTAC: a PostgreSQL-based implementation of STAC for storing and querying STAC Items and Collections.
stactools: a command-line interface and Python API for working with STAC catalogs.





