Day 2 – Cloud-Native EO Data

Cloud technology is driving the future of EO and Earth system data. Geospatial data continues to grow in size and significance, which makes managing and analyzing it effectively increasingly resource-intensive.

Cloud-native geospatial formats are designed for seamless, on-demand access—enabling users to search, filter, and process data directly in the cloud without needing to download it locally. This shift is powered by a thriving ecosystem of open-source tools that support scalable, remote analysis of geospatial information.

Cloud Object Storage

In a cloud-native approach, geospatial data is best hosted in object storage systems that are accessible over the web, ideally through publicly accessible URLs. This model promotes open, scalable, and resilient data sharing. Several major commercial cloud providers offer object storage services.

Beyond the commercial space, many institutions and organizations also deploy private cloud infrastructure using S3-compatible storage solutions.

These systems allow users to implement cloud-native storage principles within local or consortium-based environments, combining the flexibility of object storage with data sovereignty and control.
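To illustrate what "accessible through a URL" means for an object store, the sketch below builds the two URL layouts commonly seen with S3-style storage: the virtual-hosted style used by AWS, where the bucket is part of the host name, and the path style often used by self-hosted S3-compatible services. The bucket, key, region, and endpoint are hypothetical, and whether such a URL is actually readable depends on the access policy of the bucket.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, key: str, region: str) -> str:
    """AWS-style virtual-hosted URL: the bucket name is part of the host."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style URL: the bucket is the first path segment after the endpoint."""
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key)}"


# Hypothetical bucket, key, region and endpoint, for illustration only.
print(virtual_hosted_url("example-eo-bucket", "sentinel-2/tile 01.tif", "eu-central-1"))
print(path_style_url("https://storage.example.org/", "example-eo-bucket", "sentinel-2/tile 01.tif"))
```

`quote` percent-encodes characters such as spaces while leaving the `/` separators in the object key intact, so keys that look like folder paths stay readable in the URL.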

Leveraging existing cloud storage infrastructure:

  • Reduces the burden on data providers to build and maintain their own hosting environments or custom APIs.

  • Enables them to focus on curating high-quality datasets while ensuring those datasets are reliably accessible and easily shared.

  • Helps mitigate the risks of hardware failures and data loss, offering a durable and cost-effective alternative to traditional local storage.

Traditional vs. Cloud-Native Geospatial Formats

| Category | Traditional Geospatial Formats | Cloud-Native Geospatial Formats |
| --- | --- | --- |
| Raster Data | GeoTIFF, IMG, HDF, NetCDF | Cloud Optimized GeoTIFF (COG), Zarr |
| Vector Data | Shapefile, GeoJSON, KML | FlatGeobuf, Parquet (with GeoParquet), GeoArrow |
| Storage Requirements | Typically stored locally | Optimized for cloud object storage (e.g., S3) |
| Access Pattern | Requires full download before access | Supports partial reads and streaming access |
| Metadata Handling | Embedded or separate sidecar files | Designed for embedded metadata and efficient discovery |
| Performance | Slower in distributed or remote environments | Tuned for high-performance access in cloud workflows |
| Compatibility | Widely supported in legacy desktop tools | Increasing support in modern cloud-based tools |
| Scalability | Limited by local machine capabilities | Scalable with cloud compute and distributed frameworks |
| Examples of Usage | QGIS, ArcGIS Desktop | STAC, Dask, Xarray, Rasterio, GeoPandas (cloud configs) |

Cloud Data Sharing and Discovery

Cloud-native formats have made sharing geospatial data easier. They are best stored in cloud object storage, where users can read only the parts of a dataset they need over HTTP (for example with range requests) instead of downloading the entire file.
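The partial access described here usually relies on the HTTP `Range` header. The following is a minimal sketch using only the Python standard library; the URL is hypothetical, and the example only builds the request without sending it. A server that honours the header answers with status 206 (Partial Content) and returns just the requested bytes.

```python
from urllib.request import Request, urlopen


def range_request(url: str, offset: int, length: int) -> Request:
    """Build a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(url, headers={"Range": f"bytes={offset}-{last}"})


def read_range(url: str, offset: int, length: int) -> bytes:
    """Send the request and return the body (requires network access)."""
    with urlopen(range_request(url, offset, length)) as resp:
        return resp.read()


# Hypothetical URL; only the request is built here, nothing is sent.
req = range_request("https://example.com/data/scene.tif", 0, 16384)
print(req.get_header("Range"))
```

Libraries that read cloud-native formats issue requests of this kind on the user's behalf: they first fetch a small header or index, then request only the byte ranges that hold the tiles or chunks of interest.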

One of the main efforts in cloud-native geospatial data sharing is the SpatioTemporal Asset Catalog (STAC). It is a widely used standard for describing geospatial information, such as georeferenced images and vector files, so that it can be indexed and discovered.

There are 3 main specifications for the STAC standard:

  • STAC Catalog: a JSON file that provides a structure to organize and browse STAC items.

  • STAC Collection: an extension of the STAC Catalog that provides additional information about the STAC Items in that collection, such as the spatial and temporal extents, license, providers, etc.

  • STAC Item: the core atomic unit that represents a single spatiotemporal asset as a GeoJSON Feature plus datetime and links.
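The three document types above can be sketched as plain Python dictionaries that serialize to JSON. This is a minimal illustration rather than a complete or validated implementation: the IDs, URLs, license value, and helper function names are hypothetical, and real catalogs typically carry more links (such as `self`, `root`, and `parent`) and more metadata.

```python
import json

STAC_VERSION = "1.0.0"


def make_item(item_id, bbox, datetime_utc, asset_href):
    """Minimal STAC Item: a GeoJSON Feature plus datetime, links and assets."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": STAC_VERSION,
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {"data": {"href": asset_href}},
    }


def make_collection(collection_id, description, bbox, start, end, item_hrefs):
    """Minimal STAC Collection: catalog fields plus license and extents."""
    return {
        "type": "Collection",
        "stac_version": STAC_VERSION,
        "id": collection_id,
        "description": description,
        "license": "CC-BY-4.0",  # hypothetical license for this example
        "extent": {
            "spatial": {"bbox": [list(bbox)]},
            "temporal": {"interval": [[start, end]]},
        },
        "links": [{"rel": "item", "href": href} for href in item_hrefs],
    }


def make_catalog(catalog_id, description, child_hrefs):
    """Minimal STAC Catalog: an entry point that links to its children."""
    return {
        "type": "Catalog",
        "stac_version": STAC_VERSION,
        "id": catalog_id,
        "description": description,
        "links": [{"rel": "child", "href": href} for href in child_hrefs],
    }


# Hypothetical IDs and URLs, for illustration only.
bbox = (5.0, 45.0, 10.0, 50.0)
item = make_item("scene-001", bbox, "2024-01-15T10:30:00Z", "https://example.com/data/scene-001.tif")
collection = make_collection(
    "example-scenes", "Example scenes", bbox,
    "2024-01-01T00:00:00Z", "2024-12-31T23:59:59Z", ["./scene-001.json"],
)
catalog = make_catalog("example-root", "Example root catalog", ["./example-scenes/collection.json"])
print(json.dumps(item, indent=2))
```

Writing each dictionary to its own `.json` file and uploading the files to object storage is, in essence, how a static catalog is published.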

Implemented on their own, these three specifications form a static STAC catalog: a set of interlinked JSON files, often kept in a cloud storage service, that any HTTP server can expose as plain files. A static catalog can be crawled by search engines, which makes the data discoverable, but it cannot be queried dynamically.

A dynamic STAC catalog, by contrast, is implemented in software as an HTTP-based API that returns the same JSON structures for items, catalogs, and collections. It shares the static catalog's goal of making data discoverable. In addition, if it implements the STAC API specification, the indexed data also becomes searchable.
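The difference between the two catalog types can be sketched in a few lines. For a static catalog, a client has to follow links from file to file; for a dynamic catalog that implements the STAC API item search, it can send one query with filters such as `collections`, `bbox`, `datetime`, and `limit`. In this sketch the static catalog is an in-memory dictionary standing in for JSON files on a web server, link hrefs are treated as ready-to-use keys rather than relative URLs that need resolving, and the API root URL is hypothetical; the search request is built but not sent.

```python
import json
from urllib.request import Request


def crawl_items(fetch, href):
    """Walk a static catalog: recurse into 'child' links, yield 'item' documents."""
    doc = fetch(href)
    for link in doc.get("links", []):
        if link["rel"] == "item":
            yield fetch(link["href"])
        elif link["rel"] == "child":
            yield from crawl_items(fetch, link["href"])


def search_request(api_root, collections, bbox, datetime_range, limit=10):
    """Build (but do not send) a POST /search request for a STAC API."""
    body = {"collections": collections, "bbox": bbox,
            "datetime": datetime_range, "limit": limit}
    return Request(
        f"{api_root.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# In-memory stand-in for the JSON files of a static catalog (hypothetical content).
STATIC_FILES = {
    "catalog.json": {"type": "Catalog", "id": "root",
                     "links": [{"rel": "child", "href": "s2/collection.json"}]},
    "s2/collection.json": {"type": "Collection", "id": "s2",
                           "links": [{"rel": "item", "href": "s2/a.json"},
                                     {"rel": "item", "href": "s2/b.json"}]},
    "s2/a.json": {"type": "Feature", "id": "a", "links": []},
    "s2/b.json": {"type": "Feature", "id": "b", "links": []},
}

# Static: every document on the path has to be fetched and inspected.
item_ids = [item["id"] for item in crawl_items(STATIC_FILES.__getitem__, "catalog.json")]
print(item_ids)

# Dynamic: a single request carries the filters; the server does the searching.
req = search_request("https://example.com/stac", ["s2"], [5.0, 45.0, 10.0, 50.0],
                     "2024-01-01T00:00:00Z/2024-01-31T23:59:59Z")
print(req.get_method(), req.full_url)
```

Crawling scales with the number of files in the catalog, whereas a search request pushes filtering to the server, which is why large archives are usually exposed through an API.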

Important open-source tools associated with STAC include:

  • STAC Browser: a web-based tool for browsing and searching STAC catalogs.

  • STAC API: a RESTful API for querying and accessing STAC items and collections.

  • pgSTAC: a PostgreSQL schema and set of SQL functions for storing and querying STAC items and collections.

  • STAC Tools: a basic command line interface and API for working with STAC catalogs.