Create Cloud Native Geospatial Formats#
This notebook provides step-by-step instructions for converting geospatial data into cloud-native formats, which are optimized for efficient storage, access, and processing in cloud environments. The notebook covers the following workflows:
Convert JPEG 2000 (JP2) to Cloud Optimized GeoTIFF (COG): Demonstrates how to use the GDAL library to transform raster data from the JPEG 2000 format into the COG format, whose internal tiling and overviews let cloud-based applications read only the parts of an image they need.
Convert Shapefile (SHP) to GeoParquet: Explains how to use GeoPandas to convert vector data from the Shapefile format to the GeoParquet format, a columnar format that supports efficient querying and storage.
Convert NetCDF to Zarr: Shows how to use the xarray library to convert NetCDF files into the Zarr format, a cloud-friendly format designed for scalable and efficient data storage.
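As a compact summary of the three workflows listed above, the sketch below maps each source extension to its cloud-native target and the library this notebook uses for it. The names `CONVERSIONS` and `target_for` are hypothetical and introduced only for illustration; they are not part of GDAL, GeoPandas, or xarray.

```python
from pathlib import Path

# Hypothetical lookup table summarising the three workflows in this notebook:
# source extension -> (target format, library used for the conversion)
CONVERSIONS = {
    ".jp2": ("COG", "GDAL"),
    ".shp": ("GeoParquet", "GeoPandas"),
    ".nc": ("Zarr", "xarray"),
}


def target_for(path):
    """Return (target format, library) for a file, or None if it is not covered."""
    return CONVERSIONS.get(Path(path).suffix.lower())


for name in ["../data/era5.nc", "../data/sentinel-2-tiles-greece.shp", "notes.txt"]:
    print(name, "->", target_for(name))
```

A table like this could be used to plan a batch conversion of a directory that mixes raster, vector, and multidimensional files.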
Convert a JPEG 2000 (JP2) raster to Cloud Optimized GeoTIFF (COG) using GDAL#
import glob
from osgeo import gdal

# Collect every JPEG 2000 file in the data directory
list_input_jp2 = glob.glob("../data/*.jp2")
print(list_input_jp2)

for input_jp2 in list_input_jp2:
    print(input_jp2)
    output_cog = input_jp2.replace('.jp2', '_cog.tif')
    print(output_cog)
    options = gdal.TranslateOptions(
        format='COG',
        creationOptions=[
            'COMPRESS=LZW',  # Lossless LZW compression
            'BLOCKSIZE=512',  # Internal tile size in pixels, for better cloud access
            'OVERVIEWS=IGNORE_EXISTING'  # Rebuild overviews, ignoring any in the source
        ]
    )
    gdal.Translate(destName=output_cog, srcDS=input_jp2, options=options)
['../data/T34SGH_20240608T090601_TCI_10m.jp2', '../data/T34SFG_20240621T092031_TCI_10m.jp2', '../data/T34SEG_20240601T092031_TCI_10m.jp2', '../data/T34SFF_20220209T091131_TCI_10m.jp2']
../data/T34SGH_20240608T090601_TCI_10m.jp2
../data/T34SGH_20240608T090601_TCI_10m_cog.tif
/Users/syam/virtualenvs/myvenv/lib/python3.13/site-packages/osgeo/gdal.py:330: FutureWarning: Neither gdal.UseExceptions() nor gdal.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
warnings.warn(
../data/T34SFG_20240621T092031_TCI_10m.jp2
../data/T34SFG_20240621T092031_TCI_10m_cog.tif
../data/T34SEG_20240601T092031_TCI_10m.jp2
../data/T34SEG_20240601T092031_TCI_10m_cog.tif
../data/T34SFF_20220209T091131_TCI_10m.jp2
../data/T34SFF_20220209T091131_TCI_10m_cog.tif
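The loop above derives each output name with `str.replace`, which would also rewrite `.jp2` if it appeared in a directory name and would miss upper-case extensions such as `.JP2`. A minimal sketch of a stricter alternative using `pathlib` follows; `cog_path_for` is a hypothetical helper, not part of GDAL.

```python
from pathlib import Path


def cog_path_for(src):
    """Return the sibling '<stem>_cog.tif' path for a JPEG 2000 file.

    Only the final file name is changed, so '.jp2' elsewhere in the
    path (for example in a directory name) is left alone.
    """
    src = Path(src)
    if src.suffix.lower() != ".jp2":
        raise ValueError(f"not a JPEG 2000 file: {src}")
    return src.with_name(src.stem + "_cog.tif")


print(cog_path_for("../data/T34SGH_20240608T090601_TCI_10m.jp2"))
print(cog_path_for("archive.jp2/scene.JP2"))
```

The result can be handed to `gdal.Translate` as `destName=str(cog_path_for(input_jp2))`.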
Convert a Shapefile (SHP) to GeoParquet using GeoPandas (copy and paste the code into a .py file)#
# import geopandas as gpd
# input_shp = "../data/sentinel-2-tiles-greece.shp"
# output_geoparquet = "../data/test.parquet"
# gdf = gpd.read_file(input_shp)
# gdf.to_parquet(output_geoparquet, engine="pyarrow")
# geoparquet = gpd.read_parquet(output_geoparquet)
# print(geoparquet.head())
# print(geoparquet.columns)
# print(geoparquet.dtypes)
# print(geoparquet.crs)
# print(geoparquet.geometry.head())
# print(geoparquet.geometry.dtypes)
# print(geoparquet.geometry.iloc[0].wkt)
# print(geoparquet.geometry.iloc[0].geom_type)
# print(geoparquet.geometry.iloc[0].bounds)
# print(geoparquet.geometry.iloc[0].area)
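A Shapefile is a set of sibling files rather than a single file: the Shapefile specification makes the `.shx` index and the `.dbf` attribute table mandatory companions of the `.shp` geometry file, and the optional `.prj` file carries the coordinate reference system. A quick pre-flight check before the conversion above can catch an incomplete copy. The sketch below uses only the standard library; `missing_sidecars` is a hypothetical helper name, and the check assumes lower-case extensions.

```python
from pathlib import Path

REQUIRED = (".shx", ".dbf")  # mandatory companions of the .shp file
RECOMMENDED = (".prj",)      # optional, but holds the CRS definition


def missing_sidecars(shp_path):
    """Return the required and recommended companion files that are absent."""
    shp = Path(shp_path)
    required = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    recommended = [ext for ext in RECOMMENDED if not shp.with_suffix(ext).exists()]
    return required, recommended


required, recommended = missing_sidecars("../data/sentinel-2-tiles-greece.shp")
if required:
    print("Incomplete Shapefile, missing:", required)
elif recommended:
    print("Readable, but the CRS may be undefined; missing:", recommended)
else:
    print("All companion files present")
```

If `.prj` is missing, `geoparquet.crs` in the code above would likely print `None`, and the CRS would need to be set explicitly before writing the GeoParquet file.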
Convert NetCDF to Zarr using xarray#
import xarray as xr

netcdf_file = "../data/era5.nc"
zarr_file = "../data/era5.zarr"

# Open the NetCDF file and write it out as a Zarr store (overwriting any existing store)
ds = xr.open_dataset(netcdf_file)
ds.to_zarr(zarr_file, mode='w')
print(f"Conversion complete: {zarr_file}")

# Reopen the store and check metadata and structure
ds_zarr = xr.open_zarr(zarr_file)
print("dataset", ds_zarr)
print("dimensions", ds_zarr.dims)
print("variables", ds_zarr.variables)
/Users/syam/virtualenvs/myvenv/lib/python3.13/site-packages/zarr/codecs/vlen_utf8.py:44: UserWarning: The codec `vlen-utf8` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
return cls(**configuration_parsed)
/Users/syam/virtualenvs/myvenv/lib/python3.13/site-packages/zarr/core/array.py:3989: UserWarning: The dtype `<U4` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
meta = AsyncArray._create_metadata_v3(
/Users/syam/virtualenvs/myvenv/lib/python3.13/site-packages/zarr/api/asynchronous.py:203: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
warnings.warn(
Conversion complete: ../data/era5.zarr
dataset <xarray.Dataset> Size: 4MB
Dimensions: (pressure_level: 1, valid_time: 1, latitude: 721,
longitude: 1440)
Coordinates:
* pressure_level (pressure_level) float64 8B 1e+03
number int64 8B ...
* latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.75 -90.0
expver object 8B ...
* longitude (longitude) float64 12kB -180.0 -179.8 ... 179.5 179.8
* valid_time (valid_time) datetime64[ns] 8B 2023-02-01T13:00:00
Data variables:
z (valid_time, pressure_level, latitude, longitude) float32 4MB dask.array<chunksize=(1, 1, 181, 720), meta=np.ndarray>
Attributes:
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2025-04-11T08:09 GRIB to CDM+CF via cfgrib-0.9.1...
dimensions FrozenMappingWarningOnValuesAccess({'pressure_level': 1, 'valid_time': 1, 'latitude': 721, 'longitude': 1440})
variables Frozen({'pressure_level': <xarray.IndexVariable 'pressure_level' (pressure_level: 1)> Size: 8B
array([1000.])
Attributes:
long_name: pressure
units: hPa
positive: down
stored_direction: decreasing
standard_name: air_pressure, 'number': <xarray.Variable ()> Size: 8B
[1 values with dtype=int64]
Attributes:
long_name: ensemble member numerical id
units: 1
standard_name: realization, 'z': <xarray.Variable (valid_time: 1, pressure_level: 1, latitude: 721,
longitude: 1440)> Size: 4MB
dask.array<open_dataset-z, shape=(1, 1, 721, 1440), dtype=float32, chunksize=(1, 1, 181, 720), chunktype=numpy.ndarray>
Attributes: (12/31)
GRIB_paramId: 129
GRIB_dataType: an
GRIB_numberOfPoints: 1038240
GRIB_typeOfLevel: isobaricInhPa
GRIB_stepUnits: 1
GRIB_stepType: instant
... ...
GRIB_shortName: z
GRIB_totalNumber: 0
GRIB_units: m**2 s**-2
long_name: Geopotential
units: m**2 s**-2
standard_name: geopotential, 'latitude': <xarray.IndexVariable 'latitude' (latitude: 721)> Size: 6kB
array([ 90. , 89.75, 89.5 , ..., -89.5 , -89.75, -90. ], shape=(721,))
Attributes:
units: degrees_north
standard_name: latitude
long_name: latitude
stored_direction: decreasing, 'expver': <xarray.Variable ()> Size: 8B
[1 values with dtype=object], 'longitude': <xarray.IndexVariable 'longitude' (longitude: 1440)> Size: 12kB
array([-180. , -179.75, -179.5 , ..., 179.25, 179.5 , 179.75],
shape=(1440,))
Attributes:
units: degrees_east
standard_name: longitude
long_name: longitude, 'valid_time': <xarray.IndexVariable 'valid_time' (valid_time: 1)> Size: 8B
array(['2023-02-01T13:00:00.000000000'], dtype='datetime64[ns]')
Attributes:
long_name: time
standard_name: time})
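The dataset summary above reports the variable `z` with shape (1, 1, 721, 1440), dtype float32, and chunk size (1, 1, 181, 720). As a worked check of what that means for storage, the sketch below counts the chunks and their uncompressed size using only the standard library. `chunk_layout` is a hypothetical helper, and the calculation assumes that the dask chunks shown in the summary mirror the chunks stored in the Zarr store; sizes are before any compression.

```python
import math


def chunk_layout(shape, chunks, itemsize):
    """Return (number of chunks, bytes per full chunk, total bytes), uncompressed."""
    # Chunks per dimension: the last chunk along a dimension may be partial
    per_dim = [math.ceil(n / c) for n, c in zip(shape, chunks)]
    n_chunks = math.prod(per_dim)
    chunk_bytes = math.prod(chunks) * itemsize
    total_bytes = math.prod(shape) * itemsize
    return n_chunks, chunk_bytes, total_bytes


# Values taken from the printed dataset summary; float32 is 4 bytes per value
n_chunks, chunk_bytes, total_bytes = chunk_layout(
    shape=(1, 1, 721, 1440), chunks=(1, 1, 181, 720), itemsize=4
)
print(n_chunks)     # 8
print(chunk_bytes)  # 521280
print(total_bytes)  # 4152960
```

Under that assumption the array is split into 8 chunks of about 0.5 MB each (4 along latitude, 2 along longitude), and the 4,152,960-byte total agrees with the `4MB` shown in the summary. Smaller chunks mean more objects to request from cloud storage, while larger chunks mean more data read per request.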