Download Data from CDSE STAC API As a DataCube Using STACSTACK#

Notebook Description#

This Jupyter Notebook demonstrates how to download and process Sentinel-2 L2A satellite imagery data from the Copernicus Data Space Ecosystem (CDSE) STAC API. The notebook provides a step-by-step guide to querying, retrieving, and visualizing geospatial data as a datacube using the stackstac library.

Key Steps#

  1. Library Imports:

    • Utilizes libraries such as pystac_client, stackstac, geopandas, shapely, rasterio, and dask for geospatial data processing and visualization.

  2. Environment Setup:

    • Configures API URLs and AWS S3 credentials using environment variables for secure access to the CDSE STAC API.

  3. STAC Client Initialization:

    • Demonstrates how to connect to the CDSE STAC API and enable item search functionality.

  4. Data Search:

    • Queries the Sentinel-2 L2A collection for satellite imagery based on spatial (bounding box), temporal (date range), and cloud cover filters.

  5. Bounding Box Creation:

    • Creates a bounding box around a point of interest and reprojects it to the appropriate coordinate reference system (CRS).

  6. Datacube Retrieval:

    • Uses the stackstac library to retrieve and process the datacube, specifying desired bands, resolution, and resampling methods.

  7. Visualization:

    • Visualizes the RGB bands of the datacube as images for visual inspection.

Use Case#

This notebook is ideal for users who want to:

  • Access and process Sentinel-2 satellite imagery from the Copernicus Data Space Ecosystem.

  • Learn how to use the stackstac library for creating datacubes.

  • Visualize geospatial data for analysis and interpretation.

Prerequisites#

  • Create an account on the Copernicus Data Space Ecosystem.

  • Generate S3 Access Key and Secret Key.

Import the required libraries#

import os
import geopandas as gpd
import pandas as pd
from shapely import Point
from pystac_client import Client
import stackstac
from rasterio.enums import Resampling
import os
from rasterio.session import AWSSession
import dask
from dotenv import load_dotenv
load_dotenv("../.env")
dask.config.set(scheduler="single-threaded")
<dask.config.set at 0x12f7bc550>

Define API URL and ENV variables for S3 storage#

BASE_URL = "https://stac.dataspace.copernicus.eu/v1/" # new version
# BASE_URL = "https://catalogue.dataspace.copernicus.eu/stac"
ACCESS_KEY = os.getenv("ACCESS_KEY")
SECRET_KEY = os.getenv("SECRET_KEY")
CDSE_ENDPOINT = "eodata.dataspace.copernicus.eu"
os.environ["GDAL_HTTP_TCP_KEEPALIVE"] = "YES"
os.environ["AWS_S3_ENDPOINT"] = CDSE_ENDPOINT
os.environ["AWS_ACCESS_KEY_ID"] = ACCESS_KEY
os.environ["AWS_SECRET_ACCESS_KEY"] = SECRET_KEY
os.environ["AWS_HTTPS"] = "YES"
os.environ["AWS_VIRTUAL_HOSTING"] = "FALSE"
os.environ["GDAL_HTTP_UNSAFESSL"] = "YES"
os.environ["GDAL_SKIP"]="netCDF"

Initiate STAC client catalouge#

client = Client.open(BASE_URL)
client.add_conforms_to("ITEM_SEARCH")

Search Sentinel-2-L2A collection for images#

# Define the bounding box
lat, lon = 37.9838, 23.7275
# Search for Sentinel-2 L2A items based on the bounding box and cloud cover criteria
search = client.search(
    collections=["sentinel-2-l2a"],
    bbox=(lon - 1e-5, lat - 1e-5, lon + 1e-5, lat + 1e-5),
    datetime="2023-08-01/2023-08-30",
    query={"eo:cloud_cover": {"lt": 20}},
)

# Process the search results and select unique items based on acquisition date
all_items = list(search.items())
unique_items = {}
for item in all_items:
    acq_date = item.datetime.date()
    if acq_date not in unique_items:
        unique_items[acq_date] = item

# Sort the items by their acquisition date
items = list(unique_items.values())
items = sorted(items, key=lambda item: item.datetime)
items[0]

Create a bounding box around the point#

epsg = "32635"

# Convert point of interest into the image projection
# (assumes all images are in the same projection)
poidf = gpd.GeoDataFrame(
    pd.DataFrame(),
    crs="EPSG:4326",
    geometry=[Point(lon, lat)],
).to_crs(epsg)

coords = poidf.iloc[0].geometry.coords[0]

# Create bounds in projection
size = 256
gsd = 10
bounds = (
    coords[0] - (size * gsd) // 2,
    coords[1] - (size * gsd) // 2,
    coords[0] + (size * gsd) // 2,
    coords[1] + (size * gsd) // 2,
)

Use STACKSTAC to retrieve the datacube#

gdal_env = stackstac.DEFAULT_GDAL_ENV.updated(
    {"GDAL_HTTP_UNSAFESSL":"YES",
     "GDAL_HTTP_TCP_KEEPALIVE":"YES",
     "AWS_VIRTUAL_HOSTING":"FALSE",
     "AWS_HTTPS":"YES"
    }
)

# Create the stack using stackstac
stack = stackstac.stack(
    items,
    bounds=bounds,
    snap_bounds=False,
    resolution=gsd,
    dtype="float64",
    rescale=False,
    fill_value=0,
    assets=["B02_10m", "B03_10m", "B04_10m", "B08_10m"],
    resampling=Resampling.nearest,
    epsg=32635,
    gdal_env=gdal_env,
)

print(stack)

# Compute the stack
stack = stack.compute()
<xarray.DataArray 'stackstac-1830d39c342e736bfaf74c48d39e8b4a' (time: 5,
                                                                band: 4,
                                                                y: 256, x: 256)> Size: 10MB
dask.array<fetch_raster_window, shape=(5, 4, 256, 256), dtype=float64, chunksize=(1, 1, 256, 256), chunktype=numpy.ndarray>
Coordinates: (12/53)
  * time                                   (time) datetime64[ns] 40B 2023-08-...
    id                                     (time) <U60 1kB 'S2A_MSIL2A_202308...
  * band                                   (band) <U7 112B 'B02_10m' ... 'B08...
  * x                                      (x) float64 2kB 2.113e+05 ... 2.13...
  * y                                      (y) float64 2kB 4.21e+06 ... 4.208...
    processing:level                       <U2 8B 'L2'
    ...                                     ...
    proj:transform                         object 8B {0, 10, 199980, 4300020,...
    proj:shape                             object 8B {10980}
    title                                  (band) <U20 320B 'Blue (band 2) - ...
    storage:refs                           object 8B {'creodias-s3', 'cdse-s3'}
    proj:code                              <U10 40B 'EPSG:32635'
    epsg                                   int64 8B 32635
Attributes:
    spec:        RasterSpec(epsg=32635, bounds=(211301.8144811028, 4207792.27...
    crs:         epsg:32635
    transform:   | 10.00, 0.00, 211301.81|\n| 0.00,-10.00, 4210352.27|\n| 0.0...
    resolution:  10

Visualize the RGB bands#

stack.sel(band=["B04_10m", "B03_10m", "B02_10m"]).plot.imshow(
    row="time", rgb="band", vmin=0, vmax=2000, col_wrap=6
)
<xarray.plot.facetgrid.FacetGrid at 0x12ff456a0>
../_images/78e7dcc9d934fe2539e9ca7449d51f2f6579d4c98f9f84475522c59e8a6a2a1f.png