Download Sentinel-2-L2A Data from AWS STAC Catalouge#

Notebook Description#

This Jupyter Notebook demonstrates how to download and process Sentinel-2 L2A satellite imagery data from the AWS STAC Catalogue. The notebook provides a step-by-step guide to querying, retrieving, and visualizing geospatial data using the stackstac library.

Key Steps#

  1. Library Imports:

    • Utilizes libraries such as pystac_client, stackstac, geopandas, shapely, rasterio, and folium for geospatial data processing and visualization.

  2. API Initialization:

    • Connects to the AWS STAC Catalogue using the API base URL and initializes the STAC client.

  3. Data Search:

    • Queries the Sentinel-2 L2A collection for satellite imagery based on spatial (bounding box), temporal (date range), and cloud cover filters.

  4. Bounding Box Creation:

    • Creates a bounding box around a point of interest and reprojects it to the appropriate coordinate reference system (CRS).

  5. Datacube Retrieval:

    • Uses the stackstac library to retrieve and process the datacube, specifying desired bands (e.g., RGB and NIR), resolution, and resampling methods.

  6. Visualization:

    • Visualizes the RGB bands of the datacube as images for visual inspection.

Use Case#

This notebook is ideal for users who want to:

  • Access and process Sentinel-2 satellite imagery from the AWS STAC Catalogue.

  • Learn how to use the stackstac library for creating datacubes.

  • Visualize geospatial data for analysis and interpretation.

Import the required libraries#

import geopandas as gpd
import numpy as np
import pandas as pd
import pystac_client
import stackstac
import rasterio
from rasterio.enums import Resampling
from pyproj import Transformer
from shapely import Point
import xarray as xr

Define the API URL and collection name#

STAC_API = "https://earth-search.aws.element84.com/v1"
COLLECTION = "sentinel-2-l2a"

Search the Sentinel-2-L2A collection#

lat, lon = 39.3336, -0.3545
start = "2024-01-01"
end = "2024-12-01"
# Search the catalogue
catalog = pystac_client.Client.open(STAC_API)
search = catalog.search(
    collections=[COLLECTION],
    datetime=f"{start}/{end}",
    bbox=(lon - 1e-3, lat - 1e-3, lon + 1e-3, lat + 1e-3),
    max_items=100,
    query={"eo:cloud_cover": {"lt": 50}},
)

all_items = search.get_all_items()

# Reduce to one per date (there might be some duplicates
# based on the location)
items = []
dates = []
for item in all_items:
    if item.datetime.date() not in dates:
        items.append(item)
        dates.append(item.datetime.date())

print(f"Found {len(items)} items")
/Users/syam/virtualenvs/myvenv/lib/python3.13/site-packages/pystac_client/item_search.py:896: FutureWarning: get_all_items() is deprecated, use item_collection() instead.
  warnings.warn(
Found 46 items

Create a bounding box around the point#

# Extract coordinate system from first item
epsg = items[0].properties["proj:code"]

# Convert point of interest into the image projection
# (assumes all images are in the same projection)
poidf = gpd.GeoDataFrame(
    pd.DataFrame(),
    crs="EPSG:4326",
    geometry=[Point(lon, lat)],
).to_crs(epsg)

coords = poidf.iloc[0].geometry.coords[0]

# Create bounds in projection
size = 2048
gsd = 10
bounds = (
    coords[0] - (size * gsd) // 2,
    coords[1] - (size * gsd) // 2,
    coords[0] + (size * gsd) // 2,
    coords[1] + (size * gsd) // 2,)

Retrieve the pixel values, for the bounding box in the target projection. In this example we use only the RGB and NIR bands.#

stack = stackstac.stack(
    items,
    bounds=bounds,
    snap_bounds=False,
    epsg=int(epsg.split(":")[-1]),
    resolution=gsd,
    dtype="float64",
    rescale=False,
    fill_value=0,
    assets=["blue", "green", "red", "nir"],
    resampling=Resampling.nearest,
)

print(stack)

stack = stack.compute()
<xarray.DataArray 'stackstac-5ad5cb8b74cbfc433180d1d84415214f' (time: 46,
                                                                band: 4,
                                                                y: 2048, x: 2048)> Size: 6GB
dask.array<fetch_raster_window, shape=(46, 4, 2048, 2048), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/54)
  * time                                     (time) datetime64[ns] 368B 2024-...
    id                                       (time) <U24 4kB 'S2B_30SYJ_20240...
  * band                                     (band) <U5 80B 'blue' ... 'nir'
  * x                                        (x) float64 16kB 7.178e+05 ... 7...
  * y                                        (y) float64 16kB 4.367e+06 ... 4...
    instruments                              <U3 12B 'msi'
    ...                                       ...
    proj:transform                           object 8B {0, 4400040, 10, -10, ...
    gsd                                      int64 8B 10
    common_name                              (band) <U5 80B 'blue' ... 'nir'
    center_wavelength                        (band) float64 32B 0.49 ... 0.842
    full_width_half_max                      (band) float64 32B 0.098 ... 0.145
    epsg                                     int64 8B 32630
Attributes:
    spec:        RasterSpec(epsg=32630, bounds=(717774.9815119422, 4346895.43...
    crs:         epsg:32630
    transform:   | 10.00, 0.00, 717774.98|\n| 0.00,-10.00, 4367375.43|\n| 0.0...
    resolution:  10

Visualize the retrieved results#

stack.sel(band=["red", "green", "blue"]).plot.imshow(
    row="time", rgb="band", vmin=0, vmax=2000, col_wrap=6
)
<xarray.plot.facetgrid.FacetGrid at 0x128fac590>
../_images/8853ffce3835c122c2f54cba7d45cff633b84475719b741d973f3c207b2cc325.png

Compute NDVI#

# Get bands then compute NDVI
stack = stackstac.stack(
    items,
    bounds=bounds,
    snap_bounds=False,
    epsg=int(epsg.split(":")[-1]),
    resolution=gsd,
    dtype="float64",
    rescale=False,
    fill_value=0,
    assets=["red", "nir"],  # Only need red and nir bands for NDVI
    resampling=Resampling.nearest,
)

# Compute the stack
stack = stack.compute()

# stack shape: (time, band, y, x)
# Get band indexes
red_index = list(stack.band.values).index("red")
nir_index = list(stack.band.values).index("nir")

# Extract bands
red = stack.sel(band="red")
nir = stack.sel(band="nir")

# Compute NDVI
ndvi = (nir - red) / (nir + red + 1e-10)  # small value to avoid division by zero

# Optionally: name the band
ndvi.name = "NDVI"
ndvi_ds = xr.Dataset({"NDVI": ndvi})
print(ndvi_ds)
<xarray.Dataset> Size: 2GB
Dimensions:                                  (time: 46, x: 2048, y: 2048)
Coordinates: (12/49)
  * time                                     (time) datetime64[ns] 368B 2024-...
    id                                       (time) <U24 4kB 'S2B_30SYJ_20240...
  * x                                        (x) float64 16kB 7.178e+05 ... 7...
  * y                                        (y) float64 16kB 4.367e+06 ... 4...
    earthsearch:boa_offset_applied           bool 1B True
    s2:product_type                          <U7 28B 'S2MSI2A'
    ...                                       ...
    view:sun_elevation                       (time) float64 368B 26.3 ... 27.97
    raster:bands                             object 8B {'nodata': 0, 'data_ty...
    gsd                                      int64 8B 10
    proj:shape                               object 8B {10980}
    proj:transform                           object 8B {0, 4400040, 10, -10, ...
    epsg                                     int64 8B 32630
Data variables:
    NDVI                                     (time, y, x) float64 2GB 0.3465 ...
# Compute NDVI at once using stackstac
stack = stackstac.stack(
    items,
    bounds=bounds,
    snap_bounds=False,
    epsg=int(epsg.split(":")[-1]),
    resolution=gsd,
    dtype="float64",
    rescale=False,
    fill_value=0,
    assets=["red", "nir"],  # Only red and NIR bands for NDVI
    resampling=Resampling.nearest,
)

# Extract bands by name
red = stack.sel(band="red")
nir = stack.sel(band="nir")

# Compute NDVI safely
ndvi = (nir - red) / (nir + red + 1e-10)  # small value avoids division by 0

ndvi = ndvi.compute()

# Create a dataset
ndvi.name = "NDVI"
ndvi_ds = xr.Dataset({"NDVI": ndvi})

print(ndvi_ds)
<xarray.Dataset> Size: 2GB
Dimensions:                                  (time: 46, x: 2048, y: 2048)
Coordinates: (12/49)
  * time                                     (time) datetime64[ns] 368B 2024-...
    id                                       (time) <U24 4kB 'S2B_30SYJ_20240...
  * x                                        (x) float64 16kB 7.178e+05 ... 7...
  * y                                        (y) float64 16kB 4.367e+06 ... 4...
    instruments                              <U3 12B 'msi'
    processing:software                      object 8B {'sentinel2-to-stac': ...
    ...                                       ...
    s2:thin_cirrus_percentage                (time) float64 368B 11.14 ... 0.124
    proj:shape                               object 8B {10980}
    raster:bands                             object 8B {'nodata': 0, 'data_ty...
    proj:transform                           object 8B {0, 4400040, 10, -10, ...
    gsd                                      int64 8B 10
    epsg                                     int64 8B 32630
Data variables:
    NDVI                                     (time, y, x) float64 2GB 0.3465 ...

Visualize the NDVI#

ndvi.plot.imshow(
    row="time",
    cmap="RdYlGn",
    vmin=-1,
    vmax=1,
    col_wrap=6,
)
<xarray.plot.facetgrid.FacetGrid at 0x16850ee90>
../_images/ccd7a5e7224273beaf252e3ba301d5732e147f8c43fb991d79631cae13a9ba67.png