Skip to content

SimpleSTAC : STAC tools to simplify STAC use.🔗

Documentation

licence python build status

Features🔗

Build a STAC ItemCollection based on local raster data:

  • functions to get the minimal raster metadata (bbox, geometry, projection) in STAC format
  • class MyStacItem to create a simple STAC item with raster files
  • function build_item_collection to build your small ItemCollection with a template for further metadata

Extends class pystac.ItemCollection with methods to simplify data manipulation:

  • sort collection items
  • filter (subset) cube by spatio-temporal coordinates and assets
  • convert to a lazy dask DataArray cube
  • convert to a geodataframe
  • apply a function to each item or on a rolling window, write and add the created assets to current collection.

Additional functions:

  • apply formula to a DataArray cube (e.g. "(B08-B04)/(B08+B04)")
  • write collection assets to local files

Get started🔗

Recommendations🔗

Install the package in a virtual environment. See miniforge for conda/mamba, or venv for virtualenv.

Install🔗

Create a conda env:

mamba env create -n simplestac --file https://forgemia.inra.fr/umr-tetis/stac/simplestac/-/raw/main/environment.yml

Within a virtualenv:

pip install git+https://forgemia.inra.fr/umr-tetis/stac/simplestac

Update🔗

Update simplestac within an env (conda or virtualenv):

pip install git+https://forgemia.inra.fr/umr-tetis/stac/simplestac

Notebooks🔗

Example notebooks make use of optional packages such as ipykernel or xpystac which can be installed with:

# in conda env
mamba env update -n simplestac --file https://forgemia.inra.fr/umr-tetis/stac/simplestac/-/raw/main/environment-notebook.yml
# or in venv
pip install git+https://forgemia.inra.fr/umr-tetis/stac/simplestac.git#egg=simplestac[notebook]

Known issues🔗

On Windows, the installation of simplestac may end with an error due to Filename too long in stac-static, a dependency of simplestac. Git can be configured to manage such filenames:

git config --system core.longpaths true

Examples🔗

See example scripts for a detailed demo of simpleSTAC features.

Below is a small teaser.

Example data🔗

Example data can be downloaded here or with the script download_data.py

Create a local STAC ItemCollection🔗

In this example, local files (see previous section) are parsed to build a STAC ItemCollection.

from path import Path
from simplestac.utils import ItemCollection
from simplestac.local import collection_format, build_item_collection

image_dir = data_dir / "s2_scenes"
res_dir = data_dir / "local_stac"
print(res_dir.mkdir_p())

col_file = res_dir / "collection.json"

# Let's start from the example collection built as in static_stac.py
# directory containing the remote sensing scenes
fmt = collection_format("S2_L2A_THEIA")
col = build_item_collection(image_dir, fmt)
col.save_object(col_file)

Anyone can make his own format depending on the naming of the file names. The parsing is based on regex patterns.

The expected minimal structure is a json with the following:

  • item: metadata that are item-wise (i.e. independent of the asset)
  • pattern: how to parse the item id from the item directory
  • datetime: relative to datetime parsing (pattern, format)
  • item_assets: the metadata relative to each asset
  • asset key:
    • pattern: regex pattern to find the band among the recursive list of files

See a simple Theia format made for the example.

Extended pystac.ItemCollection🔗

After executing previous code, the following:

  1. converts the created ItemCollection into a geodataframe and plots its bouding box over the geometry of a region of interest,
  2. computes NDVI over a collection subset and plots it.

The same could be done with a remote item collection, see examples.

import geopandas as gpd
from simplestac.utils import apply_formula
# Load the item collection
ItemCollection.from_file(col_file)

# Load the region of interest
roi = gpd.read_file(data_dir / "roi.geojson")

# Plot the collection geometry and the region of interest
ax = col.to_geodataframe().iloc[:1,:].to_crs(roi.crs).boundary.plot(color="red")
roi.plot(ax=ax)

# Compute the NDVI on a subset of the collection (clipped by datetime and geometry). Each NDVI raster is written in a local COG file and inserted in item assets in order to avoid memory overflow.
col.apply_items(
    fun=apply_formula, # a function that returns one or more xarray.DataArray
    name="NDVI",
    formula="((B08 - B04) / (B08 + B04))",
    output_dir=res_dir / "NDVI",
    datetime="2018-01-01/..",
    geometry=roi.geometry,
    inplace=True
)

# Select scenes with the NDVI
arr = col.filter(with_assets="NDVI").to_xarray()
assert "NDVI" in arr.band.values

# Plot NDVI applying cloud mask
mask = arr.sel(band="CLM") > 0
arr.sel(band="NDVI").where(~mask).isel(time=range(4)).plot(col="time", col_wrap=2)