Satellite Data Readers

This module provides specialized readers for various optical satellite missions. All these readers implement the GeoData protocol, which means they provide a consistent interface for spatial operations, data access, and manipulation.

These readers make it easy to work with official data formats from different Earth observation missions, and they can be used with all the functions available in the georeader.read module.

Readers available:

Sentinel-2
Proba-V
SpotVGT
EMIT
PRISMA
EnMAP
Carbon Mapper

Sentinel-2 Reader

The Sentinel-2 reader provides functionality for reading Sentinel-2 L1C and L2A products in SAFE format. It supports:

Direct reading from local files or cloud storage (Google Cloud Storage)
Windowed reading for efficient memory usage
Conversion from digital numbers to radiance
Access to metadata, including viewing geometry and solar angles

Tutorial examples:

API Reference

Sentinel-2 SAFE Product Reader for L1C and L2A Data.

This module provides readers for Sentinel-2 satellite imagery in the SAFE format, supporting both Level-1C (top-of-atmosphere reflectance) and Level-2A (surface reflectance) products. It handles local files and cloud storage (Google Cloud).

Sentinel-2 Product Levels

::

┌─────────────────────────────────────────────────────────────────────────┐
│                 SENTINEL-2 PROCESSING LEVELS                             │
│                                                                          │
│   Level-1C (L1C)                      Level-2A (L2A)                     │
│   ─────────────────                   ─────────────────                  │
│                                                                          │
│   ☀️ Sun                               ☀️ Sun                             │
│    │                                   │                                 │
│    ▼                                   ▼                                 │
│   ┌─────────┐                        ┌─────────┐                        │
│   │Atmosphere│ ◄─ NOT corrected      │Atmosphere│ ◄─ CORRECTED          │
│   └────┬────┘                        └────┬────┘                        │
│        │                                  │                              │
│        ▼                                  ▼                              │
│   ┌─────────┐                        ┌─────────┐                        │
│   │ Surface │                        │ Surface │                        │
│   └─────────┘                        └─────────┘                        │
│        │                                  │                              │
│        ▼ 🛰️                              ▼ 🛰️                           │
│                                                                          │
│   TOA Reflectance                     BOA Reflectance                   │
│   - Includes atmospheric effects      - Surface reflectance             │
│   - Globally available                - Atmospheric correction applied  │
│   - Can convert to radiance           - Scene Classification (SCL)     │
│   - 13 bands (incl. B10 cirrus)       - 12 bands (no B10)              │
│                                                                          │
│   Use for:                            Use for:                          │
│   - Radiance-based analysis           - Land cover mapping              │
│   - Custom atmospheric correction     - Vegetation indices (NDVI)       │
│   - Cloud studies (B10)               - Change detection                │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Spectral Bands

::

Band │ Central λ │ Bandwidth │ Resolution │ L1C │ L2A │ Description
─────┼───────────┼───────────┼────────────┼─────┼─────┼─────────────────────
B01  │   443 nm  │   20 nm   │    60m     │  ✓  │  ✓  │ Coastal/Aerosol
B02  │   490 nm  │   65 nm   │    10m     │  ✓  │  ✓  │ Blue
B03  │   560 nm  │   35 nm   │    10m     │  ✓  │  ✓  │ Green
B04  │   665 nm  │   30 nm   │    10m     │  ✓  │  ✓  │ Red
B05  │   705 nm  │   15 nm   │    20m     │  ✓  │  ✓  │ Red Edge 1
B06  │   740 nm  │   15 nm   │    20m     │  ✓  │  ✓  │ Red Edge 2
B07  │   783 nm  │   20 nm   │    20m     │  ✓  │  ✓  │ Red Edge 3
B08  │   842 nm  │  115 nm   │    10m     │  ✓  │  ✓  │ NIR
B8A  │   865 nm  │   20 nm   │    20m     │  ✓  │  ✓  │ NIR Narrow
B09  │   945 nm  │   20 nm   │    60m     │  ✓  │  ✓  │ Water Vapour
B10  │  1375 nm  │   30 nm   │    60m     │  ✓  │  ✗  │ Cirrus (L1C only)
B11  │  1610 nm  │   90 nm   │    20m     │  ✓  │  ✓  │ SWIR 1
B12  │  2190 nm  │  180 nm   │    20m     │  ✓  │  ✓  │ SWIR 2
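Because the bands come at different native resolutions, code that reads several bands through one reader (as `get_reader` does, which asserts that all requested bands share a resolution) has to group them first. A minimal sketch, assuming a hypothetical `NATIVE_RESOLUTION` dictionary transcribed from the table above (the module keeps this information in its own `BANDS_RESOLUTION` constant, which is not shown here):

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Native resolution (m) per band, transcribed from the table above.
# Hypothetical stand-in for the module's BANDS_RESOLUTION constant.
NATIVE_RESOLUTION: Dict[str, int] = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10, "B05": 20, "B06": 20,
    "B07": 20, "B08": 10, "B8A": 20, "B09": 60, "B10": 60, "B11": 20,
    "B12": 20,
}


def group_by_resolution(bands: Iterable[str]) -> Dict[int, List[str]]:
    """Group band names by native resolution, keeping the input order."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for band in bands:
        groups[NATIVE_RESOLUTION[band]].append(band)
    return dict(groups)


print(group_by_resolution(["B04", "B03", "B02", "B08", "B11", "B8A"]))
# {10: ['B04', 'B03', 'B02', 'B08'], 20: ['B11', 'B8A']}
```

Each group then satisfies the same-resolution requirement of `get_reader` on its own.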

Data Access

Products can be loaded from:

Local SAFE folders::

s2 = S2ImageL2A("/data/S2A_MSIL2A_20240115T...SAFE")

Google Cloud public bucket (free, no authentication)::

path = "gs://gcp-public-data-sentinel-2/L2/tiles/32/T/QM/..."
s2 = S2ImageL2A(path)

Other cloud storage (via fsspec)::

s2 = S2ImageL2A("s3://bucket/S2A_MSIL2A_...SAFE", requester_pays=True)
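The Google Cloud paths follow a per-tile layout: the MGRS tile ID (e.g. 30TVK) is split into UTM zone, latitude band and grid square, and L2A products sit under an extra `L2/` prefix, as in the Quick Start example. A small sketch, assuming that layout; `gcs_tile_prefix` is a hypothetical helper and not part of georeader:

```python
import re

BUCKET = "gs://gcp-public-data-sentinel-2"


def gcs_tile_prefix(tile: str, level: str = "L1C") -> str:
    """Return the bucket folder that holds the SAFE products of one MGRS tile.

    Assumed layout (taken from the example paths on this page):
      L1C: <bucket>/tiles/<zone>/<latitude band>/<grid square>/
      L2A: <bucket>/L2/tiles/<zone>/<latitude band>/<grid square>/
    """
    match = re.fullmatch(r"T?(\d{2})([A-Z])([A-Z]{2})", tile.upper())
    if match is None:
        raise ValueError(f"Not an MGRS tile id: {tile!r}")
    zone, lat_band, square = match.groups()
    roots = {"L1C": f"{BUCKET}/tiles", "L2A": f"{BUCKET}/L2/tiles"}
    if level not in roots:
        raise ValueError("level must be 'L1C' or 'L2A'")
    return f"{roots[level]}/{zone}/{lat_band}/{square}/"


print(gcs_tile_prefix("T30TVK", level="L2A"))
# gs://gcp-public-data-sentinel-2/L2/tiles/30/T/VK/
```

Listing such a prefix should show the SAFE folders of that tile, any of which can then be passed to the reader.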

Quick Start Examples

Load L2A surface reflectance (most common)::

from georeader.readers.S2_SAFE_reader import S2ImageL2A
from shapely.geometry import box

# Define area of interest in WGS84
aoi = box(-3.75, 40.40, -3.65, 40.50)  # Madrid area

# Load from Google Cloud public bucket
s2 = S2ImageL2A(
    "gs://gcp-public-data-sentinel-2/L2/tiles/30/T/VK/"
    "S2A_MSIL2A_20240115T110351_N0510_R094_T30TVK_20240115T144512.SAFE",
    polygon=aoi,
    out_res=10,  # 10m resolution
    bands=["B04", "B03", "B02", "B08"]  # RGBNIR
)

# Load as GeoTensor
gt = s2.load()
print(f"Shape: {gt.shape}")  # (4, H, W)
print(f"CRS: {gt.crs}")      # EPSG:32630 (UTM 30N)

Load L1C and convert to radiance::

from georeader.readers.S2_SAFE_reader import S2ImageL1C

# Restrict the reader to the RGB bands when constructing it
s2_l1c = S2ImageL1C("/path/to/S2A_MSIL1C_...SAFE", polygon=aoi,
                    bands=["B04", "B03", "B02"])

# Read tile metadata for solar angles
s2_l1c.read_metadata_tl()

# Get solar zenith angle
sza = s2_l1c.mean_sza

# Convert DN to at-sensor radiance (W/m²/sr/nm, since solar_irradiance() is per nm)
radiance = s2_l1c.DN_to_radiance()
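For reference, the standard Sentinel-2 L1C relations behind this conversion can be sketched with numpy. This is an illustration, not the module's `DN_to_radiance`: it assumes the caller already has the per-band RADIO_ADD_OFFSET, QUANTIFICATION_VALUE, solar irradiance E0 and U factor (the reader exposes these through `radio_add_offsets()`, `quantification_value()`, `solar_irradiance()` and `scale_factor_U()`) plus a solar zenith angle such as `mean_sza`; how the library itself obtains the zenith angle may differ.

```python
import numpy as np


def dn_to_toa_reflectance(dn: np.ndarray, radio_add_offset: int = 0,
                          quantification_value: int = 10_000) -> np.ndarray:
    """L1C TOA reflectance: (DN + RADIO_ADD_OFFSET) / QUANTIFICATION_VALUE."""
    return (dn.astype(np.float64) + radio_add_offset) / quantification_value


def toa_reflectance_to_radiance(rho: np.ndarray, solar_irradiance: float,
                                sza_deg: float, u: float) -> np.ndarray:
    """Invert rho = pi * L / (E0 * cos(sza) * U) for the radiance L.

    U is the Earth-Sun distance correction (1 / d**2, d in AU). The result
    has the spectral unit of solar_irradiance (W/m²/sr/nm for W/m²/nm).
    """
    return rho * solar_irradiance * np.cos(np.deg2rad(sza_deg)) * u / np.pi


dn = np.array([2000, 3000], dtype=np.uint16)
rho = dn_to_toa_reflectance(dn, radio_add_offset=-1000)  # [0.1, 0.2]
radiance = toa_reflectance_to_radiance(rho, solar_irradiance=1.5, sza_deg=60.0, u=1.0)
```

The DN array is cast to float before adding the offset so that a negative offset cannot wrap around in `uint16`.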

Classes

S2Image: base class with shared functionality (do not use directly)
S2ImageL1C: Level-1C reader with TOA reflectance and angle accessors
S2ImageL2A: Level-2A reader with surface reflectance

References

ESA Sentinel-2 User Guide: https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi
Google Cloud Sentinel-2 Bucket: https://cloud.google.com/storage/docs/public-datasets/sentinel-2
Sentinel-2 Radiometric Resolution: https://sentiwiki.copernicus.eu/web/s2-processing

Authors: Gonzalo Mateo-García, Dan Lopez-Puigdollers

`S2Image`

Base Sentinel-2 image reader for handling Sentinel-2 satellite products. Do not use this class directly; use S2ImageL1C or S2ImageL2A instead.

This class provides functionality to read and manipulate Sentinel-2 satellite imagery. It handles the specific format and metadata of Sentinel-2 products, supporting operations like loading bands, masks, and converting digital numbers to radiance.

Parameters:

Name	Type	Description	Default
`s2folder`	`str`	Path to the Sentinel-2 SAFE product folder.	required
`polygon`	`Optional[Polygon]`	Polygon defining the area of interest in EPSG:4326. Defaults to None (entire image).	`None`
`granules`	`Optional[Dict[str, str]]`	Dictionary mapping band names to file paths. Defaults to None (automatically discovered).	`None`
`out_res`	`int`	Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.	`10`
`window_focus`	`Optional[Window]`	Window to focus on a specific region of the image. Defaults to None (entire image).	`None`
`bands`	`Optional[List[str]]`	List of bands to read. If None, all available bands will be loaded based on the product type.	`None`
`metadata_msi`	`Optional[str]`	Path to metadata file. If None, it is assumed to be in the SAFE folder.	`None`

Attributes:

Name	Type	Description
`mission`	`str`	Mission identifier (e.g., 'S2A', 'S2B').
`producttype`	`str`	Product type identifier (e.g., 'MSIL1C', 'MSIL2A').
`pdgs`	`str`	PDGS Processing Baseline number.
`relorbitnum`	`str`	Relative Orbit number.
`tile_number_field`	`str`	Tile Number field.
`product_discriminator`	`str`	Product Discriminator.
`name`	`str`	Base name of the product.
`folder`	`str`	Path to the product folder.
`datetime`	`datetime`	Acquisition datetime.
`metadata_msi`	`str`	Path to the MSI metadata file.
`out_res`	`int`	Output resolution in meters.
`bands`	`List[str]`	List of bands to read.
`dims`	`Tuple[str]`	Names of the dimensions ("band", "y", "x").
`fill_value_default`	`int`	Default fill value (typically 0).
`band_check`	`str`	Band used as template for reading.
`granule_readers`	`Dict[str, RasterioReader]`	Dictionary of readers for each band.
`window_focus`	`Window`	Current window focus.
`transform`		Affine transform for the window.
`crs`		Coordinate reference system.
`shape`		Shape of the data (bands, height, width).
`bounds`		Bounds of the window.
`res`	`Tuple[float, float]`	Resolution of the data.
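The identification attributes (`mission`, `producttype`, `pdgs`, `relorbitnum`, `tile_number_field`, `product_discriminator`, `datetime`) are parsed from the SAFE folder name by the module helper `s2_name_split`, which is not shown in this section. A standalone sketch of the same idea, assuming the ESA naming convention `MMM_MSIXXX_YYYYMMDDTHHMMSS_Nxxyy_ROOO_Txxxxx_<discriminator>.SAFE`; `parse_safe_name` is hypothetical, and whether `s2_name_split` keeps the `N`, `R` and `T` prefixes is not stated here:

```python
import datetime
import os
import re
from typing import NamedTuple


class SafeName(NamedTuple):
    """Fields encoded in a Sentinel-2 SAFE product name."""
    mission: str                # e.g. "S2A"
    producttype: str            # "MSIL1C" or "MSIL2A"
    sensing_time: datetime.datetime
    pdgs: str                   # processing baseline, e.g. "N0510"
    relorbitnum: str            # relative orbit, e.g. "R094"
    tile_number_field: str      # MGRS tile, e.g. "T30TVK"
    product_discriminator: str  # e.g. "20240115T144512"


_SAFE_NAME = re.compile(
    r"(S2[A-D])_(MSIL1C|MSIL2A)_(\d{8}T\d{6})_(N\d{4})_(R\d{3})_(T\d{2}[A-Z]{3})_(\d{8}T\d{6})"
)


def parse_safe_name(path: str) -> SafeName:
    """Parse the product folder name (local path or gs:// URL) into its fields."""
    name = os.path.basename(path.rstrip("/\\"))  # drop trailing slashes first
    match = _SAFE_NAME.match(name)
    if match is None:
        raise ValueError(f"Not a Sentinel-2 SAFE product name: {name!r}")
    mission, producttype, sensing, pdgs, orbit, tile, discriminator = match.groups()
    # Sensing time is UTC, parsed with the same format string as S2Image.__init__
    sensing_time = datetime.datetime.strptime(sensing, "%Y%m%dT%H%M%S").replace(
        tzinfo=datetime.timezone.utc
    )
    return SafeName(mission, producttype, sensing_time, pdgs, orbit, tile, discriminator)


info = parse_safe_name(
    "gs://gcp-public-data-sentinel-2/L2/tiles/30/T/VK/"
    "S2A_MSIL2A_20240115T110351_N0510_R094_T30TVK_20240115T144512.SAFE/"
)
print(info.mission, info.producttype, info.sensing_time.isoformat())
# S2A MSIL2A 2024-01-15T11:03:51+00:00
```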

Source code in georeader/readers/S2_SAFE_reader.py

class S2Image:
    """
    Base Sentinel-2 image reader for handling Sentinel-2 satellite products.
    Do not use this class directly; use S2ImageL1C or S2ImageL2A instead.

    This class provides functionality to read and manipulate Sentinel-2 satellite imagery.
    It handles the specific format and metadata of Sentinel-2 products, supporting operations
    like loading bands, masks, and converting digital numbers to radiance.

    Args:
        s2folder (str): Path to the Sentinel-2 SAFE product folder.
        polygon (Optional[Polygon]): Polygon defining the area of interest in EPSG:4326.
            Defaults to None (entire image).
        granules (Optional[Dict[str, str]]): Dictionary mapping band names to file paths.
            Defaults to None (automatically discovered).
        out_res (int): Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
        window_focus (Optional[rasterio.windows.Window]): Window to focus on a specific
            region of the image. Defaults to None (entire image).
        bands (Optional[List[str]]): List of bands to read. If None, all available bands
            will be loaded based on the product type.
        metadata_msi (Optional[str]): Path to metadata file. If None, it is assumed to be
            in the SAFE folder.

    Attributes:
        mission (str): Mission identifier (e.g., 'S2A', 'S2B').
        producttype (str): Product type identifier (e.g., 'MSIL1C', 'MSIL2A').
        pdgs (str): PDGS Processing Baseline number.
        relorbitnum (str): Relative Orbit number.
        tile_number_field (str): Tile Number field.
        product_discriminator (str): Product Discriminator.
        name (str): Base name of the product.
        folder (str): Path to the product folder.
        datetime (datetime): Acquisition datetime.
        metadata_msi (str): Path to the MSI metadata file.
        out_res (int): Output resolution in meters.
        bands (List[str]): List of bands to read.
        dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
        fill_value_default (int): Default fill value (typically 0).
        band_check (str): Band used as template for reading.
        granule_readers (Dict[str, RasterioReader]): Dictionary of readers for each band.
        window_focus (rasterio.windows.Window): Current window focus.
        transform: Affine transform for the window.
        crs: Coordinate reference system.
        shape: Shape of the data (bands, height, width).
        bounds: Bounds of the window.
        res: Resolution of the data.

    """

    def __init__(
        self,
        s2folder: str,
        polygon: Optional[Polygon] = None,
        granules: Optional[Dict[str, str]] = None,
        out_res: int = 10,
        window_focus: Optional[rasterio.windows.Window] = None,
        bands: Optional[List[str]] = None,
        metadata_msi: Optional[str] = None,
    ):
        """
        Sentinel-2 image reader class.

        Args:
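            s2folder: path to the SAFE product folder. Its name must follow the Sentinel-2
                SAFE naming convention, because mission, product type, date, etc. are parsed from it
            polygon: area of interest in CRS EPSG:4326
            granules: dictionary mapping band name to granule path
            out_res: output resolution in meters: one of 10, 20 or 60 (default 10)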
            window_focus: rasterio window to read. All reads will be based on this window
            bands: list of bands to read. If None all bands are read.
            metadata_msi: path to metadata file. If None it is assumed to be in the SAFE folder

        """
        (
            self.mission,
            self.producttype,
            sensing_date_str,
            self.pdgs,
            self.relorbitnum,
            self.tile_number_field,
            self.product_discriminator,
        ) = s2_name_split(s2folder)

        # Remove last trailing slash
        s2folder = (
            s2folder[:-1]
            if (s2folder.endswith("/") or s2folder.endswith("\\"))
            else s2folder
        )
        self.name = os.path.basename(os.path.splitext(s2folder)[0])

        self.folder = s2folder
        self.datetime = datetime.datetime.strptime(
            sensing_date_str, "%Y%m%dT%H%M%S"
        ).replace(tzinfo=datetime.timezone.utc)

        info_granules_metadata = None

        if metadata_msi is None:
            info_granules_metadata = _get_info_granules_metadata(self.folder)
            if info_granules_metadata is not None:
                self.metadata_msi = info_granules_metadata["metadata_msi"]
                if "metadata_tl" in info_granules_metadata:
                    self.metadata_tl = info_granules_metadata["metadata_tl"]
            else:
                self.metadata_msi = os.path.join(
                    self.folder, f"MTD_{self.producttype}.xml"
                ).replace("\\", "/")

        else:
            self.metadata_msi = metadata_msi

        out_res = int(out_res)

        # TODO increase possible out_res to powers of 2 of 10 meters and 60 meters
        # rst = rasterio.open('gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE/GRANULE/L1C_T49SGV_A027271_20220527T031740/IMG_DATA/T49SGV_20220527T030539_B02.jp2')
        # rst.overviews(1) -> [2, 4, 8, 16]
        assert out_res in {10, 20, 60}, "Invalid output resolution. Choose 10, 20 or 60"

        # Default resolution to read
        self.out_res = out_res

        if bands is None:
            if self.producttype == "MSIL2A":
                self.bands = list(BANDS_S2_L2A)
            else:
                self.bands = list(BANDS_S2)
        else:
            self.bands = normalize_band_names(bands)

        self.dims = ("band", "y", "x")
        self.fill_value_default = 0

        # Select the band that will be used as template when reading
        self.band_check = None
        for band in self.bands:
            if BANDS_RESOLUTION[band] == self.out_res:
                self.band_check = band
                break

        assert (
            self.band_check is not None
        ), f"No band found with resolution {self.out_res} in {self.bands}"

        # This dict will be filled by the _get_reader function
        self.granule_readers: Dict[str, RasterioReader] = {}
        self.window_focus = window_focus
        self.root_metadata_msi = None
        self._radio_add_offsets = None
        self._solar_irradiance = None
        self._scale_factor_U = None
        self._quantification_value = None

        # The code below could be triggered lazily, only when the granules are needed
        if not granules:
            # This is useful when copying with cache_product_to_local_dir func
            if info_granules_metadata is None:
                info_granules_metadata = _get_info_granules_metadata(self.folder)

            if info_granules_metadata is not None:
                self.granules = info_granules_metadata["granules"]

            else:
                self.load_metadata_msi()
                bands_elms = self.root_metadata_msi.findall(".//IMAGE_FILE")
                all_granules = [
                    os.path.join(self.folder, b.text + ".jp2").replace("\\", "/")
                    for b in bands_elms
                ]
                if self.producttype == "MSIL2A":
                    self.granules = {j.split("_")[-2]: j for j in all_granules}
                else:
                    self.granules = {
                        j.split("_")[-1].replace(".jp2", ""): j for j in all_granules
                    }
        else:
            self.granules = granules

        self._pol = polygon
        if self._pol is not None:
            self._pol_crs = window_utils.polygon_to_crs(
                self._pol, "EPSG:4326", self.crs
            )
        else:
            self._pol_crs = None

    def cache_product_to_local_dir(
        self,
        path_dest: Optional[str] = None,
        print_progress: bool = True,
        format_bands: Optional[str] = None,
    ) -> "__class__":
        """
        Copy the product to a local directory and return a new instance of the class with the new path

        Args:
            path_dest: path to the destination folder. If None, the current folder (".") is used
            print_progress: print progress bar. Default True
            format_bands: format of the bands. Default None (keep original format). Options: "COG", "GeoTIFF"

        Returns:
            A new instance of the class pointing to the new path
        """
        if path_dest is None:
            path_dest = "."

        if format_bands is not None:
            assert format_bands in {
                "COG",
                "GeoTIFF",
            }, "Not valid format_bands. Choose 'COG' or 'GeoTIFF'"

        name_with_safe = f"{self.name}.SAFE"
        dest_folder = os.path.join(path_dest, name_with_safe)

        # Copy metadata
        metadata_filename = os.path.basename(self.metadata_msi)
        metadata_output_path = os.path.join(dest_folder, metadata_filename)
        if not os.path.exists(metadata_output_path):
            os.makedirs(dest_folder, exist_ok=True)
            self.load_metadata_msi()
            ET.ElementTree(self.root_metadata_msi).write(metadata_output_path)
            root_metadata_msi = self.root_metadata_msi
        else:
            root_metadata_msi = read_xml(metadata_output_path)

        bands_elms = root_metadata_msi.findall(".//IMAGE_FILE")
        if self.producttype == "MSIL2A":
            granules_name_metadata = {b.text.split("_")[-2]: b.text for b in bands_elms}
        else:
            granules_name_metadata = {b.text.split("_")[-1]: b.text for b in bands_elms}

        new_granules = {}
        with tqdm(total=len(self.bands), disable=not print_progress) as pbar:
            for b in self.bands:
                granule = self.granules[b]
                ext_origin = os.path.splitext(granule)[1]

                if format_bands is not None:
                    if ext_origin.startswith(".tif"):
                        convert = False
                    else:
                        convert = True

                    ext_dst = ".tif"
                else:
                    convert = False
                    ext_dst = ext_origin

                namefile = os.path.splitext(granules_name_metadata[b])[0]
                new_granules[b] = namefile + ext_dst
                new_granules_path = os.path.join(dest_folder, new_granules[b])
                if not os.path.exists(new_granules_path):
                    new_granules_path_tmp = os.path.join(
                        dest_folder, namefile + ext_origin
                    )
                    pbar.set_description(
                        f"Downloading band {b} from {granule} to {new_granules_path}"
                    )
                    dir_granules_path = os.path.dirname(new_granules_path)
                    os.makedirs(dir_granules_path, exist_ok=True)
                    get_file(granule, new_granules_path_tmp)
                    if convert:
                        image = RasterioReader(new_granules_path_tmp).load().squeeze()
                        if format_bands == "COG":
                            save_cog(image, new_granules_path, descriptions=[b])
                        elif format_bands == "GeoTIFF":
                            save_tiled_geotiff(
                                image, new_granules_path, descriptions=[b]
                            )
                        else:
                            raise NotImplementedError(f"Not implemented {format_bands}")
                        os.remove(new_granules_path_tmp)

                pbar.update(1)

        # Save granules for fast reading
        granules_path = os.path.join(dest_folder, "granules.json").replace("\\", "/")
        if not os.path.exists(granules_path):
            with open(granules_path, "w") as fh:
                json.dump(
                    {"granules": new_granules, "metadata_msi": metadata_filename}, fh
                )

        new_granules_full_path = {
            k: os.path.join(dest_folder, v) for k, v in new_granules.items()
        }

        obj = s2loader(
            s2folder=dest_folder,
            out_res=self.out_res,
            window_focus=self.window_focus,
            bands=self.bands,
            granules=new_granules_full_path,
            polygon=self._pol,
            metadata_msi=metadata_output_path,
        )
        obj.root_metadata_msi = root_metadata_msi
        return obj

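    def DN_to_radiance(self, dn_data: Optional[GeoTensor] = None) -> GeoTensor:
        """Convert digital numbers to radiance (thin wrapper around the module-level DN_to_radiance function)."""
        return DN_to_radiance(self, dn_data)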

    def load_metadata_msi(self) -> ET.Element:
        if self.root_metadata_msi is None:
            self.root_metadata_msi = read_xml(self.metadata_msi)
        return self.root_metadata_msi

    def footprint(self, crs: Optional[str] = None) -> Polygon:
        if self._pol_crs is None:
            self.load_metadata_msi()
            footprint_txt = self.root_metadata_msi.findall(".//EXT_POS_LIST")[0].text
            coords_split = footprint_txt.split(" ")[:-1]
            self._pol = Polygon(
                [
                    (float(lngstr), float(latstr))
                    for latstr, lngstr in zip(coords_split[::2], coords_split[1::2])
                ]
            )
            self._pol_crs = window_utils.polygon_to_crs(
                self._pol, "EPSG:4326", self.crs
            )

        pol_window = window_utils.window_polygon(
            self._get_reader().window_focus, self.transform
        )

        pol = self._pol_crs.intersection(pol_window)

        if (crs is None) or window_utils.compare_crs(self.crs, crs):
            return pol

        return window_utils.polygon_to_crs(pol, self.crs, crs)

    def radio_add_offsets(self) -> Dict[str, float]:
        if self._radio_add_offsets is None:
            self.load_metadata_msi()
            radio_add_offsets = self.root_metadata_msi.findall(".//RADIO_ADD_OFFSET")
            if len(radio_add_offsets) == 0:
                self._radio_add_offsets = {b: 0 for b in BANDS_S2}
            else:
                self._radio_add_offsets = {
                    BANDS_S2[int(r.attrib["band_id"])]: int(r.text)
                    for r in radio_add_offsets
                }

        return self._radio_add_offsets

    def solar_irradiance(self) -> Dict[str, float]:
        """
        Returns solar irradiance per nanometer (W/m²/nm).

        Values are read from metadata_msi, where they are stored in W/m²/µm, and divided by 1000:
            <SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>
        """
        if self._solar_irradiance is None:
            self.load_metadata_msi()
            sr = self.root_metadata_msi.findall(".//SOLAR_IRRADIANCE")
            self._solar_irradiance = {
                BANDS_S2[int(r.attrib["bandId"])]: float(r.text) / 1_000 for r in sr
            }

        return self._solar_irradiance

    def scale_factor_U(self) -> float:
        if self._scale_factor_U is None:
            self.load_metadata_msi()
            self._scale_factor_U = float(self.root_metadata_msi.find(".//U").text)

        return self._scale_factor_U

    def quantification_value(self) -> int:
        """Returns the quantification value stored in the metadata msi file (this is always: 10_000)"""
        if self._quantification_value is None:
            self.load_metadata_msi()
            self._quantification_value = int(
                self.root_metadata_msi.find(".//QUANTIFICATION_VALUE").text
            )

        return self._quantification_value

    def get_reader(
        self, band_names: Union[str, List[str]], overview_level: Optional[int] = None
    ) -> RasterioReader:
        """
        Provides a RasterioReader object to read all the bands at the same resolution

        Args:
            band_names: a band name or a list of band names. Raises AssertionError if the bands have different resolutions.
            overview_level: level of the pyramid to read (same as in rasterio)

        Returns:
            RasterioReader

        """
        if isinstance(band_names, str):
            band_names = [band_names]

        band_names = normalize_band_names(band_names)

        assert all(
            BANDS_RESOLUTION[band_names[0]] == BANDS_RESOLUTION[b] for b in band_names
        ), f"Bands: {band_names} have different resolution"

        reader = RasterioReader(
            [self.granules[band_name] for band_name in band_names],
            window_focus=None,
            stack=False,
            fill_value_default=self.fill_value_default,
            overview_level=overview_level,
        )
        window_in = read.window_from_bounds(reader, self.bounds)
        window_in_rounded = read.round_outer_window(window_in)
        reader.set_window(window_in_rounded)
        return reader

    def _get_reader(self, band_name: Optional[str] = None) -> RasterioReader:
        if band_name is None:
            band_name = self.band_check

        if band_name not in self.granule_readers:
            # TODO handle different out_res than 10, 20, 60?
            if self.out_res == BANDS_RESOLUTION[band_name]:
                overview_level = None
                has_out_res = True
            elif self.out_res == BANDS_RESOLUTION[band_name] * 2:
                # out_res == 20 and BANDS_RESOLUTION[band_name]==10 -> read from first overview
                overview_level = 0
                has_out_res = True
            elif self.out_res > BANDS_RESOLUTION[band_name]:
                # out_res 60 and BANDS_RESOLUTION[band_name] == 10 or BANDS_RESOLUTION[band_name] == 20
                overview_level = 1 if BANDS_RESOLUTION[band_name] == 10 else 0
                has_out_res = False
            else:
                overview_level = None
                has_out_res = False

            # figure out which window_focus to set

            if band_name == self.band_check:
                window_focus = self.window_focus
                set_window_after = False
            elif has_out_res:
                window_focus = self.window_focus
                set_window_after = False
            else:
                set_window_after = True
                window_focus = None

            self.granule_readers[band_name] = RasterioReader(
                self.granules[band_name],
                window_focus=window_focus,
                fill_value_default=self.fill_value_default,
                overview_level=overview_level,
            )
            if set_window_after:
                window_in = read.window_from_bounds(
                    self.granule_readers[band_name], self.bounds
                )
                window_in_rounded = read.round_outer_window(window_in)
                self.granule_readers[band_name].set_window(window_in_rounded)

        return self.granule_readers[band_name]

    @property
    def dtype(self):
        # This is always np.uint16
        reader_band_check = self._get_reader()
        return reader_band_check.dtype

    @property
    def shape(self):
        reader_band_check = self._get_reader()
        return (len(self.bands),) + reader_band_check.shape[-2:]

    @property
    def transform(self):
        reader_band_check = self._get_reader()
        return reader_band_check.transform

    @property
    def crs(self):
        reader_band_check = self._get_reader()
        return reader_band_check.crs

    @property
    def bounds(self):
        reader_band_check = self._get_reader()
        return reader_band_check.bounds

    @property
    def res(self) -> Tuple[float, float]:
        reader_band_check = self._get_reader()
        return reader_band_check.res

    def __str__(self):
        return self.folder

    def __repr__(self) -> str:
        return f""" 
         {self.folder}
         Transform: {self.transform}
         Shape: {self.shape}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         bands: {self.bands}
         fill_value_default: {self.fill_value_default}
        """

    def read_from_band_names(self, band_names: List[str]) -> "__class__":
        """
        Read from band names

        Args:
            band_names: List of band names

        Returns:
            Copy of current object with band names set to band_names
        """
        s2obj = s2loader(
            s2folder=self.folder,
            out_res=self.out_res,
            window_focus=self.window_focus,
            bands=band_names,
            granules=self.granules,
            polygon=self._pol,
            metadata_msi=self.metadata_msi,
        )
        s2obj.root_metadata_msi = self.root_metadata_msi
        return s2obj

    def read_from_window(
        self, window: rasterio.windows.Window, boundless: bool = True
    ) -> "__class__":
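        """
        Return a copy of this object restricted to ``window``.

        The window is applied to the reader of the reference band (band_check); the
        other bands are aligned to it when they are read.

        Args:
            window: rasterio window to read.
            boundless: if True, the window may extend beyond the image extent.
        """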

        reader_ref = self._get_reader()
        rasterio_reader_ref = reader_ref.read_from_window(
            window=window, boundless=boundless
        )
        s2obj = s2loader(
            s2folder=self.folder,
            out_res=self.out_res,
            window_focus=rasterio_reader_ref.window_focus,
            bands=self.bands,
            granules=self.granules,
            polygon=self._pol,
            metadata_msi=self.metadata_msi,
        )
        # Set band check to avoid re-reading
        s2obj.granule_readers[self.band_check] = rasterio_reader_ref
        s2obj.band_check = self.band_check

        s2obj.root_metadata_msi = self.root_metadata_msi

        return s2obj

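    def load(self, boundless: bool = True) -> GeoTensor:
        """
        Load all bands in ``self.bands`` on the grid of the reference band (band_check).

        Bands whose pixel size differs from the reference are reprojected onto its grid
        with read.read_reproject_like. RADIO_ADD_OFFSET is added to every band and pixels
        that are 0 or 65535 in the reference band are set to fill_value_default.

        Args:
            boundless: passed to the readers' load calls.

        Returns:
            uint16 GeoTensor of shape (len(self.bands), H, W).

        Raises:
            ValueError: if any value is negative after adding the offsets.
        """
        reader_ref = self._get_reader()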
        geotensor_ref = reader_ref.load(boundless=boundless)

        array_out = np.full(
            (len(self.bands),) + geotensor_ref.shape[-2:],
            fill_value=geotensor_ref.fill_value_default,
            dtype=np.int32,
        )

        # Deal with NODATA values
        invalids = (geotensor_ref.values == 0) | (geotensor_ref.values == (2**16) - 1)

        radio_add = self.radio_add_offsets()
        for idx, b in enumerate(self.bands):
            if b == self.band_check:

                # Avoid bug of band names without zero before
                if len(b) == 2:
                    b = f"B0{b[-1]}"

                geotensor_iter = geotensor_ref
            else:
                reader_iter = self._get_reader(b)
                if (
                    np.mean(
                        np.abs(np.array(reader_iter.res) - np.array(geotensor_ref.res))
                    )
                    < 1e-6
                ):
                    geotensor_iter = reader_iter.load(boundless=boundless)
                else:
                    geotensor_iter = read.read_reproject_like(
                        reader_iter, geotensor_ref
                    )

            # Important: add RADIO_ADD_OFFSET. Images from 2022-01-25 onwards (PROCESSING_BASELINE
            # '04.00' or above) store shifted DNs; without this correction their values are biased.
            array_out[idx] = geotensor_iter.values[0].astype(np.int32) + radio_add[b]

        array_out[:, invalids[0]] = self.fill_value_default

        if np.any(array_out < 0):
            raise ValueError("Negative values found in the image")

        array_out = array_out.astype(np.uint16)

        return GeoTensor(
            values=array_out,
            transform=geotensor_ref.transform,
            crs=geotensor_ref.crs,
            fill_value_default=self.fill_value_default,
        )

    @property
    def values(self) -> np.ndarray:
        return self.load().values

    def load_mask(self) -> GeoTensor:
        reader_ref = self._get_reader()
        geotensor_ref = reader_ref.load(boundless=True)
        # Boolean mask from an integer band: build a new GeoTensor rather than
        # assigning into .values (which forbids dtype changes in place).
        mask = (geotensor_ref.values == 0) | (geotensor_ref.values == (2**16) - 1)
        return GeoTensor(mask, transform=geotensor_ref.transform,
                         crs=geotensor_ref.crs,
                         fill_value_default=geotensor_ref.fill_value_default)
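The `load` method above treats 0 and 65535 as NODATA, adds the per-band `RADIO_ADD_OFFSET` in a signed integer type, and only then casts back to `uint16`. A minimal NumPy sketch of the same steps for a single band, followed by the division by the quantification value that turns the result into reflectance. `dn_to_reflectance` is a hypothetical helper (not part of georeader), and the offset of -1000 is only an example value.

```python
import numpy as np

# DN values treated as invalid, mirroring the NODATA check in S2Image.load
NODATA_VALUES = (0, 2**16 - 1)


def dn_to_reflectance(dn: np.ndarray, radio_add_offset: int,
                      quantification_value: int = 10_000) -> np.ndarray:
    """Hypothetical helper: convert one band of Sentinel-2 DNs to reflectance.

    Invalid pixels (0 or 65535) are returned as NaN.
    """
    invalid = np.isin(dn, NODATA_VALUES)
    # Signed type: adding a negative offset must be able to produce negative values
    shifted = dn.astype(np.int32) + radio_add_offset
    reflectance = shifted.astype(np.float32) / quantification_value
    reflectance[invalid] = np.nan
    return reflectance


dn = np.array([[0, 1000, 2000], [11000, 65535, 3500]], dtype=np.uint16)
refl = dn_to_reflectance(dn, radio_add_offset=-1000)
print(refl)
```

Working in `int32` keeps negative intermediate values representable, which is also what allows `load` to detect them with its `np.any(array_out < 0)` check.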

`__init__(s2folder, polygon=None, granules=None, out_res=10, window_focus=None, bands=None, metadata_msi=None)`

Sentinel-2 image reader class.

Parameters:

Name	Type	Description	Default
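`s2folder`	`str`	Path to the SAFE product folder. The folder name must follow the Sentinel-2 naming convention, since mission, product type and sensing date are parsed from it.	required
`polygon`	`Optional[Polygon]`	Area of interest in EPSG:4326 coordinates.	`None`
`granules`	`Optional[Dict[str, str]]`	Dictionary mapping band names to file paths. If not given, it is read from granules.json or from the product metadata.	`None`
`out_res`	`int`	Output resolution in meters: one of 10, 20 or 60.	`10`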
`window_focus`	`Optional[Window]`	rasterio window to read. All reads will be based on this window	`None`
`bands`	`Optional[List[str]]`	list of bands to read. If None all bands are read.	`None`
`metadata_msi`	`Optional[str]`	path to metadata file. If None it is assumed to be in the SAFE folder	`None`

Source code in georeader/readers/S2_SAFE_reader.py

def __init__(
    self,
    s2folder: str,
    polygon: Optional[Polygon] = None,
    granules: Optional[Dict[str, str]] = None,
    out_res: int = 10,
    window_focus: Optional[rasterio.windows.Window] = None,
    bands: Optional[List[str]] = None,
    metadata_msi: Optional[str] = None,
):
    """
    Sentinel-2 image reader class.

    Args:
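        s2folder: path to the SAFE product folder. The folder name must follow the Sentinel-2
            naming convention, since mission, product type and sensing date are parsed from it.
        polygon: area of interest in EPSG:4326 coordinates.
        granules: dictionary mapping band names to file paths. If not given, it is read from
            granules.json or from the product metadata.
        out_res: output resolution in meters: one of 10, 20 or 60 (default 10)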
        window_focus: rasterio window to read. All reads will be based on this window
        bands: list of bands to read. If None all bands are read.
        metadata_msi: path to metadata file. If None it is assumed to be in the SAFE folder

    """
    (
        self.mission,
        self.producttype,
        sensing_date_str,
        self.pdgs,
        self.relorbitnum,
        self.tile_number_field,
        self.product_discriminator,
    ) = s2_name_split(s2folder)

    # Remove last trailing slash
    s2folder = (
        s2folder[:-1]
        if (s2folder.endswith("/") or s2folder.endswith("\\"))
        else s2folder
    )
    self.name = os.path.basename(os.path.splitext(s2folder)[0])

    self.folder = s2folder
    self.datetime = datetime.datetime.strptime(
        sensing_date_str, "%Y%m%dT%H%M%S"
    ).replace(tzinfo=datetime.timezone.utc)

    info_granules_metadata = None

    if metadata_msi is None:
        info_granules_metadata = _get_info_granules_metadata(self.folder)
        if info_granules_metadata is not None:
            self.metadata_msi = info_granules_metadata["metadata_msi"]
            if "metadata_tl" in info_granules_metadata:
                self.metadata_tl = info_granules_metadata["metadata_tl"]
        else:
            self.metadata_msi = os.path.join(
                self.folder, f"MTD_{self.producttype}.xml"
            ).replace("\\", "/")

    else:
        self.metadata_msi = metadata_msi

    out_res = int(out_res)

    # TODO increase possible out_res to powers of 2 of 10 meters and 60 meters
    # rst = rasterio.open('gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE/GRANULE/L1C_T49SGV_A027271_20220527T031740/IMG_DATA/T49SGV_20220527T030539_B02.jp2')
    # rst.overviews(1) -> [2, 4, 8, 16]
    assert out_res in {10, 20, 60}, "Not valid output resolution. Choose 10, 20 or 60"

    # Default resolution to read
    self.out_res = out_res

    if bands is None:
        if self.producttype == "MSIL2A":
            self.bands = list(BANDS_S2_L2A)
        else:
            self.bands = list(BANDS_S2)
    else:
        self.bands = normalize_band_names(bands)

    self.dims = ("band", "y", "x")
    self.fill_value_default = 0

    # Select the band that will be used as template when reading
    self.band_check = None
    for band in self.bands:
        if BANDS_RESOLUTION[band] == self.out_res:
            self.band_check = band
            break

    assert (
        self.band_check is not None
    ), f"No band with resolution {self.out_res} found in {self.bands}"

    # This dict will be filled by the _get_reader function
    self.granule_readers: Dict[str, RasterioReader] = {}
    self.window_focus = window_focus
    self.root_metadata_msi = None
    self._radio_add_offsets = None
    self._solar_irradiance = None
    self._scale_factor_U = None
    self._quantification_value = None

    # The code below could be deferred until the granules are actually needed
    if not granules:
        # This is useful when copying with cache_product_to_local_dir func
        if info_granules_metadata is None:
            info_granules_metadata = _get_info_granules_metadata(self.folder)

        if info_granules_metadata is not None:
            self.granules = info_granules_metadata["granules"]

        else:
            self.load_metadata_msi()
            bands_elms = self.root_metadata_msi.findall(".//IMAGE_FILE")
            all_granules = [
                os.path.join(self.folder, b.text + ".jp2").replace("\\", "/")
                for b in bands_elms
            ]
            if self.producttype == "MSIL2A":
                self.granules = {j.split("_")[-2]: j for j in all_granules}
            else:
                self.granules = {
                    j.split("_")[-1].replace(".jp2", ""): j for j in all_granules
                }
    else:
        self.granules = granules

    self._pol = polygon
    if self._pol is not None:
        self._pol_crs = window_utils.polygon_to_crs(
            self._pol, "EPSG:4326", self.crs
        )
    else:
        self._pol_crs = None
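`__init__` relies on `s2_name_split` to extract seven fields from the product name, and parses the sensing date with the format `%Y%m%dT%H%M%S` as UTC. The sketch below illustrates the naming convention this depends on, using the product name that appears in the source comment above. `SafeName` and `split_safe_name` are hypothetical names, and the field descriptions in the comments are our reading of the Sentinel-2 naming convention, not the output of georeader's `s2_name_split`.

```python
import datetime
import os
from typing import NamedTuple


class SafeName(NamedTuple):
    mission: str                # e.g. "S2B"
    producttype: str            # "MSIL1C" or "MSIL2A"
    sensing_date_str: str       # e.g. "20220527T030539"
    processing_baseline: str    # e.g. "N0400" (baseline 04.00)
    relative_orbit: str         # e.g. "R075"
    tile: str                   # e.g. "T49SGV"
    product_discriminator: str  # e.g. "20220527T051042"


def split_safe_name(s2folder: str) -> SafeName:
    """Hypothetical helper: split a SAFE folder name into its 7 underscore-separated fields."""
    # Drop a trailing slash/backslash and the ".SAFE" extension, as S2Image.__init__ does
    name = os.path.basename(s2folder.rstrip("/\\"))
    name = os.path.splitext(name)[0]
    fields = name.split("_")
    if len(fields) != 7:
        raise ValueError(f"Unexpected Sentinel-2 product name: {name}")
    return SafeName(*fields)


def sensing_datetime(parts: SafeName) -> datetime.datetime:
    # Same format string and UTC convention used for S2Image.datetime
    return datetime.datetime.strptime(parts.sensing_date_str, "%Y%m%dT%H%M%S").replace(
        tzinfo=datetime.timezone.utc
    )


parts = split_safe_name(
    "/data/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE/"
)
print(parts.producttype, parts.tile, sensing_datetime(parts))
```

The processing baseline field is the one that determines whether the radiometric offset handled in `load` applies.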

`cache_product_to_local_dir(path_dest=None, print_progress=True, format_bands=None)`

Copy the product to a local directory and return a new instance of the class that points to the local copy.

Parameters:

Name	Type	Description	Default
`path_dest`	`Optional[str]`	path to the destination folder. If None, the current folder (".") is used	`None`
`print_progress`	`bool`	print progress bar. Default True	`True`
`format_bands`	`Optional[str]`	format of the bands. Default None (keep original format). Options: "COG", "GeoTIFF"	`None`

Returns:

Type	Description
`__class__`	A new instance of the class pointing to the new path

Source code in georeader/readers/S2_SAFE_reader.py

def cache_product_to_local_dir(
    self,
    path_dest: Optional[str] = None,
    print_progress: bool = True,
    format_bands: Optional[str] = None,
) -> "__class__":
    """
    Copy the product to a local directory and return a new instance of the class with the new path

    Args:
        path_dest: path to the destination folder. If None, the current folder (".") is used
        print_progress: print progress bar. Default True
        format_bands: format of the bands. Default None (keep original format). Options: "COG", "GeoTIFF"

    Returns:
        A new instance of the class pointing to the new path
    """
    if path_dest is None:
        path_dest = "."

    if format_bands is not None:
        assert format_bands in {
            "COG",
            "GeoTIFF",
        }, "Not valid format_bands. Choose 'COG' or 'GeoTIFF'"

    name_with_safe = f"{self.name}.SAFE"
    dest_folder = os.path.join(path_dest, name_with_safe)

    # Copy metadata
    metadata_filename = os.path.basename(self.metadata_msi)
    metadata_output_path = os.path.join(dest_folder, metadata_filename)
    if not os.path.exists(metadata_output_path):
        os.makedirs(dest_folder, exist_ok=True)
        self.load_metadata_msi()
        ET.ElementTree(self.root_metadata_msi).write(metadata_output_path)
        root_metadata_msi = self.root_metadata_msi
    else:
        root_metadata_msi = read_xml(metadata_output_path)

    bands_elms = root_metadata_msi.findall(".//IMAGE_FILE")
    if self.producttype == "MSIL2A":
        granules_name_metadata = {b.text.split("_")[-2]: b.text for b in bands_elms}
    else:
        granules_name_metadata = {b.text.split("_")[-1]: b.text for b in bands_elms}

    new_granules = {}
    with tqdm(total=len(self.bands), disable=not print_progress) as pbar:
        for b in self.bands:
            granule = self.granules[b]
            ext_origin = os.path.splitext(granule)[1]

            if format_bands is not None:
                if ext_origin.startswith(".tif"):
                    convert = False
                else:
                    convert = True

                ext_dst = ".tif"
            else:
                convert = False
                ext_dst = ext_origin

            namefile = os.path.splitext(granules_name_metadata[b])[0]
            new_granules[b] = namefile + ext_dst
            new_granules_path = os.path.join(dest_folder, new_granules[b])
            if not os.path.exists(new_granules_path):
                new_granules_path_tmp = os.path.join(
                    dest_folder, namefile + ext_origin
                )
                pbar.set_description(
                    f"Downloading band {b} from {granule} to {new_granules_path}"
                )
                dir_granules_path = os.path.dirname(new_granules_path)
                os.makedirs(dir_granules_path, exist_ok=True)
                get_file(granule, new_granules_path_tmp)
                if convert:
                    image = RasterioReader(new_granules_path_tmp).load().squeeze()
                    if format_bands == "COG":
                        save_cog(image, new_granules_path, descriptions=[b])
                    elif format_bands == "GeoTIFF":
                        save_tiled_geotiff(
                            image, new_granules_path, descriptions=[b]
                        )
                    else:
                        raise NotImplementedError(f"Not implemented {format_bands}")
                    os.remove(new_granules_path_tmp)

            pbar.update(1)

    # Save granules for fast reading
    granules_path = os.path.join(dest_folder, "granules.json").replace("\\", "/")
    if not os.path.exists(granules_path):
        with open(granules_path, "w") as fh:
            json.dump(
                {"granules": new_granules, "metadata_msi": metadata_filename}, fh
            )

    new_granules_full_path = {
        k: os.path.join(dest_folder, v) for k, v in new_granules.items()
    }

    obj = s2loader(
        s2folder=dest_folder,
        out_res=self.out_res,
        window_focus=self.window_focus,
        bands=self.bands,
        granules=new_granules_full_path,
        polygon=self._pol,
        metadata_msi=metadata_output_path,
    )
    obj.root_metadata_msi = root_metadata_msi
    return obj
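`cache_product_to_local_dir` writes a `granules.json` file inside the cached SAFE folder, storing the band paths relative to that folder together with the metadata file name; this lets later instances skip parsing the XML metadata to locate the bands. A sketch of writing and reading such a file, assuming only the two keys used in the source (`granules` and `metadata_msi`). `write_granules_json` and `read_granules_json` are hypothetical helpers, and the granule sub-folder name is made up; georeader's own lookup in `_get_info_granules_metadata` is not shown here and may differ.

```python
import json
import os
import tempfile
from typing import Dict


def write_granules_json(safe_folder: str, granules_rel: Dict[str, str], metadata_msi: str) -> str:
    """Hypothetical helper: store band paths relative to the SAFE folder."""
    path = os.path.join(safe_folder, "granules.json")
    with open(path, "w") as fh:
        json.dump({"granules": granules_rel, "metadata_msi": metadata_msi}, fh)
    return path


def read_granules_json(safe_folder: str) -> Dict[str, str]:
    """Hypothetical helper: resolve the stored relative paths against the SAFE folder."""
    with open(os.path.join(safe_folder, "granules.json")) as fh:
        info = json.load(fh)
    return {band: os.path.join(safe_folder, rel) for band, rel in info["granules"].items()}


with tempfile.TemporaryDirectory() as tmp:
    safe = os.path.join(tmp, "S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE")
    os.makedirs(safe)
    write_granules_json(
        safe,
        {"B02": "GRANULE/L1C_T49SGV_EXAMPLE/IMG_DATA/T49SGV_20220527T030539_B02.jp2"},
        metadata_msi="MTD_MSIL1C.xml",
    )
    granules = read_granules_json(safe)
    print(granules["B02"])
```

Storing relative paths keeps the cached folder relocatable: the full paths are rebuilt from wherever the folder currently lives, as `new_granules_full_path` does in the method above.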

`get_reader(band_names, overview_level=None)`

Provides a RasterioReader to read the given bands, which must all share the same native resolution.

Parameters:

Name	Type	Description	Default
`band_names`	`Union[str, List[str]]`	A band name or a list of band names. Raises an AssertionError if the bands have different resolutions.	required
`overview_level`	`Optional[int]`	level of the pyramid to read (same as in rasterio)	`None`

Returns:

Type	Description
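`RasterioReader`	Reader over the requested bands, with its window set to the bounds of this image (rounded outwards to whole pixels).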

Source code in georeader/readers/S2_SAFE_reader.py

def get_reader(
    self, band_names: Union[str, List[str]], overview_level: Optional[int] = None
) -> RasterioReader:
    """
    Provides a RasterioReader to read the given bands, which must all share the same native resolution.

    Args:
        band_names: a band name or a list of band names. Raises an AssertionError if the bands have different resolutions.
        overview_level: level of the pyramid to read (same as in rasterio)

    Returns:
        RasterioReader over the requested bands, windowed to the bounds of this image (rounded outwards).

    """
    if isinstance(band_names, str):
        band_names = [band_names]

    band_names = normalize_band_names(band_names)

    assert all(
        BANDS_RESOLUTION[band_names[0]] == BANDS_RESOLUTION[b] for b in band_names
    ), f"Bands: {band_names} have different resolution"

    reader = RasterioReader(
        [self.granules[band_name] for band_name in band_names],
        window_focus=None,
        stack=False,
        fill_value_default=self.fill_value_default,
        overview_level=overview_level,
    )
    window_in = read.window_from_bounds(reader, self.bounds)
    window_in_rounded = read.round_outer_window(window_in)
    reader.set_window(window_in_rounded)
    return reader
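`get_reader` opens each band at its native resolution and sets a window covering `self.bounds`. Because the bounds of a 10 m image need not fall on the 20 m or 60 m pixel grid, that window is generally fractional, and `read.round_outer_window` expands it to whole pixels. The arithmetic for a north-up grid can be sketched as follows. `PixelWindow`, `window_from_bounds_north_up` and `round_outer` are hypothetical stand-ins for rasterio's `Window` and georeader's helpers, and we assume "outer" rounding means flooring the start and ceiling the end.

```python
import math
from typing import NamedTuple, Tuple


class PixelWindow(NamedTuple):
    col_off: float
    row_off: float
    width: float
    height: float


def window_from_bounds_north_up(bounds: Tuple[float, float, float, float],
                                ulx: float, uly: float, res: float) -> PixelWindow:
    """Hypothetical: fractional window covering bounds=(xmin, ymin, xmax, ymax) on a north-up grid."""
    xmin, ymin, xmax, ymax = bounds
    return PixelWindow(
        col_off=(xmin - ulx) / res,
        row_off=(uly - ymax) / res,  # rows grow southwards
        width=(xmax - xmin) / res,
        height=(ymax - ymin) / res,
    )


def round_outer(w: PixelWindow) -> PixelWindow:
    """Expand a fractional window to whole pixels so it fully contains the original one."""
    col_start, row_start = math.floor(w.col_off), math.floor(w.row_off)
    col_stop = math.ceil(w.col_off + w.width)
    row_stop = math.ceil(w.row_off + w.height)
    return PixelWindow(col_start, row_start, col_stop - col_start, row_stop - row_start)


# Bounds of a 10 m image read from a 20 m band whose grid starts at (300000, 5000040)
w = window_from_bounds_north_up((300010, 4999000, 300990, 5000030), ulx=300000, uly=5000040, res=20)
print(w, round_outer(w))
```

Rounding outwards means the 20 m or 60 m reader may cover slightly more ground than the 10 m template; `load` then resamples those bands onto the template grid with `read.read_reproject_like`.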

`quantification_value()`

Returns the quantification value stored in the metadata msi file (this is always 10000).

Source code in georeader/readers/S2_SAFE_reader.py

def quantification_value(self) -> int:
    """Returns the quantification value stored in the metadata msi file (this is always: 10_000)"""
    if self._quantification_value is None:
        self.load_metadata_msi()
        self._quantification_value = int(
            self.root_metadata_msi.find(".//QUANTIFICATION_VALUE").text
        )

    return self._quantification_value
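`quantification_value` (like `solar_irradiance`) parses the product XML only on first use and caches the result in a private attribute. A self-contained sketch of that lazy pattern with `xml.etree.ElementTree`. The XML is a simplified, hypothetical excerpt of an `MTD_MSIL1C.xml` file (real files are namespaced and much larger), and `LazyMetadata` is an illustrative class, not part of georeader.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Simplified, hypothetical excerpt of MTD_MSIL1C.xml
MTD_EXCERPT = """
<Level-1C_User_Product>
  <General_Info>
    <Product_Image_Characteristics>
      <QUANTIFICATION_VALUE unit="none">10000</QUANTIFICATION_VALUE>
    </Product_Image_Characteristics>
  </General_Info>
</Level-1C_User_Product>
""".strip()


class LazyMetadata:
    """Illustrative stand-in for the lazy-loading pattern used by quantification_value()."""

    def __init__(self, xml_text: str):
        self._xml_text = xml_text
        self._root: Optional[ET.Element] = None
        self._quantification_value: Optional[int] = None

    def quantification_value(self) -> int:
        if self._quantification_value is None:
            if self._root is None:
                self._root = ET.fromstring(self._xml_text)  # parse once, on first use
            # ".//" searches at any depth, as in the georeader method
            self._quantification_value = int(self._root.find(".//QUANTIFICATION_VALUE").text)
        return self._quantification_value


meta = LazyMetadata(MTD_EXCERPT)
print(meta.quantification_value())  # 10000
```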

`read_from_band_names(band_names)`

Return a copy of this reader restricted to the given band names.

Parameters:

Name	Type	Description	Default
`band_names`	`List[str]`	List of band names	required

Returns:

Type	Description
`__class__`	Copy of current object with band names set to band_names

Source code in georeader/readers/S2_SAFE_reader.py

def read_from_band_names(self, band_names: List[str]) -> "__class__":
    """
    Return a copy of this reader restricted to the given band names.

    Args:
        band_names: List of band names

    Returns:
        Copy of current object with band names set to band_names
    """
    s2obj = s2loader(
        s2folder=self.folder,
        out_res=self.out_res,
        window_focus=self.window_focus,
        bands=band_names,
        granules=self.granules,
        polygon=self._pol,
        metadata_msi=self.metadata_msi,
    )
    s2obj.root_metadata_msi = self.root_metadata_msi
    return s2obj

`solar_irradiance()`

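Returns the solar irradiance per band, in W/m²/nm.

Reads the solar irradiance from metadata_msi, where it is stored in W/m²/µm, and divides it by 1000. For example:

`<SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>`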

Source code in georeader/readers/S2_SAFE_reader.py

def solar_irradiance(self) -> Dict[str, float]:
    """
    Returns solar irradiance per nanometer: W/m²/nm

    Reads solar irradiance from metadata_msi:
        <SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>
    """
    if self._solar_irradiance is None:
        self.load_metadata_msi()
        sr = self.root_metadata_msi.findall(".//SOLAR_IRRADIANCE")
        self._solar_irradiance = {
            BANDS_S2[int(r.attrib["bandId"])]: float(r.text) / 1_000 for r in sr
        }

    return self._solar_irradiance
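The method keys each `SOLAR_IRRADIANCE` element by band name through its `bandId` index and divides by 1000 to convert W/m²/µm into W/m²/nm. The sketch below repeats that parsing on the example element quoted in the docstring, and adds the standard top-of-atmosphere relation L = ρ · E0 · cos(θs) · U / π to show what the per-nanometer irradiance is typically used for, where U is the Earth-Sun distance correction factor (the source keeps a `_scale_factor_U` attribute, presumably for this). The band order, the helper names and the use of this formula are assumptions: we have not checked that georeader's own radiance conversion uses exactly this form.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed order of the bandId attribute: 0 -> B01, ..., 8 -> B8A, ..., 12 -> B12
BAND_ORDER: List[str] = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
                         "B08", "B8A", "B09", "B10", "B11", "B12"]


def parse_solar_irradiance(xml_text: str) -> Dict[str, float]:
    """Hypothetical helper: SOLAR_IRRADIANCE values converted from W/m²/µm to W/m²/nm."""
    root = ET.fromstring(xml_text)
    return {
        BAND_ORDER[int(el.attrib["bandId"])]: float(el.text) / 1_000  # 1 µm = 1000 nm
        for el in root.iter("SOLAR_IRRADIANCE")
    }


def toa_reflectance_to_radiance(reflectance: float, e0: float,
                                sza_deg: float, u: float = 1.0) -> float:
    """Radiance (W/m²/sr/nm) from TOA reflectance: L = rho * E0 * cos(sza) * U / pi."""
    return reflectance * e0 * math.cos(math.radians(sza_deg)) * u / math.pi


xml_text = ('<Solar_Irradiance_List>'
            '<SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>'
            '</Solar_Irradiance_List>')
irradiance = parse_solar_irradiance(xml_text)
radiance_b01 = toa_reflectance_to_radiance(0.2, irradiance["B01"], sza_deg=30.0)
print(irradiance, radiance_b01)
```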

`S2ImageL1C`

Bases: S2Image

Sentinel-2 Level 1C (top of atmosphere reflectance) image reader.

This class extends the base S2Image class to handle Sentinel-2 Level 1C products, which provide calibrated and orthorectified top of atmosphere reflectance data. It also provides methods to access viewing and solar angle information.

Parameters:

Name	Type	Description	Default
`s2folder`	`str`	Path to the Sentinel-2 SAFE product folder.	required
`granules`	`Dict[str, str]`	Dictionary mapping band names to file paths.	required
`polygon`	`Polygon`	Polygon defining the area of interest in EPSG:4326.	required
`out_res`	`int`	Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.	`10`
`window_focus`	`Optional[Window]`	Window to focus on a specific region of the image. Defaults to None (entire image).	`None`
`bands`	`Optional[List[str]]`	List of bands to read. If None, all available bands will be loaded.	`None`
`metadata_msi`	`Optional[str]`	Path to metadata file. If None, it is assumed to be in the SAFE folder.	`None`

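Attributes (in addition to those of S2Image; `root_metadata_tl` and the attributes listed after it are only populated once `read_metadata_tl()` has been called):

Name	Type	Description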
`granule_folder`	`str`	Path to the granule folder.
`msk_clouds_file`	`str`	Path to the cloud mask file.
`metadata_tl`	`str`	Path to the TL metadata file.
`root_metadata_tl`		Root element of the TL metadata XML.
`tileId`	`str`	Tile identifier.
`satId`	`str`	Satellite identifier.
`procLevel`	`str`	Processing level.
`dimsByRes`	`Dict`	Dimensions by resolution.
`ulxyByRes`	`Dict`	Upper-left coordinates by resolution.
`tileAnglesNode`	`Element`	Tile_Angles XML element from the TL metadata.
`mean_sza`	`float`	Mean solar zenith angle.
`mean_saa`	`float`	Mean solar azimuth angle.
`mean_vza`	`Dict[str, float]`	Mean viewing zenith angle per band.
`mean_vaa`	`Dict[str, float]`	Mean viewing azimuth angle per band.
`vaa`	`Dict[str, GeoTensor]`	Viewing azimuth angle as GeoTensor per band.
`vza`	`Dict[str, GeoTensor]`	Viewing zenith angle as GeoTensor per band.
`saa`	`GeoTensor`	Solar azimuth angle as GeoTensor.
`sza`	`GeoTensor`	Solar zenith angle as GeoTensor.
`anglesULXY`	`Tuple[float, float]`	Upper-left coordinates of the angle grids.
`epsg_code`	`str`	CRS code of the tile, read from HORIZONTAL_CS_CODE.

Examples:

>>> # Initialize the S2ImageL1C reader with a data path
>>> s2_l1c = S2ImageL1C('/path/to/S2A_MSIL1C_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
...                     granules=granules_dict, polygon=aoi_polygon)
>>> # Load all bands
>>> l1c_data = s2_l1c.load()
>>> # Read angle information
>>> s2_l1c.read_metadata_tl()
>>> solar_zenith = s2_l1c.sza
>>> # Convert to radiance
>>> radiance_data = s2_l1c.DN_to_radiance()
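The angle attributes (`sza`, `saa`, `vza`, `vaa`) are built by `read_metadata_tl` from coarse grids in `MTD_TL.xml`. As we understand the file layout, each `Values_List` holds one `VALUES` element per grid row, with space-separated numbers, and viewing angles come as one grid per detector, which `_buildViewAngleArr` merges per band by copying the non-NaN cells. A NumPy sketch of both steps: `values_list_to_array` approximates `_makeValueArray` (whose body is not shown in this section) and `merge_detectors` mirrors `_buildViewAngleArr`. Both names and the toy grids are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

import numpy as np


def values_list_to_array(values_list: ET.Element) -> np.ndarray:
    """Hypothetical: each <VALUES> child is one row of space-separated floats ("NaN" allowed)."""
    rows = [[float(v) for v in el.text.split()] for el in values_list.findall("VALUES")]
    return np.array(rows, dtype=np.float64)


def merge_detectors(grids: List[np.ndarray]) -> np.ndarray:
    """Combine per-detector grids: each detector fills only the cells it covers (non-NaN)."""
    merged = grids[0].copy()
    for grid in grids[1:]:
        valid = ~np.isnan(grid)
        merged[valid] = grid[valid]  # later detectors overwrite overlapping cells
    return merged


detector_1 = ET.fromstring(
    "<Values_List><VALUES>5.0 5.1 NaN</VALUES><VALUES>5.2 NaN NaN</VALUES></Values_List>"
)
detector_2 = ET.fromstring(
    "<Values_List><VALUES>NaN 6.0 6.1</VALUES><VALUES>NaN 6.2 NaN</VALUES></Values_List>"
)
vza_b01 = merge_detectors([values_list_to_array(detector_1), values_list_to_array(detector_2)])
print(vza_b01)
```

Cells that are still NaN after merging (here the bottom-right one) are what `read_metadata_tl` fills with `skimage.restoration.inpaint_biharmonic`.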

Source code in georeader/readers/S2_SAFE_reader.py

class S2ImageL1C(S2Image):
    """
    Sentinel-2 Level 1C (top of atmosphere reflectance) image reader.

    This class extends the base S2Image class to handle Sentinel-2 Level 1C products,
    which provide calibrated and orthorectified top of atmosphere reflectance data.
    It also provides methods to access viewing and solar angle information.

    Args:
        s2folder (str): Path to the Sentinel-2 SAFE product folder.
        granules (Dict[str, str]): Dictionary mapping band names to file paths.
        polygon (Polygon): Polygon defining the area of interest in EPSG:4326.
        out_res (int): Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
        window_focus (Optional[rasterio.windows.Window]): Window to focus on a specific
            region of the image. Defaults to None (entire image).
        bands (Optional[List[str]]): List of bands to read. If None, all available bands will be loaded.
        metadata_msi (Optional[str]): Path to metadata file. If None, it is assumed to be
            in the SAFE folder.

    Attributes:
        In addition to the S2Image attributes:
        granule_folder (str): Path to the granule folder.
        msk_clouds_file (str): Path to the cloud mask file.
        metadata_tl (str): Path to the TL metadata file.
        root_metadata_tl: Root element of the TL metadata XML.
        tileId (str): Tile identifier.
        satId (str): Satellite identifier.
        procLevel (str): Processing level.
        dimsByRes (Dict): Dimensions by resolution.
        ulxyByRes (Dict): Upper-left coordinates by resolution.
        tileAnglesNode: Tile angles node from metadata.
        mean_sza (float): Mean solar zenith angle.
        mean_saa (float): Mean solar azimuth angle.
        mean_vza (Dict[str, float]): Mean viewing zenith angle per band.
        mean_vaa (Dict[str, float]): Mean viewing azimuth angle per band.
        vaa (Dict[str, GeoTensor]): Viewing azimuth angle as GeoTensor per band.
        vza (Dict[str, GeoTensor]): Viewing zenith angle as GeoTensor per band.
        saa (GeoTensor): Solar azimuth angle as GeoTensor.
        sza (GeoTensor): Solar zenith angle as GeoTensor.
        anglesULXY (Tuple[float, float]): Upper-left coordinates of the angle grids.
        epsg_code (str): CRS code of the tile, read from HORIZONTAL_CS_CODE.

    Examples:
        >>> # Initialize the S2ImageL1C reader with a data path
        >>> s2_l1c = S2ImageL1C('/path/to/S2A_MSIL1C_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
        ...                     granules=granules_dict, polygon=aoi_polygon)
        >>> # Load all bands
        >>> l1c_data = s2_l1c.load()
        >>> # Read angle information
        >>> s2_l1c.read_metadata_tl()
        >>> solar_zenith = s2_l1c.sza
        >>> # Convert to radiance
        >>> radiance_data = s2_l1c.DN_to_radiance()
    """

    def __init__(
        self,
        s2folder,
        granules: Dict[str, str],
        polygon: Polygon,
        out_res: int = 10,
        window_focus: Optional[rasterio.windows.Window] = None,
        bands: Optional[List[str]] = None,
        metadata_msi: Optional[str] = None,
    ):
        super(S2ImageL1C, self).__init__(
            s2folder=s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )

        assert (
            self.producttype == "MSIL1C"
        ), f"Unexpected product type {self.producttype} in image {self.folder}"

        first_granule = self.granules[list(self.granules.keys())[0]]
        self.granule_folder = os.path.dirname(os.path.dirname(first_granule))
        self.msk_clouds_file = os.path.join(
            self.granule_folder, "MSK_CLOUDS_B00.gml"
        ).replace("\\", "/")
        if not hasattr(self, "metadata_tl"):
            self.metadata_tl = os.path.join(self.granule_folder, "MTD_TL.xml").replace(
                "\\", "/"
            )

        self.root_metadata_tl = None

        # Granule in L1C does not include TCI
        # Assert bands in self.granule are ordered as in BANDS_S2
        # assert all(granule[-7:-4] == bname for bname, granule in zip(BANDS_S2, self.granule)), f"some granules are not in the expected order {self.granule}"

    def read_from_window(
        self, window: rasterio.windows.Window, boundless: bool = True
    ) -> "__class__":
        out = super().read_from_window(window, boundless=boundless)

        if self.root_metadata_tl is None:
            return out

        # copy all metadata from the original image
        for attribute in [
            "tileId",
            "root_metadata_tl",
            "satId",
            "procLevel",
            "dimsByRes",
            "ulxyByRes",
            "tileAnglesNode",
            "mean_sza",
            "mean_saa",
            "mean_vza",
            "mean_vaa",
            "vaa",
            "vza",
            "saa",
            "sza",
            "anglesULXY",
            "epsg_code",
        ]:
            setattr(out, attribute, getattr(self, attribute))

        return out

    def cache_product_to_local_dir(
        self,
        path_dest: Optional[str] = None,
        print_progress: bool = True,
        format_bands: Optional[str] = None,
    ) -> "__class__":
        """
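        Overrides the parent method to also copy the tile metadata file (MTD_TL.xml) and
        add its relative path to granules.json.

        Args:
            path_dest (Optional[str], optional): path to the destination folder. Defaults to None (current folder).
            print_progress (bool, optional): whether to print progress. Defaults to True.
            format_bands (Optional[str], optional): format of the cached bands, "COG" or "GeoTIFF".
                Defaults to None (keep the original format).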

        Returns:
            __class__: the cached object
        """
        new_obj = super().cache_product_to_local_dir(
            path_dest=path_dest,
            print_progress=print_progress,
            format_bands=format_bands,
        )

        if os.path.exists(new_obj.metadata_tl):
            # MTD_TL.xml was already cached: nothing left to copy
            return new_obj

        if self.root_metadata_tl is not None:
            new_obj.root_metadata_tl = self.root_metadata_tl
            ET.ElementTree(new_obj.root_metadata_tl).write(new_obj.metadata_tl)
            # copy all metadata from the original image
            for attribute in [
                "tileId",
                "root_metadata_tl",
                "satId",
                "procLevel",
                "dimsByRes",
                "ulxyByRes",
                "tileAnglesNode",
                "mean_sza",
                "mean_saa",
                "mean_vza",
                "mean_vaa",
                "vaa",
                "vza",
                "saa",
                "sza",
                "anglesULXY",
                "epsg_code",
            ]:
                if hasattr(self, attribute):
                    setattr(new_obj, attribute, getattr(self, attribute))
        else:
            get_file(self.metadata_tl, new_obj.metadata_tl)

        granule_folder_rel = new_obj.granule_folder.replace("\\", "/").replace(
            new_obj.folder.replace("\\", "/") + "/", ""
        )
        # Add metadata_tl to granules.json
        granules_path = os.path.join(new_obj.folder, "granules.json").replace("\\", "/")
        with open(granules_path, "r") as fh:
            info_granules_metadata = json.load(fh)
        info_granules_metadata["metadata_tl"] = os.path.join(
            granule_folder_rel, "MTD_TL.xml"
        ).replace("\\", "/")
        with open(granules_path, "w") as f:
            json.dump(info_granules_metadata, f)

        return new_obj

    def read_metadata_tl(self):
        """
        Read the tile metadata (MTD_TL.xml) and parse information about the acquisition and the properties of the GRANULE bands.

        It populates the following attributes:
            - mean_sza
            - mean_saa
            - mean_vza
            - mean_vaa
            - vaa
            - vza
            - saa
            - sza
            - anglesULXY
            - tileId
            - satId
            - procLevel
            - epsg_code
            - dimsByRes
            - ulxyByRes
            - tileAnglesNode
            - root_metadata_tl

        """
        if self.root_metadata_tl is not None:
            return

        self.root_metadata_tl = read_xml(self.metadata_tl)

        # The root tag is namespaced ("{uri}tag"): extract the prefix for namespaced lookups
        nsPrefix = self.root_metadata_tl.tag[: self.root_metadata_tl.tag.index("}") + 1]
        nsDict = {"n1": nsPrefix[1:-1]}

        self.mean_sza = float(
            self.root_metadata_tl.find(".//Mean_Sun_Angle/ZENITH_ANGLE").text
        )
        self.mean_saa = float(
            self.root_metadata_tl.find(".//Mean_Sun_Angle/AZIMUTH_ANGLE").text
        )

        generalInfoNode = self.root_metadata_tl.find("n1:General_Info", nsDict)
        # N.B. I am still not entirely convinced that this SENSING_TIME is really
        # the acquisition time, but the documentation is rubbish.
        sensingTimeNode = generalInfoNode.find("SENSING_TIME")
        sensingTimeStr = sensingTimeNode.text.strip()
        # self.datetime = datetime.datetime.strptime(sensingTimeStr, "%Y-%m-%dT%H:%M:%S.%fZ")
        tileIdNode = generalInfoNode.find("TILE_ID")
        tileIdFullStr = tileIdNode.text.strip()
        self.tileId = tileIdFullStr.split("_")[-2]
        self.satId = tileIdFullStr[:3]
        self.procLevel = tileIdFullStr[
            13:16
        ]  # Not sure whether to use absolute pos or split by '_'....

        geomInfoNode = self.root_metadata_tl.find("n1:Geometric_Info", nsDict)
        geocodingNode = geomInfoNode.find("Tile_Geocoding")
        self.epsg_code = geocodingNode.find("HORIZONTAL_CS_CODE").text

        # Dimensions of images at different resolutions.
        self.dimsByRes = {}
        sizeNodeList = geocodingNode.findall("Size")
        for sizeNode in sizeNodeList:
            res = sizeNode.attrib["resolution"]
            nrows = int(sizeNode.find("NROWS").text)
            ncols = int(sizeNode.find("NCOLS").text)
            self.dimsByRes[res] = (nrows, ncols)

        # Upper-left corners of images at different resolutions. As far as I can
        # work out, these coords appear to be the upper left corner of the upper left
        # pixel, i.e. equivalent to GDAL's convention. This also means that they
        # are the same for the different resolutions, which is nice.
        self.ulxyByRes = {}
        posNodeList = geocodingNode.findall("Geoposition")
        for posNode in posNodeList:
            res = posNode.attrib["resolution"]
            ulx = float(posNode.find("ULX").text)
            uly = float(posNode.find("ULY").text)
            self.ulxyByRes[res] = (ulx, uly)

        # Sun and satellite angles.
        # Zenith
        self.tileAnglesNode = geomInfoNode.find("Tile_Angles")
        sunZenithNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Zenith")
        # <Zenith>
        #  <COL_STEP unit="m">5000</COL_STEP>
        #  <ROW_STEP unit="m">5000</ROW_STEP>
        angleGridXres = float(sunZenithNode.find("COL_STEP").text)
        angleGridYres = float(sunZenithNode.find("ROW_STEP").text)
        sza = self._makeValueArray(sunZenithNode.find("Values_List"))
        mask_nans = np.isnan(sza)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            sza = inpaint_biharmonic(sza, mask_nans)
        transform_zenith = rasterio.transform.from_origin(
            self.ulxyByRes[str(self.out_res)][0],
            self.ulxyByRes[str(self.out_res)][1],
            angleGridXres,
            angleGridYres,
        )

        self.sza = GeoTensor(sza, transform=transform_zenith, crs=self.epsg_code)

        # Azimuth
        sunAzimuthNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Azimuth")
        angleGridXres = float(sunAzimuthNode.find("COL_STEP").text)
        angleGridYres = float(sunAzimuthNode.find("ROW_STEP").text)
        saa = self._makeValueArray(sunAzimuthNode.find("Values_List"))
        mask_nans = np.isnan(saa)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            saa = inpaint_biharmonic(saa, mask_nans)
        transform_azimuth = rasterio.transform.from_origin(
            self.ulxyByRes[str(self.out_res)][0],
            self.ulxyByRes[str(self.out_res)][1],
            angleGridXres,
            angleGridYres,
        )
        self.saa = GeoTensor(saa, transform=transform_azimuth, crs=self.epsg_code)

        # Now build up the viewing angle per grid cell, from the separate layers
        # given for each detector for each band. Initially I am going to keep
        # the bands separate, just to see how that looks.
        # The names of things in the XML suggest that these are view angles,
        # but the numbers suggest that they are angles as seen from the pixel's
        # frame of reference on the ground, i.e. they are in fact what we ultimately want.
        viewingAngleNodeList = self.tileAnglesNode.findall(
            "Viewing_Incidence_Angles_Grids"
        )
        vza = self._buildViewAngleArr(viewingAngleNodeList, "Zenith")
        vaa = self._buildViewAngleArr(viewingAngleNodeList, "Azimuth")

        self.vaa = {}
        for k, varr in vaa.items():
            mask_nans = np.isnan(varr)
            if np.any(mask_nans):
                from skimage.restoration import inpaint_biharmonic

                varr = inpaint_biharmonic(varr, mask_nans)

            self.vaa[k] = GeoTensor(
                varr, transform=transform_azimuth, crs=self.epsg_code
            )

        self.vza = {}
        for k, varr in vza.items():
            mask_nans = np.isnan(varr)
            if np.any(mask_nans):
                from skimage.restoration import inpaint_biharmonic

                varr = inpaint_biharmonic(varr, mask_nans)
            self.vza[k] = GeoTensor(
                varr, transform=transform_zenith, crs=self.epsg_code
            )

        # Make a guess at the coordinates of the angle grids. These are not given
        # explicitly in the XML, and don't line up exactly with the other grids, so I am
        # making a rough estimate. Because the angles don't change rapidly across these
        # distances, it is not important if I am a bit wrong (although it would be nice
        # to be exactly correct!).
        (ulx, uly) = self.ulxyByRes["10"]
        self.anglesULXY = (ulx - angleGridXres / 2.0, uly + angleGridYres / 2.0)

        # Read mean viewing angles for each band.
        self.mean_vaa = {}
        self.mean_vza = {}
        for elm in self.tileAnglesNode.find("Mean_Viewing_Incidence_Angle_List"):
            band_name = BANDS_S2[int(elm.attrib["bandId"])]
            viewing_zenith_angle = float(elm.find("ZENITH_ANGLE").text)
            viewing_azimuth_angle = float(elm.find("AZIMUTH_ANGLE").text)
            self.mean_vza[band_name] = viewing_zenith_angle
            self.mean_vaa[band_name] = viewing_azimuth_angle

    def _buildViewAngleArr(self, viewingAngleNodeList, angleName):
        """
        Build up the named viewing angle array from the various detector strips given as
        separate arrays. For each band, the grid of the first detector is the starting point
        and the non-NaN cells of every subsequent detector overwrite it.

        The angleName is one of 'Zenith' or 'Azimuth'.
        Returns a dictionary of 2-d arrays, keyed by band name (BANDS_S2[bandId]).
        """
        angleArrDict = {}
        for viewingAngleNode in viewingAngleNodeList:
            band_name = BANDS_S2[int(viewingAngleNode.attrib["bandId"])]
            detectorId = viewingAngleNode.attrib["detectorId"]

            angleNode = viewingAngleNode.find(angleName)
            angleArr = self._makeValueArray(angleNode.find("Values_List"))
            if band_name not in angleArrDict:
                angleArrDict[band_name] = angleArr
            else:
                mask = ~np.isnan(angleArr)
                angleArrDict[band_name][mask] = angleArr[mask]
        return angleArrDict

    @staticmethod
    def _makeValueArray(valuesListNode):
        """
        Take a <Values_List> node from the XML, and return an array of the values contained
        within it. This will be a 2-d numpy array of float32 values.

        """
        valuesList = valuesListNode.findall("VALUES")
        vals = []
        for valNode in valuesList:
            text = valNode.text
            vals.append([np.float32(x) for x in text.strip().split()])

        return np.array(vals)
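The two helpers above turn the `Viewing_Incidence_Angles_Grids` nodes into one grid per band: `_makeValueArray` parses the whitespace-separated `<VALUES>` rows of a `<Values_List>`, and `_buildViewAngleArr` overlays the per-detector grids so that each detector fills in its non-NaN cells. The sketch below reproduces that logic with `xml.etree.ElementTree` and NumPy on a hand-written two-detector fragment. The XML values and the helper names `parse_values_list` and `merge_detectors` are illustrative assumptions, not georeader code, and the result is keyed by the raw `bandId` rather than by band name.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hand-written fragment mimicking two detector strips of one band (illustrative values).
XML = """
<Tile_Angles>
  <Viewing_Incidence_Angles_Grids bandId="0" detectorId="1">
    <Zenith><Values_List>
      <VALUES>5.0 5.1 NaN</VALUES>
      <VALUES>5.2 NaN NaN</VALUES>
    </Values_List></Zenith>
  </Viewing_Incidence_Angles_Grids>
  <Viewing_Incidence_Angles_Grids bandId="0" detectorId="2">
    <Zenith><Values_List>
      <VALUES>NaN 6.1 6.2</VALUES>
      <VALUES>NaN 6.3 6.4</VALUES>
    </Values_List></Zenith>
  </Viewing_Incidence_Angles_Grids>
</Tile_Angles>
"""


def parse_values_list(values_list_node):
    """Parse the whitespace-separated rows of a <Values_List> node into a 2-D float32 array."""
    rows = [
        [np.float32(x) for x in node.text.strip().split()]
        for node in values_list_node.findall("VALUES")
    ]
    return np.array(rows, dtype=np.float32)


def merge_detectors(grid_nodes, angle_name):
    """Overlay the per-detector grids of each band; later detectors overwrite their non-NaN cells."""
    merged = {}
    for node in grid_nodes:
        band_id = node.attrib["bandId"]
        arr = parse_values_list(node.find(angle_name).find("Values_List"))
        if band_id not in merged:
            merged[band_id] = arr
        else:
            valid = ~np.isnan(arr)
            merged[band_id][valid] = arr[valid]
    return merged


root = ET.fromstring(XML.strip())
zenith = merge_detectors(root.findall("Viewing_Incidence_Angles_Grids"), "Zenith")
print(zenith["0"])
```

Where two detector strips overlap, the detector parsed last wins (cell `[0, 1]` ends up as 6.1 from detector 2), which is the same precedence as the loop in `_buildViewAngleArr`.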

`cache_product_to_local_dir(path_dest=None, print_progress=True, format_bands=None)` ¶

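Overrides the parent method to also copy the tile metadata file (MTD_TL.xml) and register its path in granules.json.

Parameters:

Name	Type	Description	Default
`path_dest`	`Optional[str]`	path to the destination folder. Defaults to None.	`None`
`print_progress`	`bool`	whether to print progress. Defaults to True.	`True`
`format_bands`	`Optional[str]`	passed unchanged to the parent cache_product_to_local_dir. Defaults to None.	`None`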

Returns:

Name	Type	Description
`__class__`	`__class__`	the cached object

Source code in georeader/readers/S2_SAFE_reader.py

def cache_product_to_local_dir(
    self,
    path_dest: Optional[str] = None,
    print_progress: bool = True,
    format_bands: Optional[str] = None,
) -> "__class__":
    """
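    Overrides the parent method to also copy the tile metadata file (MTD_TL.xml)
    and register its path in granules.json.

    Args:
        path_dest (Optional[str], optional): path to the destination folder. Defaults to None.
        print_progress (bool, optional): whether to print progress. Defaults to True.
        format_bands (Optional[str], optional): passed unchanged to the parent
            cache_product_to_local_dir. Defaults to None.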

    Returns:
        __class__: the cached object
    """
    new_obj = super().cache_product_to_local_dir(
        path_dest=path_dest,
        print_progress=print_progress,
        format_bands=format_bands,
    )

    if os.path.exists(new_obj.metadata_tl):
        # the cached product already exists. returns
        return new_obj

    if self.root_metadata_tl is not None:
        new_obj.root_metadata_tl = self.root_metadata_tl
        # Serialise the already-parsed XML tree into the cached MTD_TL.xml
        ET.ElementTree(self.root_metadata_tl).write(new_obj.metadata_tl)
        # copy all metadata from the original image
        for attribute in [
            "tileId",
            "root_metadata_tl",
            "satId",
            "procLevel",
            "dimsByRes",
            "ulxyByRes",
            "tileAnglesNode",
            "mean_sza",
            "mean_saa",
            "mean_vza",
            "mean_vaa",
            "vaa",
            "vza",
            "saa",
            "sza",
            "anglesULXY",
        ]:
            if hasattr(self, attribute):
                setattr(new_obj, attribute, getattr(self, attribute))
    else:
        get_file(self.metadata_tl, new_obj.metadata_tl)

    granule_folder_rel = new_obj.granule_folder.replace("\\", "/").replace(
        new_obj.folder.replace("\\", "/") + "/", ""
    )
    # Add metadata_tl to granules.json
    granules_path = os.path.join(new_obj.folder, "granules.json").replace("\\", "/")
    with open(granules_path, "r") as fh:
        info_granules_metadata = json.load(fh)
    info_granules_metadata["metadata_tl"] = os.path.join(
        granule_folder_rel, "MTD_TL.xml"
    ).replace("\\", "/")
    with open(granules_path, "w") as f:
        json.dump(info_granules_metadata, f)

    return new_obj
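After the parent class has cached the bands, the method records the location of `MTD_TL.xml` in `granules.json` as a path relative to the product folder, normalising Windows separators to `/`. A minimal sketch of that bookkeeping with the standard library, assuming (hypothetically) that `granules.json` is a flat JSON object; the folder names, the `B02` entry and the helper `register_metadata_tl` are made up for illustration:

```python
import json
import os
import tempfile


def register_metadata_tl(folder, granule_folder):
    """Add the relative path of MTD_TL.xml to <folder>/granules.json (hypothetical helper)."""
    folder_posix = folder.replace("\\", "/")
    granule_posix = granule_folder.replace("\\", "/")
    # Strip the product-folder prefix so the stored path is relative to the product.
    granule_rel = granule_posix.replace(folder_posix + "/", "")
    granules_path = os.path.join(folder, "granules.json")
    with open(granules_path, "r") as fh:
        info = json.load(fh)
    info["metadata_tl"] = f"{granule_rel}/MTD_TL.xml"
    with open(granules_path, "w") as fh:
        json.dump(info, fh)
    return info


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative product layout: <product>.SAFE/GRANULE/<granule>/
    product = os.path.join(tmp, "S2B_MSIL1C_example.SAFE")
    granule = os.path.join(product, "GRANULE", "L1C_T49SGV_example")
    os.makedirs(granule)
    with open(os.path.join(product, "granules.json"), "w") as fh:
        json.dump({"B02": "GRANULE/L1C_T49SGV_example/IMG_DATA/B02.jp2"}, fh)
    info = register_metadata_tl(product, granule)
    print(info["metadata_tl"])  # GRANULE/L1C_T49SGV_example/MTD_TL.xml
```

Storing a relative path keeps `granules.json` valid if the cached product folder is later moved as a whole.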

`read_metadata_tl()` ¶

Reads the tile metadata file (MTD_TL.xml) to parse information about the acquisition and the properties of the GRANULE bands.

It populates the following attributes:

mean_sza
mean_saa
mean_vza
mean_vaa
vaa
vza
saa
sza
anglesULXY
tileId
satId
procLevel
epsg_code
dimsByRes
ulxyByRes
tileAnglesNode
root_metadata_tl
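The angle grids (`sza`, `saa`, `vza`, `vaa`) are coarse rasters: `read_metadata_tl` georeferences them with `rasterio.transform.from_origin(ulx, uly, COL_STEP, ROW_STEP)`, where the steps come from the XML (5000 m in the fragment quoted in the source comments). `from_origin(west, north, xsize, ysize)` builds the north-up transform `Affine(xsize, 0, west, 0, -ysize, north)`. The plain-Python sketch below applies that same mapping to locate angle-grid cells; `affine_from_origin` and `cell_center` are hypothetical stand-ins for the rasterio calls, and the corner coordinates are made up.

```python
def affine_from_origin(west, north, xsize, ysize):
    """Six affine coefficients (a, b, c, d, e, f) of a north-up grid, like rasterio's from_origin."""
    return (xsize, 0.0, west, 0.0, -ysize, north)


def cell_center(transform, row, col):
    """Map coordinates of the centre of grid cell (row, col)."""
    a, b, c, d, e, f = transform
    x = a * (col + 0.5) + b * (row + 0.5) + c
    y = d * (col + 0.5) + e * (row + 0.5) + f
    return x, y


# Illustrative tile corner (UTM metres) and the 5000 m angle-grid step.
ulx, uly = 600000.0, 5000040.0
transform = affine_from_origin(ulx, uly, 5000.0, 5000.0)

print(cell_center(transform, 0, 0))  # (602500.0, 4997540.0)
print(cell_center(transform, 2, 1))  # (607500.0, 4987540.0)
```

The same method also stores `anglesULXY`, an origin shifted by half a grid step, because the exact position of the angle grid is not given in the XML.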

Source code in georeader/readers/S2_SAFE_reader.py

def read_metadata_tl(self):
    """
    Read the tile metadata file (MTD_TL.xml) to parse information about the acquisition and properties of GRANULE bands.

    It populates the following attributes:
        - mean_sza
        - mean_saa
        - mean_vza
        - mean_vaa
        - vaa
        - vza
        - saa
        - sza
        - anglesULXY
        - tileId
        - satId
        - procLevel
        - epsg_code
        - dimsByRes
        - ulxyByRes
        - tileAnglesNode
        - root_metadata_tl

    """
    if self.root_metadata_tl is not None:
        return

    self.root_metadata_tl = read_xml(self.metadata_tl)

    # Extract the XML namespace URI from the root tag, which ElementTree stores as "{uri}LocalName"
    nsPrefix = self.root_metadata_tl.tag[: self.root_metadata_tl.tag.index("}") + 1]
    nsDict = {"n1": nsPrefix[1:-1]}

    self.mean_sza = float(
        self.root_metadata_tl.find(".//Mean_Sun_Angle/ZENITH_ANGLE").text
    )
    self.mean_saa = float(
        self.root_metadata_tl.find(".//Mean_Sun_Angle/AZIMUTH_ANGLE").text
    )

    generalInfoNode = self.root_metadata_tl.find("n1:General_Info", nsDict)
    # N.B. The product documentation does not make it fully clear that SENSING_TIME
    # is the acquisition time.
    sensingTimeNode = generalInfoNode.find("SENSING_TIME")
    sensingTimeStr = sensingTimeNode.text.strip()
    # self.datetime = datetime.datetime.strptime(sensingTimeStr, "%Y-%m-%dT%H:%M:%S.%fZ")
    tileIdNode = generalInfoNode.find("TILE_ID")
    tileIdFullStr = tileIdNode.text.strip()
    self.tileId = tileIdFullStr.split("_")[-2]
    self.satId = tileIdFullStr[:3]
    self.procLevel = tileIdFullStr[
        13:16
    ]  # Not sure whether to use absolute pos or split by '_'....

    geomInfoNode = self.root_metadata_tl.find("n1:Geometric_Info", nsDict)
    geocodingNode = geomInfoNode.find("Tile_Geocoding")
    self.epsg_code = geocodingNode.find("HORIZONTAL_CS_CODE").text

    # Dimensions of images at different resolutions.
    self.dimsByRes = {}
    sizeNodeList = geocodingNode.findall("Size")
    for sizeNode in sizeNodeList:
        res = sizeNode.attrib["resolution"]
        nrows = int(sizeNode.find("NROWS").text)
        ncols = int(sizeNode.find("NCOLS").text)
        self.dimsByRes[res] = (nrows, ncols)

    # Upper-left corners of images at different resolutions. As far as I can
    # work out, these coords appear to be the upper left corner of the upper left
    # pixel, i.e. equivalent to GDAL's convention. This also means that they
    # are the same for the different resolutions, which is nice.
    self.ulxyByRes = {}
    posNodeList = geocodingNode.findall("Geoposition")
    for posNode in posNodeList:
        res = posNode.attrib["resolution"]
        ulx = float(posNode.find("ULX").text)
        uly = float(posNode.find("ULY").text)
        self.ulxyByRes[res] = (ulx, uly)

    # Sun and satellite angles.
    # Zenith
    self.tileAnglesNode = geomInfoNode.find("Tile_Angles")
    sunZenithNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Zenith")
    # <Zenith>
    #  <COL_STEP unit="m">5000</COL_STEP>
    #  <ROW_STEP unit="m">5000</ROW_STEP>
    angleGridXres = float(sunZenithNode.find("COL_STEP").text)
    angleGridYres = float(sunZenithNode.find("ROW_STEP").text)
    sza = self._makeValueArray(sunZenithNode.find("Values_List"))
    mask_nans = np.isnan(sza)
    if np.any(mask_nans):
        from skimage.restoration import inpaint_biharmonic

        sza = inpaint_biharmonic(sza, mask_nans)
    transform_zenith = rasterio.transform.from_origin(
        self.ulxyByRes[str(self.out_res)][0],
        self.ulxyByRes[str(self.out_res)][1],
        angleGridXres,
        angleGridYres,
    )

    self.sza = GeoTensor(sza, transform=transform_zenith, crs=self.epsg_code)

    # Azimuth
    sunAzimuthNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Azimuth")
    angleGridXres = float(sunAzimuthNode.find("COL_STEP").text)
    angleGridYres = float(sunAzimuthNode.find("ROW_STEP").text)
    saa = self._makeValueArray(sunAzimuthNode.find("Values_List"))
    mask_nans = np.isnan(saa)
    if np.any(mask_nans):
        from skimage.restoration import inpaint_biharmonic

        saa = inpaint_biharmonic(saa, mask_nans)
    transform_azimuth = rasterio.transform.from_origin(
        self.ulxyByRes[str(self.out_res)][0],
        self.ulxyByRes[str(self.out_res)][1],
        angleGridXres,
        angleGridYres,
    )
    self.saa = GeoTensor(saa, transform=transform_azimuth, crs=self.epsg_code)

    # Now build up the viewing angle per grid cell, from the separate layers
    # given for each detector for each band. Initially I am going to keep
    # the bands separate, just to see how that looks.
    # The names of things in the XML suggest that these are view angles,
    # but the numbers suggest that they are angles as seen from the pixel's
    # frame of reference on the ground, i.e. they are in fact what we ultimately want.
    viewingAngleNodeList = self.tileAnglesNode.findall(
        "Viewing_Incidence_Angles_Grids"
    )
    vza = self._buildViewAngleArr(viewingAngleNodeList, "Zenith")
    vaa = self._buildViewAngleArr(viewingAngleNodeList, "Azimuth")

    self.vaa = {}
    for k, varr in vaa.items():
        mask_nans = np.isnan(varr)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            varr = inpaint_biharmonic(varr, mask_nans)

        self.vaa[k] = GeoTensor(
            varr, transform=transform_azimuth, crs=self.epsg_code
        )

    self.vza = {}
    for k, varr in vza.items():
        mask_nans = np.isnan(varr)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            varr = inpaint_biharmonic(varr, mask_nans)
        self.vza[k] = GeoTensor(
            varr, transform=transform_zenith, crs=self.epsg_code
        )

    # Make a guess at the coordinates of the angle grids. These are not given
    # explicitly in the XML, and don't line up exactly with the other grids, so I am
    # making a rough estimate. Because the angles don't change rapidly across these
    # distances, it is not important if I am a bit wrong (although it would be nice
    # to be exactly correct!).
    (ulx, uly) = self.ulxyByRes["10"]
    self.anglesULXY = (ulx - angleGridXres / 2.0, uly + angleGridYres / 2.0)

    # Read mean viewing angles for each band.
    self.mean_vaa = {}
    self.mean_vza = {}
    for elm in self.tileAnglesNode.find("Mean_Viewing_Incidence_Angle_List"):
        band_name = BANDS_S2[int(elm.attrib["bandId"])]
        viewing_zenith_angle = float(elm.find("ZENITH_ANGLE").text)
        viewing_azimuth_angle = float(elm.find("AZIMUTH_ANGLE").text)
        self.mean_vza[band_name] = viewing_zenith_angle
        self.mean_vaa[band_name] = viewing_azimuth_angle
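The namespace handling at the start of `read_metadata_tl` relies on ElementTree storing qualified tags as `{uri}LocalName`: slicing up to the closing brace recovers the URI, which is then bound to the `n1` prefix for `find`. The `TILE_ID` is then split by underscores and by character position. A self-contained sketch of those steps; the XML fragment, its placeholder namespace URI and the `TILE_ID` value are illustrative, laid out the way the code expects (namespaced children of the root, non-namespaced grandchildren).

```python
import xml.etree.ElementTree as ET

# Illustrative tile-metadata fragment: the root's children are namespaced, grandchildren are not.
XML = """<n1:Level-1C_Tile_ID xmlns:n1="urn:example:tile-metadata">
  <n1:General_Info>
    <TILE_ID>S2A_OPER_MSI_L1C_TL_MTI__20150813T201603_A000734_T32TQR_N01.03</TILE_ID>
  </n1:General_Info>
</n1:Level-1C_Tile_ID>"""

root = ET.fromstring(XML)

# root.tag is "{urn:example:tile-metadata}Level-1C_Tile_ID"
ns_prefix = root.tag[: root.tag.index("}") + 1]  # keep "{...}"
ns = {"n1": ns_prefix[1:-1]}  # strip the braces to get the bare URI

general_info = root.find("n1:General_Info", ns)
tile_id_full = general_info.find("TILE_ID").text.strip()

tile_id = tile_id_full.split("_")[-2]  # second-to-last field
sat_id = tile_id_full[:3]  # mission prefix
proc_level = tile_id_full[13:16]  # fixed-position processing level
print(tile_id, sat_id, proc_level)  # T32TQR S2A L1C
```

The positional slice `[13:16]` only works while the prefix before the level keeps its fixed width, which is why the source comment weighs it against splitting by `_`.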

`S2ImageL2A` ¶

Bases: S2Image

Sentinel-2 Level 2A (surface reflectance) image reader.

This class extends the base S2Image class to handle Sentinel-2 Level 2A products, which provide surface reflectance data with atmospheric corrections applied.

Parameters:

Name	Type	Description	Default
`s2folder`	`str`	Path to the Sentinel-2 SAFE product folder.	required
`granules`	`Dict[str, str]`	Dictionary mapping band names to file paths.	required
`polygon`	`Polygon`	Polygon defining the area of interest in EPSG:4326.	required
`out_res`	`int`	Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.	`10`
`window_focus`	`Optional[Window]`	Window to focus on a specific region of the image. Defaults to None (entire image).	`None`
`bands`	`Optional[List[str]]`	List of bands to read. If None, the default L2A bands (excluding B10) will be loaded.	`None`
`metadata_msi`	`Optional[str]`	Path to metadata file. If None, it is assumed to be in the SAFE folder.	`None`

Attributes:

Name	Type	Description
`mission`	`str`	Mission identifier (e.g., 'S2A', 'S2B').
`producttype`	`str`	Product type identifier (e.g., 'MSIL2A').
`pdgs`	`str`	PDGS Processing Baseline number.
`relorbitnum`	`str`	Relative Orbit number.
`tile_number_field`	`str`	Tile Number field.
`product_discriminator`	`str`	Product Discriminator.
`name`	`str`	Base name of the product.
`folder`	`str`	Path to the product folder.
`datetime`	`datetime`	Acquisition datetime.
`metadata_msi`	`str`	Path to the MSI metadata file.
`out_res`	`int`	Output resolution in meters.
`bands`	`List[str]`	List of bands to read.
`dims`	`Tuple[str]`	Names of the dimensions ("band", "y", "x").
`fill_value_default`	`int`	Default fill value (typically 0).
`band_check`	`str`	Band used as template for reading.
`granule_readers`	`Dict[str, RasterioReader]`	Dictionary of readers for each band.
`window_focus`	`Window`	Current window focus.

Examples:

>>> # Initialize the S2ImageL2A reader with a data path
>>> s2_l2a = S2ImageL2A('/path/to/S2A_MSIL2A_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
...                     granules=granules_dict, polygon=aoi_polygon)
>>> # Load all bands
>>> l2a_data = s2_l2a.load()

Source code in georeader/readers/S2_SAFE_reader.py

class S2ImageL2A(S2Image):
    """
    Sentinel-2 Level 2A (surface reflectance) image reader.

    This class extends the base S2Image class to handle Sentinel-2 Level 2A products,
    which provide surface reflectance data with atmospheric corrections applied.

    Args:
        s2folder (str): Path to the Sentinel-2 SAFE product folder.
        granules (Dict[str, str]): Dictionary mapping band names to file paths.
        polygon (Polygon): Polygon defining the area of interest in EPSG:4326.
        out_res (int): Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
        window_focus (Optional[rasterio.windows.Window]): Window to focus on a specific
            region of the image. Defaults to None (entire image).
        bands (Optional[List[str]]): List of bands to read. If None, the default L2A bands
            (excluding B10) will be loaded.
        metadata_msi (Optional[str]): Path to metadata file. If None, it is assumed to be
            in the SAFE folder.

    Attributes:
        mission (str): Mission identifier (e.g., 'S2A', 'S2B').
        producttype (str): Product type identifier (e.g., 'MSIL2A').
        pdgs (str): PDGS Processing Baseline number.
        relorbitnum (str): Relative Orbit number.
        tile_number_field (str): Tile Number field.
        product_discriminator (str): Product Discriminator.
        name (str): Base name of the product.
        folder (str): Path to the product folder.
        datetime (datetime): Acquisition datetime.
        metadata_msi (str): Path to the MSI metadata file.
        out_res (int): Output resolution in meters.
        bands (List[str]): List of bands to read.
        dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
        fill_value_default (int): Default fill value (typically 0).
        band_check (str): Band used as template for reading.
        granule_readers (Dict[str, RasterioReader]): Dictionary of readers for each band.
        window_focus (rasterio.windows.Window): Current window focus.

    Examples:
        >>> # Initialize the S2ImageL2A reader with a data path
        >>> s2_l2a = S2ImageL2A('/path/to/S2A_MSIL2A_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
        ...                     granules=granules_dict, polygon=aoi_polygon)
        >>> # Load all bands
        >>> l2a_data = s2_l2a.load()
    """

    def __init__(
        self,
        s2folder: str,
        granules: Dict[str, str],
        polygon: Polygon,
        out_res: int = 10,
        window_focus: Optional[rasterio.windows.Window] = None,
        bands: Optional[List[str]] = None,
        metadata_msi: Optional[str] = None,
    ):
        if bands is None:
            bands = BANDS_S2_L2A

        super(S2ImageL2A, self).__init__(
            s2folder=s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )

        assert (
            self.producttype == "MSIL2A"
        ), f"Unexpected product type {self.producttype} in image {self.folder}"

`s2loader(s2folder, out_res=10, bands=None, window_focus=None, granules=None, polygon=None, metadata_msi=None)` ¶

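Loads an S2ImageL2A or S2ImageL1C reader depending on the product type (MSIL2A or MSIL1C) parsed from the SAFE folder name. Any other product type raises NotImplementedError.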

Parameters:

Name	Type	Description	Default
`s2folder`	`str`	.SAFE folder. The standard ESA naming convention is expected (see the s2_name_split function)	required
`out_res`	`int`	default output resolution {10, 20, 60}	`10`
`bands`	`Optional[List[str]]`	Bands to read. Defaults to BANDS_S2 or BANDS_S2_L2A depending on the product type	`None`
`window_focus`	`Optional[Window]`	window to read when creating the object	`None`
`granules`	`Optional[Dict[str, str]]`	Dict where keys are the band names and values are paths to the band location	`None`
`polygon`	`Optional[Polygon]`	polygon with the footprint of the object	`None`
`metadata_msi`	`Optional[str]`	path to metadata file	`None`

Returns:

Type	Description
`Union[S2ImageL2A, S2ImageL1C]`	S2Image reader

Source code in georeader/readers/S2_SAFE_reader.py

def s2loader(
    s2folder: str,
    out_res: int = 10,
    bands: Optional[List[str]] = None,
    window_focus: Optional[rasterio.windows.Window] = None,
    granules: Optional[Dict[str, str]] = None,
    polygon: Optional[Polygon] = None,
    metadata_msi: Optional[str] = None,
) -> Union[S2ImageL2A, S2ImageL1C]:
    """
    Loads an S2ImageL2A or S2ImageL1C reader depending on the product type
    (MSIL2A or MSIL1C) parsed from the SAFE folder name.

    Args:
        s2folder: .SAFE folder. The standard ESA naming convention is expected (see the s2_name_split function)
        out_res: default output resolution {10, 20, 60}
        bands: Bands to read. Defaults to BANDS_S2 or BANDS_S2_L2A depending on the product type
        window_focus: window to read when creating the object
        granules: Dict where keys are the band names and values are paths to the band location
        polygon: polygon with the footprint of the object
        metadata_msi: path to metadata file

    Returns:
        S2Image reader
    """

    _, producttype_nos2, _, _, _, _, _ = s2_name_split(s2folder)

    if producttype_nos2 == "MSIL2A":
        return S2ImageL2A(
            s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )
    elif producttype_nos2 == "MSIL1C":
        return S2ImageL1C(
            s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )

    raise NotImplementedError(f"Don't know how to load {producttype_nos2} products")
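`s2loader` only inspects the product-type field of the SAFE name to choose the reader class. The sketch below mirrors that dispatch without georeader, splitting the name into the seven underscore-separated fields of the ESA naming convention (mission, product type, sensing time, processing baseline, relative orbit, tile, product discriminator). `product_type_from_safe_name` and `pick_reader` are hypothetical helpers that return class names instead of building readers; the library's own parsing lives in `s2_name_split`.

```python
def product_type_from_safe_name(safe_name):
    """Return the product-type field (e.g. 'MSIL1C') of a SAFE name (hypothetical helper)."""
    base = safe_name.rstrip("/").split("/")[-1]
    if base.endswith(".SAFE"):
        base = base[: -len(".SAFE")]
    fields = base.split("_")
    if len(fields) != 7:
        raise ValueError(f"Unexpected Sentinel-2 product name: {safe_name}")
    return fields[1]


def pick_reader(safe_name):
    """Mirror s2loader's dispatch, returning the name of the class it would build."""
    product_type = product_type_from_safe_name(safe_name)
    if product_type == "MSIL2A":
        return "S2ImageL2A"
    if product_type == "MSIL1C":
        return "S2ImageL1C"
    raise NotImplementedError(f"Don't know how to load {product_type} products")


print(pick_reader("S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE"))
print(pick_reader("/path/to/S2A_MSIL2A_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE/"))
```

Dispatching on the name avoids opening any metadata file just to decide which reader to construct.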

`s2_public_bucket_path(s2file, check_exists=False, mode='gcp')` ¶

Returns the expected path of the S2 product in the public Google Cloud Sentinel-2 bucket.

Parameters:

Name	Type	Description	Default
`s2file`	`str`	SAFE product name (e.g. S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE)	required
`check_exists`	`bool`	if True, check that the product exists in the bucket (only for mode "gcp") and raise FileNotFoundError otherwise. This will not work if the GOOGLE_APPLICATION_CREDENTIALS and/or GS_USER_PROJECT env variables are not set. Defaults to False	`False`
`mode`	`str`	"gcp" for a gs:// path or "rest" for an https://storage.googleapis.com URL	`'gcp'`

Returns:

Type	Description
`str`	full path to the file (e.g. gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE)

Source code in georeader/readers/S2_SAFE_reader.py

def s2_public_bucket_path(
    s2file: str, check_exists: bool = False, mode: str = "gcp"
) -> str:
    """
    Returns the expected path of the S2 product in the public Google Cloud Sentinel-2 bucket.

    Args:
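        s2file: SAFE product name (e.g. S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE)
        check_exists: if True, check that the product exists in the bucket (only for mode "gcp")
            and raise FileNotFoundError otherwise. This will not work if the
            GOOGLE_APPLICATION_CREDENTIALS and/or GS_USER_PROJECT env variables are not set.
            Defaults to False
        mode: "gcp" for a gs:// path or "rest" for an https://storage.googleapis.com URL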

    Returns:
        full path to the file (e.g. gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE)
    """
    (
        mission,
        producttype,
        sensing_date_str,
        pdgs,
        relorbitnum,
        tile_number_field,
        product_discriminator,
    ) = s2_name_split(s2file)
    s2file = s2file[:-1] if s2file.endswith("/") else s2file

    if not s2file.endswith(".SAFE"):
        s2file += ".SAFE"

    basename = os.path.basename(s2file)
    if mode == "gcp":
        s2folder = f"{FULL_PATH_PUBLIC_BUCKET_SENTINEL_2}tiles/{tile_number_field[:2]}/{tile_number_field[2]}/{tile_number_field[3:]}/{basename}"
    elif mode == "rest":
        s2folder = f"https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/{tile_number_field[:2]}/{tile_number_field[2]}/{tile_number_field[3:]}/{basename}"
    else:
        raise NotImplementedError(f"Mode {mode} unknown")

    if check_exists and (mode == "gcp"):
        fs = get_filesystem(s2folder)

        if not fs.exists(s2folder):
            raise FileNotFoundError(f"Sentinel-2 file not found in {s2folder}")

    return s2folder
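The bucket layout splits the MGRS tile identifier into UTM zone (2 digits), latitude band (1 letter) and 100 km grid square (2 letters), as in the docstring example `tiles/49/S/GV/`. A minimal sketch of that path construction for mode "gcp", assuming the tile is the sixth field of the SAFE name and that its leading `T` must be dropped (the `[:2]` indexing in `s2_public_bucket_path` implies `s2_name_split` returns it without the `T`); `public_bucket_path` is a hypothetical name.

```python
import os

# Bucket prefix as it appears in the docstring example above.
BUCKET = "gs://gcp-public-data-sentinel-2/"


def public_bucket_path(safe_name):
    """Build gs://.../tiles/<zone>/<band>/<square>/<name>.SAFE from a SAFE name (sketch)."""
    base = os.path.basename(safe_name.rstrip("/"))
    if not base.endswith(".SAFE"):
        base += ".SAFE"
    tile = base.split("_")[5]  # e.g. "T49SGV"
    tile = tile[1:] if tile.startswith("T") else tile  # -> "49SGV"
    zone, lat_band, square = tile[:2], tile[2], tile[3:]
    return f"{BUCKET}tiles/{zone}/{lat_band}/{square}/{base}"


print(public_bucket_path("S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042"))
```

The path is computed purely from the name; checking that the object actually exists needs bucket access, which is why `check_exists` defaults to False.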

`read_srf(satellite, srf_file=SRF_FILE_DEFAULT, cache=True)` ¶

Process the spectral response function file. By default it reads the SRF document (COPE-GSEG-EOPG-TN-15-0007 v4.0) bundled with the package, so no network access is needed. Pass an http(s):// URL as srf_file to use a different revision; it is downloaded once and cached in ~/.georeader.

This function requires pandas and openpyxl for reading Excel files, and fsspec if srf_file is a URL.

Parameters:

Name	Type	Description	Default
`satellite`	`str`	satellite name (S2A, S2B or S2C)	required
`srf_file`	`str`	path or URL of the srf file	`SRF_FILE_DEFAULT`
`cache`	`bool`	if True, the srf is cached for future calls. Default True	`True`

Returns:

Type	Description
`DataFrame`	spectral response function for each of the bands of S2, indexed by wavelength (SR_WL)

Source code in georeader/readers/S2_SAFE_reader.py

def read_srf(
    satellite: str, srf_file: str = SRF_FILE_DEFAULT, cache: bool = True
) -> pd.DataFrame:
    """
    Process the spectral response function file. By default it reads the SRF
    document (COPE-GSEG-EOPG-TN-15-0007 v4.0) bundled with the package, so no
    network access is needed. Pass an ``http(s)://`` URL as `srf_file` to use a
    different revision; it is downloaded once and cached in ``~/.georeader``.

    This function requires pandas and openpyxl for reading Excel files, and
    fsspec if `srf_file` is a URL.

    Args:
        satellite (str): satellite name (S2A, S2B or S2C)
        srf_file (str): path or URL of the srf file
        cache (bool): if True, the srf is cached for future calls. Default True

    Returns:
        pd.DataFrame: spectral response function for each of the bands of S2
    """
    assert satellite in ["S2A", "S2B", "S2C"], "satellite must be S2A, S2B or S2C"

    if cache:
        global SRF_S2
        if satellite in SRF_S2:
            return SRF_S2[satellite]

    if srf_file.startswith(("http://", "https://")):
        home_dir = os.path.join(os.path.expanduser("~"), ".georeader")
        os.makedirs(home_dir, exist_ok=True)
        srf_filename = os.path.basename(srf_file)

        # Decode the url to get the filename. Also, replace spaces with underscores
        import urllib.parse

        srf_filename = urllib.parse.unquote(srf_filename).replace(" ", "_")

        srf_file_local = os.path.join(home_dir, srf_filename)
        if not os.path.exists(srf_file_local):
            import fsspec

            with fsspec.open(srf_file, "rb") as f:
                with open(srf_file_local, "wb") as f2:
                    f2.write(f.read())
        srf_file = srf_file_local

    srf_s2 = pd.read_excel(srf_file, sheet_name=f"Spectral Responses ({satellite})")

    srf_s2 = srf_s2.set_index("SR_WL")

    # remove rows (wavelengths) where no band responds above 1e-6
    any_not_zero = np.any((srf_s2 > 1e-6).values, axis=1)
    srf_s2 = srf_s2.loc[any_not_zero]

    # remove the satellite name from the columns
    srf_s2.columns = [c.replace(f"{satellite}_SR_AV_", "") for c in srf_s2.columns]
    srf_s2.columns = normalize_band_names(srf_s2.columns)

    if cache:
        SRF_S2[satellite] = srf_s2

    return srf_s2
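Two small steps of `read_srf` can be checked in isolation: the name of the local cache file for a downloaded workbook (percent-decoded, spaces replaced by underscores, stored under `~/.georeader`) and the removal of wavelengths at which no band responds above `1e-6`. A sketch of both with the standard library and NumPy; the URL and the response values are made up, and `drop_zero_rows` works on a plain array rather than on the `pd.DataFrame` the function actually returns.

```python
import os
import urllib.parse

import numpy as np


def local_cache_path(url, cache_dir):
    """Local file name used to cache a workbook downloaded from `url`."""
    name = os.path.basename(url)
    # Decode percent-escapes (e.g. %20 -> space), then replace spaces with underscores.
    name = urllib.parse.unquote(name).replace(" ", "_")
    return os.path.join(cache_dir, name)


def drop_zero_rows(values, threshold=1e-6):
    """Keep only the wavelengths (rows) where at least one band responds above threshold."""
    keep = np.any(values > threshold, axis=1)
    return values[keep]


cache_dir = os.path.join(os.path.expanduser("~"), ".georeader")
print(local_cache_path("https://example.com/docs/S2%20SRF%20v4.0.xlsx", cache_dir))

# Rows = wavelengths, columns = bands (made-up responses).
srf = np.array([[0.0, 0.0], [0.2, 0.0], [1e-7, 5e-7], [0.0, 0.9]])
print(drop_zero_rows(srf))  # keeps rows 1 and 3
```

Because the cache file name depends only on the URL, a second call with the same URL reuses the downloaded file instead of fetching it again.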

Proba-V Reader¶

The Proba-V reader enables access to Proba-V Level 2A and Level 3 products. It handles:

Reading TOA reflectance from HDF5 files
Mask handling for clouds, shadows, and invalid pixels
Extraction of metadata and acquisition parameters

Tutorial example:

Reading overlapping Proba-V and Sentinel-2 images

API Reference¶

Proba-V reader

Unofficial Proba-V reader. This reader is based on the Proba-V user manual: https://publications.vito.be/2017-1333-probav-products-user-manual.pdf

Author: Gonzalo Mateo-García

`ProbaV` ¶

Proba-V reader for handling Proba-V satellite products.

This class provides functionality to read and manipulate Proba-V satellite imagery products. It handles the specific format and metadata of Proba-V HDF5 files, supporting operations like loading radiometry data, masks, and cloud information.

Parameters:

Name	Type	Description	Default
`hdf5_file`	`str`	Path to the HDF5 file containing the Proba-V product.	required
`window`	`Optional[Window]`	Optional window to focus on a specific region of the image. Defaults to None (entire image).	`None`
`level_name`	`str`	Processing level of the product, either "LEVEL2A" or "LEVEL3". Defaults to "LEVEL3".	`'LEVEL3'`

Attributes:

Name	Type	Description
`hdf5_file`	`str`	Path to the HDF5 file.
`name`	`str`	Basename of the HDF5 file.
`camera`	`str`	Camera ID (for LEVEL2A products).
`res_name`	`str`	Resolution name identifier (e.g., '100M', '300M', '1KM').
`version`	`str`	Product version.
`toatoc`	`str`	Indicator of whether data is TOA (top of atmosphere) or TOC (top of canopy).
`real_transform`	`Affine`	Affine transform for the full image.
`real_shape`	`Tuple[int, int]`	Shape of the full image (height, width).
`dtype_radiometry`		Data type for radiometry data (typically np.float32).
`dtype_sm`		Data type for SM (status map) data.
`metadata`	`Dict[str, Any]`	Dictionary with product metadata.
`window_focus`	`Window`	Current window focus.
`window_data`	`Window`	Window representing the full data extent.
`start_date`	`datetime`	Start acquisition date and time.
`end_date`	`datetime`	End acquisition date and time.
`map_projection_wkt`	`str`	WKT representation of the map projection.
`crs`		Coordinate reference system.
`level_name`	`str`	Processing level identifier.
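The constructor fills `toatoc`, `res_name` and `version` (and `camera` for LEVEL2A) by matching the file name against the regular expressions in the source below; if the name does not match, those regex-derived attributes are simply not set. The sketch runs the same two patterns on illustrative file names built to fit them (not names of real products):

```python
import re

# Patterns copied from ProbaV.__init__.
LEVEL3_PATTERN = r"PROBAV_S1_(TO.)_.{6}_\d{8}_(\d..?M)_(V\d0\d)"
LEVEL2A_PATTERN = r"PROBAV_L2A_\d{8}_\d{6}_(\d)_(\d..?M)_(V\d0\d)"

# Illustrative file names that follow the patterns above.
m3 = re.match(LEVEL3_PATTERN, "PROBAV_S1_TOC_X18Y02_20160101_333M_V101.HDF5")
toatoc, res_name, version = m3.groups()
print(toatoc, res_name, version)  # TOC 333M V101

m2a = re.match(LEVEL2A_PATTERN, "PROBAV_L2A_20160101_003512_2_1KM_V101.HDF5")
camera, res_name_2a, version_2a = m2a.groups()
print(camera, res_name_2a, version_2a)  # 2 1KM V101
```

The `\d..?M` group accepts both three-character resolutions such as `1KM` and four-character ones such as `333M`.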

Examples:

>>> import rasterio.windows
>>> # Initialize the ProbaV reader with a data path
>>> probav_reader = ProbaV('/path/to/probav_product.HDF5')
>>> # Load radiometry data
>>> bands = probav_reader.load_radiometry()
>>> # Get cloud mask
>>> cloud_mask = probav_reader.load_sm_cloud_mask()
>>> # Focus on a specific window
>>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
>>> probav_reader.set_window(window)

Source code in georeader/readers/probav_image_operational.py

class ProbaV:
    """
    Proba-V reader for handling Proba-V satellite products.

    This class provides functionality to read and manipulate Proba-V satellite imagery products.
    It handles the specific format and metadata of Proba-V HDF5 files, supporting operations
    like loading radiometry data, masks, and cloud information.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the Proba-V product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific
            region of the image. Defaults to None (entire image).
        level_name (str): Processing level of the product, either "LEVEL2A" or "LEVEL3".
            Defaults to "LEVEL3".

    Attributes:
        hdf5_file (str): Path to the HDF5 file.
        name (str): Basename of the HDF5 file.
        camera (str): Camera ID (for LEVEL2A products).
        res_name (str): Resolution name identifier (e.g., '100M', '300M', '1KM').
        version (str): Product version.
        toatoc (str): Indicator of whether data is TOA (top of atmosphere) or TOC (top of canopy).
        real_transform (rasterio.Affine): Affine transform for the full image.
        real_shape (Tuple[int, int]): Shape of the full image (height, width).
        dtype_radiometry: Data type for radiometry data (typically np.float32).
        dtype_sm: Data type for SM (status map) data.
        metadata (Dict[str, Any]): Dictionary with product metadata.
        window_focus (rasterio.windows.Window): Current window focus.
        window_data (rasterio.windows.Window): Window representing the full data extent.
        start_date (datetime): Start acquisition date and time.
        end_date (datetime): End acquisition date and time.
        map_projection_wkt (str): WKT representation of the map projection.
        crs: Coordinate reference system.
        level_name (str): Processing level identifier.

    Examples:
        >>> import rasterio.windows
        >>> # Initialize the ProbaV reader with a data path
        >>> probav_reader = ProbaV('/path/to/probav_product.HDF5')
        >>> # Load radiometry data
        >>> bands = probav_reader.load_radiometry()
        >>> # Get cloud mask
        >>> cloud_mask = probav_reader.load_sm_cloud_mask()
        >>> # Focus on a specific window
        >>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
        >>> probav_reader.set_window(window)
    """

    def __init__(
        self,
        hdf5_file: str,
        window: Optional[rasterio.windows.Window] = None,
        level_name: str = "LEVEL3",
    ):
        self.hdf5_file = hdf5_file
        self.name = os.path.basename(self.hdf5_file)
        if level_name == "LEVEL2A":
            matches = re.match(
                r"PROBAV_L2A_\d{8}_\d{6}_(\d)_(\d..?M)_(V\d0\d)", self.name
            )
            if matches is not None:
                self.camera, self.res_name, self.version = matches.groups()
            self.toatoc = "TOA"
        elif level_name == "LEVEL3":
            matches = re.match(
                r"PROBAV_S1_(TO.)_.{6}_\d{8}_(\d..?M)_(V\d0\d)", self.name
            )
            if matches is not None:
                self.toatoc, self.res_name, self.version = matches.groups()
        else:
            raise NotImplementedError(f"Unknown level name {level_name}")

        try:
            with h5py.File(self.hdf5_file, "r") as input_f:
                # reference metadata: http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf
                valores_blue = (
                    input_f[f"{level_name}/RADIOMETRY/BLUE/{self.toatoc}"]
                    .attrs["MAPPING"][3:7]
                    .astype(np.float64)
                )
                self.real_transform = Affine(
                    a=valores_blue[2],
                    b=0,
                    c=valores_blue[0],
                    d=0,
                    e=-valores_blue[3],
                    f=valores_blue[1],
                )
                self.real_shape = input_f[
                    f"{level_name}/RADIOMETRY/BLUE/{self.toatoc}"
                ].shape
                # self.dtype_radiometry = input_f[f"{level_name}/RADIOMETRY/RED/{self.toatoc}"].dtype

                # Float because bands are converted to TOA reflectance when read (see read_band_toa in _load_bands)
                self.dtype_radiometry = np.float32
                self.dtype_sm = input_f[f"{level_name}/QUALITY/SM"].dtype
                self.metadata = dict(input_f.attrs)
        except OSError as e:
            raise FileNotFoundError("Error opening file %s" % self.hdf5_file) from e

        if window is None:
            self.window_focus = rasterio.windows.Window(
                row_off=0,
                col_off=0,
                width=self.real_shape[1],
                height=self.real_shape[0],
            )
        else:
            self.window_focus = window

        self.window_data = rasterio.windows.Window(
            row_off=0, col_off=0, width=self.real_shape[1], height=self.real_shape[0]
        )

        if "OBSERVATION_END_DATE" in self.metadata:
            self.end_date = datetime.strptime(
                " ".join(
                    self.metadata["OBSERVATION_END_DATE"].astype(str).tolist()
                    + self.metadata["OBSERVATION_END_TIME"].astype(str).tolist()
                ),
                "%Y-%m-%d %H:%M:%S",
            ).replace(tzinfo=timezone.utc)
            self.start_date = datetime.strptime(
                " ".join(
                    self.metadata["OBSERVATION_START_DATE"].astype(str).tolist()
                    + self.metadata["OBSERVATION_START_TIME"].astype(str).tolist()
                ),
                "%Y-%m-%d %H:%M:%S",
            ).replace(tzinfo=timezone.utc)
            self.map_projection_wkt = " ".join(
                self.metadata["MAP_PROJECTION_WKT"].astype(str).tolist()
            )

        # Proba-V images are lat/long
        self.crs = rasterio.crs.CRS({"init": "epsg:4326"})

        # Processing level (e.g. "LEVEL2A" or "LEVEL3"), used to build the HDF5 dataset paths
        self.level_name = level_name

        # Default nodata/fill value (overridden by subclasses as appropriate).
        # Read functions such as georeader.read.read_reproject expect this
        # attribute on reader objects.
        self.fill_value_default = 0

    def _get_window_pad(
        self, boundless: bool = True
    ) -> Tuple[rasterio.windows.Window, Optional[List]]:
        window_read = rasterio.windows.intersection(self.window_focus, self.window_data)

        if boundless:
            _, pad_width = window_utils.get_slice_pad(
                self.window_data, self.window_focus
            )
            need_pad = any(p != 0 for p in pad_width["x"] + pad_width["y"])
            if need_pad:
                pad_list_np = []
                for k in ["y", "x"]:
                    if k in pad_width:
                        pad_list_np.append(pad_width[k])
                    else:
                        pad_list_np.append((0, 0))
            else:
                pad_list_np = None
        else:
            pad_list_np = None

        return window_read, pad_list_np

    def footprint(self, crs: Optional[str] = None) -> Polygon:
        # TODO load footprint from metadata?
        pol = window_utils.window_polygon(self.window_focus, self.transform)
        if (crs is None) or window_utils.compare_crs(self.crs, crs):
            return pol

        return window_utils.polygon_to_crs(pol, self.crs, crs)

    def valid_footprint(self, crs: Optional[str] = None) -> Polygon:
        valids = self.load_mask()
        return valids.valid_footprint(crs=crs)

    def _load_bands(
        self,
        bands_names: Union[List[str], str],
        boundless: bool = True,
        fill_value_default: Number = 0,
    ) -> geotensor.GeoTensor:
        window_read, pad_list_np = self._get_window_pad(boundless=boundless)
        slice_ = window_read.toslices()
        if isinstance(bands_names, str):
            bands_names = [bands_names]
            flatten = True
        else:
            flatten = False

        with h5py.File(self.hdf5_file, "r") as input_f:
            bands_arrs = []
            for band in bands_names:
                data = read_band_toa(input_f, band, slice_)
                if pad_list_np is not None:
                    data = np.pad(
                        data,
                        tuple(pad_list_np),
                        mode="constant",
                        constant_values=fill_value_default,
                    )

                bands_arrs.append(data)

        if boundless:
            transform = self.transform
        else:
            transform = rasterio.windows.transform(window_read, self.real_transform)

        if flatten:
            img = bands_arrs[0]
        else:
            img = np.stack(bands_arrs, axis=0)

        return geotensor.GeoTensor(
            img,
            transform=transform,
            crs=self.crs,
            fill_value_default=fill_value_default,
        )

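    def save_bands(self, img: np.ndarray):
        """
        Writes the four radiometry bands back into the HDF5 file, in place.

        Reflectance values are clipped to [0, 2], converted to int16 digital numbers
        with each band's SCALE and OFFSET attributes, and masked pixels are stored as -1.

        Args:
            img: (4, real_height, real_width) array (optionally a masked array) with the
                bands in BAND_NAMES order. It must cover the full image, not only window_focus.
        """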
        assert (
            img.shape[0] == 4
        ), "Unexpected number of channels expected 4 found {}".format(img.shape)
        assert (
            img.shape[1:] == self.real_shape
        ), f"Unexpected shape expected {self.real_shape} found {img.shape[1:]}"

        # TODO save only window_focus?

        with h5py.File(self.hdf5_file, "r+") as input_f:
            for i, b in enumerate(BAND_NAMES):
                band_to_save = img[i]
                mask_band_2_save = np.ma.getmaskarray(img[i])
                band_to_save = np.clip(np.ma.filled(band_to_save, 0), 0, 2)
                band_name = f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}"
                attrs = input_f[band_name].attrs
                band_to_save *= attrs["SCALE"]
                band_to_save += attrs["OFFSET"]
                band_to_save = np.round(band_to_save).astype(np.int16)
                band_to_save[mask_band_2_save] = -1
                input_f[band_name][...] = band_to_save

    def load_radiometry(
        self, indexes: Optional[List[int]] = None, boundless: bool = True
    ) -> geotensor.GeoTensor:
        if indexes is None:
            indexes = (0, 1, 2, 3)
        bands_names = [
            f"{self.level_name}/RADIOMETRY/{BAND_NAMES[i]}/{self.toatoc}"
            for i in indexes
        ]
        return self._load_bands(
            bands_names, boundless=boundless, fill_value_default=-1 / 2000.0
        )

    def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Reference of the bit values in the `SM` (status map) flags.

        From the [user manual](http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf), page 67:
        * Clear  ->    000
        * Shadow ->    001
        * Undefined -> 010
        * Cloud  ->    011
        * Ice    ->    100
        * `2**3` sea/land
        * `2**4` quality swir (0 bad 1 good)
        * `2**5` quality nir
        * `2**6` quality red
        * `2**7` quality blue
        * `2**8` coverage swir (0 no 1 yes)
        * `2**9` coverage nir
        * `2**10` coverage red
        * `2**11` coverage blue
        """
        return self._load_bands(
            f"{self.level_name}/QUALITY/SM", boundless=boundless, fill_value_default=0
        )

    def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Returns the valid mask (True for valid pixels, False if the pixel is out of swath or
        invalid). This function loads the SM band.

        Args:
            boundless (bool, optional): boundless option to load the SM band. Defaults to True.

        Returns:
            geotensor.GeoTensor: boolean mask with the same spatial shape (height, width) as the image.
        """
        sm = self.load_sm(boundless=boundless)
        # ~mask_only_sm(...) is boolean while sm is integer; build a new GeoTensor
        # rather than assigning into .values (which forbids dtype changes in place).
        valids = geotensor.GeoTensor(~mask_only_sm(sm.values), transform=sm.transform,
                                     crs=sm.crs, fill_value_default=False)
        return valids

    def load_sm_cloud_mask(
        self, mask_undefined: bool = False, boundless: bool = True
    ) -> geotensor.GeoTensor:
        sm = self.load_sm(boundless=boundless)
        cloud_mask = sm_cloud_mask(sm.values, mask_undefined=mask_undefined)
        return geotensor.GeoTensor(
            cloud_mask, transform=sm.transform, crs=sm.crs, fill_value_default=0
        )

    def is_recompressed_and_chunked(self) -> bool:
        original_bands = [
            f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}" for b in BAND_NAMES
        ]
        original_bands.append(f"{self.level_name}/QUALITY/SM")
        with h5py.File(self.hdf5_file, "r") as input_:
            for b in original_bands:
                if input_[b].compression == "szip":
                    return False
                if (input_[b].chunks is None) or (input_[b].chunks[0] == 1):
                    return False
        return True

    def assert_can_be_read(self):
        original_bands = [
            f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}" for b in BAND_NAMES
        ] + [f"{self.level_name}/QUALITY/SM"]
        with h5py.File(self.hdf5_file, "r") as input_:
            for name in original_bands:
                assert is_compression_available(
                    input_[name]
                ), f"Band {name} cannot be read. Compression: {input_[name].compression}"

    def recompress_bands(
        self,
        chunks: Tuple[int, int] = (512, 512),
        replace: bool = True,
        compression_dest: str = "gzip",
    ):
        original_bands = {
            b: f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}" for b in BAND_NAMES
        }
        original_bands.update({"SM": f"{self.level_name}/QUALITY/SM"})
        copy_bands = {k: v + "_NEW" for (k, v) in original_bands.items()}
        with h5py.File(self.hdf5_file, "a") as input_:
            for b in original_bands.keys():
                assert_compression_available(input_[original_bands[b]])
                data = input_[original_bands[b]][:]
                if copy_bands[b] in input_:
                    del input_[copy_bands[b]]

                ds = input_.create_dataset(
                    copy_bands[b],
                    data=data,
                    chunks=chunks,
                    compression=compression_dest,
                )

                attrs_copy = input_[original_bands[b]].attrs
                for k, v in attrs_copy.items():
                    ds.attrs[k] = v

                if replace:
                    del input_[original_bands[b]]
                    input_[original_bands[b]] = input_[copy_bands[b]]
                    del input_[copy_bands[b]]

    @property
    def transform(self) -> Affine:
        return rasterio.windows.transform(self.window_focus, self.real_transform)

    @property
    def res(self) -> Tuple[float, float]:
        return window_utils.res(self.transform)

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return window_utils.window_bounds(self.window_focus, self.real_transform)

    def set_window(
        self,
        window: rasterio.windows.Window,
        relative: bool = True,
        boundless: bool = True,
    ):
        if relative:
            self.window_focus = rasterio.windows.Window(
                col_off=window.col_off + self.window_focus.col_off,
                row_off=window.row_off + self.window_focus.row_off,
                height=window.height,
                width=window.width,
            )
        else:
            self.window_focus = window

        if not boundless:
            self.window_focus = rasterio.windows.intersection(
                self.window_data, self.window_focus
            )

    def __copy__(self) -> "__class__":
        return ProbaV(
            self.hdf5_file, window=self.window_focus, level_name=self.level_name
        )

    def read_from_window(
        self, window: Optional[rasterio.windows.Window] = None, boundless: bool = True
    ) -> "__class__":
        copy = self.__copy__()
        copy.set_window(window=window, boundless=boundless)

        return copy

    def __repr__(self) -> str:
        return f""" 
         File: {self.hdf5_file}
         Transform: {self.transform}
         Shape: {self.height}, {self.width}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         Level: {self.level_name}
         TOA/TOC: {self.toatoc}
         Resolution name : {self.res_name}
        """

`load_mask(boundless=True)` ¶

Returns the valid mask (True for valid pixels, False if the pixel is out of swath or invalid). This function loads the SM band.

Parameters:

Name	Type	Description	Default
`boundless`	`bool`	boundless option to load the SM band. Defaults to True.	`True`

Returns:

Type	Description
`GeoTensor`	Boolean mask with the same spatial shape (height, width) as the image.

Source code in georeader/readers/probav_image_operational.py

def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
    """
    Returns the valid mask (True for valid pixels, False if the pixel is out of swath or
    invalid). This function loads the SM band.

    Args:
        boundless (bool, optional): boundless option to load the SM band. Defaults to True.

    Returns:
        geotensor.GeoTensor: boolean mask with the same spatial shape (height, width) as the image.
    """
    sm = self.load_sm(boundless=boundless)
    # ~mask_only_sm(...) is boolean while sm is integer; build a new GeoTensor
    # rather than assigning into .values (which forbids dtype changes in place).
    valids = geotensor.GeoTensor(~mask_only_sm(sm.values), transform=sm.transform,
                                 crs=sm.crs, fill_value_default=False)
    return valids
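The `boundless` flag decides what happens when `window_focus` extends past the image: `_get_window_pad` reads only the overlapping part and pads the rest with the fill value, while the `transform` property shifts the geotransform origin to the window's upper-left corner. The sketch below reproduces that arithmetic with plain tuples and NumPy instead of rasterio; `transform_from_mapping`, `window_transform` and `pad_for_window` are hypothetical helpers, the `(a, b, c, d, e, f)` tuple stands in for an `Affine`, and the grid values are made up.

```python
import numpy as np


def transform_from_mapping(mapping):
    """Build a north-up geotransform (a, b, c, d, e, f) from [x_ul, y_ul, x_res, y_res],
    the order in which ProbaV.__init__ reads MAPPING[3:7]."""
    x_ul, y_ul, x_res, y_res = (float(v) for v in mapping)
    return (x_res, 0.0, x_ul, 0.0, -y_res, y_ul)


def window_transform(transform, row_off, col_off):
    """Move the transform origin to the window's upper-left pixel."""
    a, b, c, d, e, f = transform
    return (a, b, c + a * col_off + b * row_off, d, e, f + d * col_off + e * row_off)


def pad_for_window(data_shape, row_off, col_off, height, width):
    """Slices of the in-bounds part of a window plus the ((top, bottom), (left, right))
    padding that restores the full window size. Assumes the window overlaps the data."""
    n_rows, n_cols = data_shape
    r0, r1 = max(row_off, 0), min(row_off + height, n_rows)
    c0, c1 = max(col_off, 0), min(col_off + width, n_cols)
    pad = ((r0 - row_off, row_off + height - r1), (c0 - col_off, col_off + width - c1))
    return (slice(r0, r1), slice(c0, c1)), pad


# Hypothetical grid: 1/112 degree pixels, upper-left corner at (-180, 75)
transform = transform_from_mapping([-180.0, 75.0, 1 / 112, 1 / 112])
data = np.arange(12, dtype=np.float32).reshape(3, 4)

# A 3x3 window hanging one pixel off the top-left corner of the data
(rows, cols), pad = pad_for_window(data.shape, row_off=-1, col_off=-1, height=3, width=3)
chunk = np.pad(data[rows, cols], pad, mode="constant", constant_values=-1 / 2000.0)
chunk_transform = window_transform(transform, row_off=-1, col_off=-1)
```

With `boundless=False` the reader skips the padding and builds the transform from the intersected window instead, so the returned array is smaller than the requested window.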

`load_sm(boundless=True)` ¶

Reference of the bit values in the SM (status map) flags.

From the user manual, page 67:

Clear -> 000
Shadow -> 001
Undefined -> 010
Cloud -> 011
Ice -> 100
2**3 sea/land
2**4 quality swir (0 bad 1 good)
2**5 quality nir
2**6 quality red
2**7 quality blue
2**8 coverage swir (0 no 1 yes)
2**9 coverage nir
2**10 coverage red
2**11 coverage blue

Source code in georeader/readers/probav_image_operational.py

def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
    """
    Reference of the bit values in the `SM` (status map) flags.

    From the [user manual](http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf), page 67:
    * Clear  ->    000
    * Shadow ->    001
    * Undefined -> 010
    * Cloud  ->    011
    * Ice    ->    100
    * `2**3` sea/land
    * `2**4` quality swir (0 bad 1 good)
    * `2**5` quality nir
    * `2**6` quality red
    * `2**7` quality blue
    * `2**8` coverage swir (0 no 1 yes)
    * `2**9` coverage nir
    * `2**10` coverage red
    * `2**11` coverage blue
    """
    return self._load_bands(
        f"{self.level_name}/QUALITY/SM", boundless=boundless, fill_value_default=0
    )
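The three lowest bits of SM hold a single code (clear, shadow, undefined, cloud or ice), and each higher bit is an independent flag. A minimal NumPy sketch of unpacking the fields listed above; `decode_sm`, `SM_CODES` and `SM_BANDS` are hypothetical names, not georeader API, and the sea/land bit is returned raw because the table does not say which value means land.

```python
import numpy as np

# Codes stored in bits 0-2 of SM, per the table above
SM_CODES = {0b000: "clear", 0b001: "shadow", 0b010: "undefined", 0b011: "cloud", 0b100: "ice"}
# Bands in the bit order used for quality (bits 4-7) and coverage (bits 8-11)
SM_BANDS = ("swir", "nir", "red", "blue")


def decode_sm(sm):
    """Split an SM array into its bit fields (hypothetical helper)."""
    sm = np.asarray(sm).astype(np.int64)
    fields = {
        "code": sm & 0b111,             # clear/shadow/undefined/cloud/ice
        "sea_land_bit": (sm >> 3) & 1,  # raw bit; meaning of 0/1 not decoded here
    }
    for i, band in enumerate(SM_BANDS):
        fields[f"quality_{band}"] = ((sm >> (4 + i)) & 1) == 1   # 1 = good
        fields[f"coverage_{band}"] = ((sm >> (8 + i)) & 1) == 1  # 1 = covered
    return fields


# First pixel: cloud code, sea/land bit, all quality and coverage bits set; second: all zero
sm = np.array([[0b1111_1111_1011, 0]])
fields = decode_sm(sm)
```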

`save_bands(img)` ¶

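Writes the four radiometry bands back into the HDF5 file, in place. Reflectance values are clipped to [0, 2], converted to int16 digital numbers with each band's SCALE and OFFSET attributes, and masked pixels are stored as -1.

Parameters:

Name	Type	Description	Default
`img`	`ndarray`	(4, real_height, real_width) array (optionally a masked array) with the bands in BAND_NAMES order. It must cover the full image, not only window_focus.	required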

Source code in georeader/readers/probav_image_operational.py

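def save_bands(self, img: np.ndarray):
    """
    Writes the four radiometry bands back into the HDF5 file, in place.

    Reflectance values are clipped to [0, 2], converted to int16 digital numbers
    with each band's SCALE and OFFSET attributes, and masked pixels are stored as -1.

    Args:
        img: (4, real_height, real_width) array (optionally a masked array) with the
            bands in BAND_NAMES order. It must cover the full image, not only window_focus.
    """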
    assert (
        img.shape[0] == 4
    ), "Unexpected number of channels expected 4 found {}".format(img.shape)
    assert (
        img.shape[1:] == self.real_shape
    ), f"Unexpected shape expected {self.real_shape} found {img.shape[1:]}"

    # TODO save only window_focus?

    with h5py.File(self.hdf5_file, "r+") as input_f:
        for i, b in enumerate(BAND_NAMES):
            band_to_save = img[i]
            mask_band_2_save = np.ma.getmaskarray(img[i])
            band_to_save = np.clip(np.ma.filled(band_to_save, 0), 0, 2)
            band_name = f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}"
            attrs = input_f[band_name].attrs
            band_to_save *= attrs["SCALE"]
            band_to_save += attrs["OFFSET"]
            band_to_save = np.round(band_to_save).astype(np.int16)
            band_to_save[mask_band_2_save] = -1
            input_f[band_name][...] = band_to_save
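`save_bands` stores reflectance as int16 digital numbers: values are clipped to [0, 2], multiplied by the band's `SCALE` attribute, shifted by `OFFSET`, rounded, and masked pixels become -1. The sketch below isolates that conversion together with an assumed inverse. `SCALE = 2000` and `OFFSET = 0` are illustrative values (they are consistent with the -1/2000 fill value used by `load_radiometry`, but read the attributes from your own file), and `dn_to_reflectance` is not necessarily what `read_band_toa` does.

```python
import numpy as np


def reflectance_to_dn(refl, scale, offset):
    """Per-band conversion performed by save_bands."""
    mask = np.ma.getmaskarray(refl)
    values = np.clip(np.ma.filled(refl, 0), 0, 2).astype(np.float64)
    dn = np.round(values * scale + offset).astype(np.int16)
    dn[mask] = -1  # invalid marker
    return dn


def dn_to_reflectance(dn, scale, offset):
    """Assumed inverse, (DN - OFFSET) / SCALE; not necessarily read_band_toa's exact formula."""
    return (dn.astype(np.float32) - offset) / scale


SCALE, OFFSET = 2000.0, 0.0  # illustrative attribute values
refl = np.ma.array([0.1234, 2.5, 0.3], mask=[False, False, True])
dn = reflectance_to_dn(refl, SCALE, OFFSET)
back = dn_to_reflectance(dn, SCALE, OFFSET)
```

The round trip is lossy in two places: values are quantised to 1/SCALE, and anything above a reflectance of 2 is clipped, as the 2.5 input shows.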

`ProbaVRadiometry` ¶

Bases: ProbaV

A specialized ProbaV reader class focused on radiometry data.

This class extends the base ProbaV class to provide a simplified interface for working with radiometry bands from Proba-V products.

Parameters:

Name	Type	Description	Default
`hdf5_file`	`str`	Path to the HDF5 file containing the Proba-V product.	required
`window`	`Optional[Window]`	Optional window to focus on a specific region of the image. Defaults to None (entire image).	`None`
`level_name`	`str`	Processing level of the product. Defaults to "LEVEL2A".	`'LEVEL2A'`
`indexes`	`Optional[List[int]]`	Optional list of band indices to load. If None, all four bands (0=BLUE, 1=RED, 2=NIR, 3=SWIR) will be loaded. Defaults to None.	`None`

Attributes:

Name	Type	Description
`dims`	`Tuple[str]`	Names of the dimensions ("band", "y", "x").
`indexes`	`List[int]`	List of band indices to load.
`dtype`		Data type of the radiometry data.
`count`	`int`	Number of bands to be loaded.
`shape`	`Tuple[int, int, int]`	Shape of the data (bands, height, width).
`values`	`ndarray`	The radiometry data values.

Examples:

>>> # Initialize the ProbaVRadiometry reader with a data path
>>> probav_rad = ProbaVRadiometry('/path/to/probav_product.HDF5')
>>> # Load only RED and NIR bands
>>> probav_rad_rn = ProbaVRadiometry('/path/to/probav_product.HDF5', indexes=[1, 2])
>>> # Get the data as a GeoTensor
>>> geotensor_data = probav_rad.load()

Source code in georeader/readers/probav_image_operational.py

class ProbaVRadiometry(ProbaV):
    """
    A specialized ProbaV reader class focused on radiometry data.

    This class extends the base ProbaV class to provide a simplified interface
    for working with radiometry bands from Proba-V products.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the Proba-V product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific
            region of the image. Defaults to None (entire image).
        level_name (str): Processing level of the product. Defaults to "LEVEL2A".
        indexes (Optional[List[int]]): Optional list of band indices to load. If None,
            all four bands (0=BLUE, 1=RED, 2=NIR, 3=SWIR) will be loaded. Defaults to None.

    Attributes:
        dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
        indexes (List[int]): List of band indices to load.
        dtype: Data type of the radiometry data.
        count (int): Number of bands to be loaded.
        shape (Tuple[int, int, int]): Shape of the data (bands, height, width).
        values (np.ndarray): The radiometry data values.

    Examples:
        >>> # Initialize the ProbaVRadiometry reader with a data path
        >>> probav_rad = ProbaVRadiometry('/path/to/probav_product.HDF5')
        >>> # Load only RED and NIR bands
        >>> probav_rad_rn = ProbaVRadiometry('/path/to/probav_product.HDF5', indexes=[1, 2])
        >>> # Get the data as a GeoTensor
        >>> geotensor_data = probav_rad.load()
    """

    def __init__(
        self,
        hdf5_file: str,
        window: Optional[rasterio.windows.Window] = None,
        level_name: str = "LEVEL2A",
        indexes: Optional[List[int]] = None,
    ):
        super().__init__(hdf5_file=hdf5_file, window=window, level_name=level_name)
        self.dims = ("band", "y", "x")

        # let read only some bands?
        if indexes is None:
            self.indexes = [0, 1, 2, 3]
        else:
            self.indexes = indexes

        self.dtype = self.dtype_radiometry

        # Radiometry is returned as TOA reflectance; the fill value is the invalid DN (-1)
        # divided by 2000, matching load_radiometry()'s fill_value_default.
        self.fill_value_default = -1 / 2000.0

    @property
    def count(self):
        return len(self.indexes)

    def load(self, boundless: bool = True) -> geotensor.GeoTensor:
        return self.load_radiometry(boundless=boundless, indexes=self.indexes)

    @property
    def shape(self) -> Tuple:
        return self.count, self.window_focus.height, self.window_focus.width

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def values(self) -> np.ndarray:
        return self.load_radiometry(boundless=True, indexes=self.indexes).values

    def __copy__(self) -> "__class__":
        return ProbaVRadiometry(
            self.hdf5_file,
            window=self.window_focus,
            level_name=self.level_name,
            indexes=self.indexes,
        )
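Because `ProbaVRadiometry` uses -1/2000 as `fill_value_default`, invalid pixels can be recognised directly in the loaded reflectance. A hedged sketch of turning such an array into a NumPy masked array; `mask_fill` is a hypothetical helper, the input values are made up, and masking a pixel in every band when any band is invalid is a design choice of this sketch (the reader's own `load_mask` uses the SM band instead).

```python
import numpy as np

FILL = -1 / 2000.0  # fill_value_default of ProbaVRadiometry


def mask_fill(radiometry, fill=FILL):
    """Mask every band of a pixel where any band equals the fill value (hypothetical helper)."""
    radiometry = np.asarray(radiometry)
    invalid = np.isclose(radiometry, fill).any(axis=0)  # (height, width)
    mask = np.broadcast_to(invalid, radiometry.shape).copy()
    return np.ma.masked_array(radiometry, mask=mask)


# Made-up (band, y, x) array in BLUE, RED, NIR, SWIR order; the second pixel has a fill value in NIR
arr = np.array([[[0.10, 0.20]], [[0.10, 0.20]], [[0.30, FILL]], [[0.05, 0.20]]], dtype=np.float32)
masked = mask_fill(arr)
band_means = masked[:, 0, :].mean(axis=1)  # per-band mean over valid pixels only
```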

`ProbaVSM` ¶

Bases: ProbaV

A specialized ProbaV reader class focused on Status Map (SM) data.

This class extends the base ProbaV class to provide a simplified interface for working with the status map band from Proba-V products. The SM band contains information about the pixel quality, cloud status, etc.

Parameters:

Name	Type	Description	Default
`hdf5_file`	`str`	Path to the HDF5 file containing the Proba-V product.	required
`window`	`Optional[Window]`	Optional window to focus on a specific region of the image. Defaults to None (entire image).	`None`
`level_name`	`str`	Processing level of the product. Defaults to "LEVEL2A".	`'LEVEL2A'`

Attributes:

Name	Type	Description
`dims`	`Tuple[str]`	Names of the dimensions ("y", "x").
`dtype`		Data type of the SM data.
`shape`	`Tuple[int, int]`	Shape of the SM data (height, width).
`values`	`ndarray`	The SM data values.

Examples:

>>> # Initialize the ProbaVSM reader with a data path
>>> probav_sm = ProbaVSM('/path/to/probav_product.HDF5')
>>> # Get the SM data as a GeoTensor
>>> sm_data = probav_sm.load()
>>> # Extract cloud information
>>> cloud_mask = sm_cloud_mask(sm_data.values)

Source code in georeader/readers/probav_image_operational.py

class ProbaVSM(ProbaV):
    """
    A specialized ProbaV reader class focused on Status Map (SM) data.

    This class extends the base ProbaV class to provide a simplified interface
    for working with the status map band from Proba-V products. The SM band
    contains information about the pixel quality, cloud status, etc.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the Proba-V product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific
            region of the image. Defaults to None (entire image).
        level_name (str): Processing level of the product. Defaults to "LEVEL2A".

    Attributes:
        dims (Tuple[str]): Names of the dimensions ("y", "x").
        dtype: Data type of the SM data.
        shape (Tuple[int, int]): Shape of the SM data (height, width).
        values (np.ndarray): The SM data values.

    Examples:
        >>> # Initialize the ProbaVSM reader with a data path
        >>> probav_sm = ProbaVSM('/path/to/probav_product.HDF5')
        >>> # Get the SM data as a GeoTensor
        >>> sm_data = probav_sm.load()
        >>> # Extract cloud information
        >>> cloud_mask = sm_cloud_mask(sm_data.values)
    """

    def __init__(
        self,
        hdf5_file: str,
        window: Optional[rasterio.windows.Window] = None,
        level_name: str = "LEVEL2A",
    ):
        super().__init__(hdf5_file=hdf5_file, window=window, level_name=level_name)
        self.dims = ("y", "x")
        self.dtype = self.dtype_sm

    def load(self, boundless: bool = True) -> geotensor.GeoTensor:
        return self.load_sm(boundless=boundless)

    @property
    def shape(self) -> Tuple:
        return self.window_focus.height, self.window_focus.width

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def values(self) -> np.ndarray:
        return self.load_sm(boundless=True).values

    def __copy__(self) -> "__class__":
        return ProbaVSM(
            self.hdf5_file, window=self.window_focus, level_name=self.level_name
        )
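The `ProbaVSM` example passes the SM values to `sm_cloud_mask`, whose implementation is not part of this section. As an illustration only, the sketch below flags clouds from the 3-bit code in bits 0-2 (011 = cloud, 010 = undefined), with a `mask_undefined` switch named after the argument of `load_sm_cloud_mask`. `cloud_mask_from_sm` is hypothetical and may differ from georeader's logic, for example in how shadow or ice pixels are handled.

```python
import numpy as np

SM_CLOUD, SM_UNDEFINED = 0b011, 0b010  # codes in bits 0-2 of SM


def cloud_mask_from_sm(sm, mask_undefined=False):
    """True where the SM code is 'cloud' (and optionally 'undefined'); hypothetical helper."""
    code = np.asarray(sm) & 0b111
    mask = code == SM_CLOUD
    if mask_undefined:
        mask |= code == SM_UNDEFINED
    return mask


# Codes cloud, undefined, clear, shadow, each with all four coverage bits (8-11) set
sm = np.array([[0b011, 0b010], [0b000, 0b001]]) | 0xF00
```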

SPOT-VGT Reader¶

The SPOT-VGT reader provides functionality for reading SPOT-VGT products. Features include:

HDF4 file format support
Handling of radiometry and quality layers
Cloud and shadow mask extraction

Note: See the Proba-V tutorial for similar processing workflows, as both sensors share similar data structures.

API Reference¶

SPOT VGT reader

Unofficial reader for SPOT VGT products. The reader is based on the user manual: https://docs.terrascope.be/DataProducts/SPOT-VGT/references/SPOT_VGT_PUM_v1.3.pdf

Authors: Dan Lopez-Puigdollers, Gonzalo Mateo-García
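`SpotVGT` identifies a product from its directory name and stores the pieces as attributes (`satelliteID`, `station`, `productID`, `year`, `month`, `day`, `segment`, `version`). A minimal sketch of that step, reusing the regular expression from `SpotVGT.__init__`; `parse_spotvgt_name` is a hypothetical helper and raises `ValueError` where the reader raises `FileNotFoundError`.

```python
import re

# Regular expression copied from SpotVGT.__init__
SPOT_VGT_NAME = re.compile(r"V(\d{1})(\w{3})(\w{1})____(\d{4})(\d{2})(\d{2})F(\w{3})_V(\d{3})")
FIELDS = ("satelliteID", "station", "productID", "year", "month", "day", "segment", "version")


def parse_spotvgt_name(name):
    """Return the filename fields as a dict (hypothetical helper)."""
    match = SPOT_VGT_NAME.match(name)
    if match is None:
        raise ValueError(f"SPOT-VGT product not recognized: {name}")
    return dict(zip(FIELDS, match.groups()))


info = parse_spotvgt_name("V2KRNP____20140321F146_V003")
```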

`SpotVGT` ¶

SPOT-VGT reader for handling SPOT Vegetation satellite products.

This class provides functionality to read and manipulate SPOT-VGT satellite imagery products. It handles the specific format and metadata of SPOT-VGT HDF4 files, supporting operations like loading radiometry data, masks, and cloud information.

Parameters:

Name	Type	Description	Default
`hdf4_file`	`str`	Path to the HDF4 file or directory containing the SPOT-VGT product.	required
`window`	`Optional[Window]`	Optional window to focus on a specific region of the image. Defaults to None (entire image).	`None`

Attributes:

Name	Type	Description
`hdf4_file`	`str`	Path to the HDF4 file.
`name`	`str`	Basename of the HDF4 file.
`satelliteID`	`str`	Satellite ID extracted from the filename.
`station`	`str`	Station code extracted from the filename.
`productID`	`str`	Product ID extracted from the filename.
`year`, `month`, `day`	`str`	Date components extracted from the filename.
`segment`	`str`	Segment identifier extracted from the filename.
`version`	`str`	Product version extracted from the filename.
`files`	`List[str]`	List of files in the SPOT-VGT product.
`files_dict`	`Dict[str, str]`	Dictionary mapping band names to file paths.
`metadata`	`Dict[str, str]`	Metadata extracted from the LOG file.
`real_shape`	`Tuple[int, int]`	Shape of the full image (height, width).
`real_transform`	`Affine`	Affine transform for the full image.
`dtype_radiometry`		Data type for radiometry data (typically np.float32).
`window_focus`	`Window`	Current window focus.
`window_data`	`Window`	Window representing the full data extent.
`start_date`	`datetime`	Start acquisition date and time.
`end_date`	`datetime`	End acquisition date and time.
`crs`		Coordinate reference system.
`toatoc`	`str`	Indicator of whether data is TOA (top of atmosphere).
`res_name`	`str`	Resolution name identifier (e.g., '1KM').
`level_name`	`str`	Processing level identifier.

Examples:

>>> import rasterio.windows
>>> # Initialize the SpotVGT reader with a data path
>>> spot_reader = SpotVGT('/path/to/V2KRNP____20140321F146_V003')
>>> # Load radiometry data
>>> bands = spot_reader.load_radiometry()
>>> # Get cloud mask
>>> cloud_mask = spot_reader.load_sm_cloud_mask()
>>> # Focus on a specific window
>>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
>>> spot_reader.set_window(window)
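After matching the name, `SpotVGT.__init__` reads the `LOG` file as whitespace-separated `KEY value` lines, builds the geotransform from the `CARTO_*` corner coordinates with `rasterio.transform.from_bounds`, and assembles the start date from `SEGM_FIRST_DATE` and `SEGM_FIRST_TIME`. The sketch below mirrors those steps without rasterio. The LOG excerpt, the 2240 x 1120 grid size and all values are hypothetical; `parse_log` also skips lines without a value, which the reader does not; and the result is left timezone-naive because the reader's own timezone handling is not shown here.

```python
import datetime as dt
import re


def parse_log(lines):
    """'KEY value' lines -> dict; lines without a value are skipped (hypothetical helper)."""
    metadata = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            metadata[parts[0]] = parts[1]
    return metadata


def transform_from_bounds(west, south, east, north, width, height):
    """(a, b, c, d, e, f) coefficients, as rasterio.transform.from_bounds computes them."""
    return ((east - west) / width, 0.0, west, 0.0, (south - north) / height, north)


def segment_start(metadata):
    """Combine SEGM_FIRST_DATE (YYYYMMDD) and SEGM_FIRST_TIME (HHMMSS) into a naive datetime."""
    year, month, day = re.match(r"(\d{4})(\d{2})(\d{2})", metadata["SEGM_FIRST_DATE"]).groups()
    hh, mm, ss = re.match(r"(\d{2})(\d{2})(\d{2})", metadata["SEGM_FIRST_TIME"]).groups()
    return dt.datetime(int(year), int(month), int(day), int(hh), int(mm), int(ss))


# Hypothetical LOG excerpt (keys as used by SpotVGT, values made up)
log_lines = [
    "SEGM_FIRST_DATE 20140321\n",
    "SEGM_FIRST_TIME 101530\n",
    "CARTO_LOWER_LEFT_X -10.0\n",
    "CARTO_LOWER_LEFT_Y 35.0\n",
    "CARTO_UPPER_RIGHT_X 10.0\n",
    "CARTO_UPPER_RIGHT_Y 45.0\n",
    "\n",
]
metadata = parse_log(log_lines)
bbox = [float(metadata[k]) for k in ("CARTO_LOWER_LEFT_X", "CARTO_LOWER_LEFT_Y",
                                     "CARTO_UPPER_RIGHT_X", "CARTO_UPPER_RIGHT_Y")]
transform = transform_from_bounds(*bbox, width=2240, height=1120)
start = segment_start(metadata)
```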

Source code in georeader/readers/spotvgt_image_operational.py

class SpotVGT:
    """
    SPOT-VGT reader for handling SPOT Vegetation satellite products.

    This class provides functionality to read and manipulate SPOT-VGT satellite imagery products.
    It handles the specific format and metadata of SPOT-VGT HDF4 files, supporting operations
    like loading radiometry data, masks, and cloud information.

    Args:
        hdf4_file (str): Path to the HDF4 file or directory containing the SPOT-VGT product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific 
            region of the image. Defaults to None (entire image).

    Attributes:
        hdf4_file (str): Path to the HDF4 file.
        name (str): Basename of the HDF4 file.
        satelliteID (str): Satellite ID extracted from the filename.
        station (str): Station code extracted from the filename.
        productID (str): Product ID extracted from the filename.
        year, month, day (str): Date components extracted from the filename.
        segment (str): Segment identifier extracted from the filename.
        version (str): Product version extracted from the filename.
        files (List[str]): List of files in the SPOT-VGT product.
        files_dict (Dict[str, str]): Dictionary mapping band names to file paths.
        metadata (Dict[str, str]): Metadata extracted from the LOG file.
        real_shape (Tuple[int, int]): Shape of the full image (height, width).
        real_transform (rasterio.Affine): Affine transform for the full image.
        dtype_radiometry: Data type for radiometry data (typically np.float32).
        window_focus (rasterio.windows.Window): Current window focus.
        window_data (rasterio.windows.Window): Window representing the full data extent.
        start_date (dt.datetime): Start acquisition date and time.
        end_date (dt.datetime): End acquisition date and time.
        crs: Coordinate reference system.
        toatoc (str): Indicator of whether data is TOA (top of atmosphere).
        res_name (str): Resolution name identifier (e.g., '1KM').
        level_name (str): Processing level identifier.

    Examples:
        >>> import rasterio.windows
        >>> # Initialize the SpotVGT reader with a data path
        >>> spot_reader = SpotVGT('/path/to/V2KRNP____20140321F146_V003')
        >>> # Load radiometry data
        >>> bands = spot_reader.load_radiometry()
        >>> # Get cloud mask
        >>> cloud_mask = spot_reader.load_sm_cloud_mask()
        >>> # Focus on a specific window
        >>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
        >>> spot_reader.set_window(window)
    """
    def __init__(self, hdf4_file: str, window: Optional[rasterio.windows.Window] = None):
        self.hdf4_file = hdf4_file
        self.name = os.path.basename(self.hdf4_file)
        matches = re.match(r'V(\d{1})(\w{3})(\w{1})____(\d{4})(\d{2})(\d{2})F(\w{3})_V(\d{3})', self.name)
        if matches is not None:
            (self.satelliteID, self.station, self.productID, self.year,
             self.month, self.day, self.segment, self.version) = matches.groups()
        else:
            raise FileNotFoundError("SPOT-VGT product not recognized %s" % self.hdf4_file)

        try:
            self.files = sorted(glob(os.path.join(self.hdf4_file, '*')))
            self.files_dict = {re.match(r'V\d{12}_(\w+)',
                                        os.path.basename(self.files[i])).groups()[0]: self.files[i]
                               for i in range(len(self.files))}

            with open(self.files_dict['LOG'], "r") as f:
                self.metadata = {re.split(r'\s+', y)[0]: re.split(r'\s+', y)[1] for y in [x for x in f]}

            self.real_shape = (
                int(self.metadata["IMAGE_LOWER_RIGHT_ROW"]) - int(self.metadata["IMAGE_UPPER_LEFT_ROW"]) - 1,
                int(self.metadata["IMAGE_LOWER_RIGHT_COL"]) - int(self.metadata["IMAGE_UPPER_LEFT_COL"]) - 1)

            bbox = [
                float(self.metadata['CARTO_LOWER_LEFT_X']),
                float(self.metadata['CARTO_LOWER_LEFT_Y']),
                float(self.metadata['CARTO_UPPER_RIGHT_X']),
                float(self.metadata['CARTO_UPPER_RIGHT_Y'])
            ]
            self.real_transform = rasterio.transform.from_bounds(*bbox, width=self.real_shape[1],
                                                                 height=self.real_shape[0])

            self.dtype_radiometry = np.float32

        except OSError as e:
            raise FileNotFoundError("Error reading product %s" % self.hdf4_file) from e

        if window is None:
            self.window_focus = rasterio.windows.Window(row_off=0, col_off=0,
                                                        width=self.real_shape[1],
                                                        height=self.real_shape[0])
        else:
            # Keep the window requested by the caller (used by __copy__)
            self.window_focus = window

        self.window_data = rasterio.windows.Window(row_off=0, col_off=0,
                                                   width=self.real_shape[1],
                                                   height=self.real_shape[0])

        year, month, day = re.match(r'(\d{4})(\d{2})(\d{2})', self.metadata['SEGM_FIRST_DATE']).groups()
        hh, mm, ss = re.match(r'(\d{2})(\d{2})(\d{2})', self.metadata['SEGM_FIRST_TIME']).groups()

        self.start_date = dt.datetime(day=int(day), month=int(month), year=int(year),
                                      hour=int(hh), minute=int(mm), second=int(ss), tzinfo=dt.timezone.utc)

        year, month, day = re.match(r'(\d{4})(\d{2})(\d{2})', self.metadata['SEGM_LAST_DATE']).groups()
        hh, mm, ss = re.match(r'(\d{2})(\d{2})(\d{2})', self.metadata['SEGM_LAST_TIME']).groups()

        self.end_date = dt.datetime(day=int(day), month=int(month), year=int(year),
                                    hour=int(hh), minute=int(mm), second=int(ss), tzinfo=dt.timezone.utc)

        # self.map_projection_wkt

        self.toatoc = "TOA"

        self.res_name = '1KM'

        # SPOT-VGT images are lat/long
        self.crs = rasterio.crs.CRS.from_epsg(4326)

        # Processing level identifier (the radiometry itself has four bands, see load_radiometry)
        self.level_name = "LEVEL2A"

    def _get_window_pad(self, boundless: bool = True) -> Tuple[rasterio.windows.Window, Optional[List]]:
        window_read = rasterio.windows.intersection(self.window_focus, self.window_data)

        if boundless:
            _, pad_width = window_utils.get_slice_pad(self.window_data, self.window_focus)
            need_pad = any(p != 0 for p in pad_width["x"] + pad_width["y"])
            if need_pad:
                pad_list_np = []
                for k in ["y", "x"]:
                    if k in pad_width:
                        pad_list_np.append(pad_width[k])
                    else:
                        pad_list_np.append((0, 0))
            else:
                pad_list_np = None
        else:
            pad_list_np = None

        return window_read, pad_list_np

    def footprint(self, crs:Optional[str]=None) -> Polygon:
        # TODO load footprint from metadata?
        pol = window_utils.window_polygon(self.window_focus, self.transform)
        if (crs is None) or window_utils.compare_crs(self.crs, crs):
            return pol

        return window_utils.polygon_to_crs(pol, self.crs, crs)

    def valid_footprint(self, crs:Optional[str]=None) -> Polygon:
        valids = self.load_mask()
        return valids.valid_footprint(crs=crs)        

    def _load_bands(self, bands_names: Union[List[str], str], boundless: bool = True,
                    fill_value_default: Number = 0) -> geotensor.GeoTensor:
        window_read, pad_list_np = self._get_window_pad(boundless=boundless)
        slice_ = window_read.toslices()
        if isinstance(bands_names, str):
            bands_names = [bands_names]
            flatten = True
        else:
            flatten = False

        hdf_objs = {b: SD(self.files_dict[b], SDC.READ) for b in bands_names}
        # Read dataset
        # shapes = [hdf_objs[b].datasets()["PIXEL_DATA"][1] for b in bands_names]
        # data = [hdf_objs[b].select("PIXEL_DATA")[slice_] for b in bands_names]

        bands_arrs = []
        # Original slice int32 gives an error. Cast to int
        for band in bands_names:
            data = read_band_toa(hdf_objs, band, (slice(int(slice_[0].start), int(slice_[0].stop), None),
                                                  slice(int(slice_[1].start), int(slice_[1].stop), None)))
            if pad_list_np:
                data = np.pad(data, tuple(pad_list_np), mode="constant", constant_values=fill_value_default)

            bands_arrs.append(data)

        if boundless:
            transform = self.transform
        else:
            transform = rasterio.windows.transform(window_read, self.real_transform)

        if flatten:
            img = bands_arrs[0]
        else:
            img = np.stack(bands_arrs, axis=0)

        return geotensor.GeoTensor(img, transform=transform, crs=self.crs,
                                   fill_value_default=fill_value_default)

    def load_radiometry(self, indexes: Optional[List[int]] = None, boundless: bool = True) -> geotensor.GeoTensor:
        if indexes is None:
            indexes = (0, 1, 2, 3)
        # bands_names = [f"{self.level_name}/RADIOMETRY/{BAND_NAMES[i]}/{self.toatoc}" for i in indexes]
        bands_names = [BANDS_DICT[i] for i in indexes]
        return self._load_bands(bands_names, boundless=boundless, fill_value_default=0)

    def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Reference of values in `SM` flags.

        From [user manual](https://docs.terrascope.be/DataProducts/SPOT-VGT/references/SPOT_VGT_PUM_v1.3.pdf) pag 46
        * Clear  ->    000
        * Shadow ->    001
        * Undefined -> 010
        * Cloud  ->    011
        * Ice    ->    100
        * `2**3` sea/land
        * `2**4` quality swir (0 bad 1 good)
        * `2**5` quality nir
        * `2**6` quality red
        * `2**7` quality blue
        """
        return self._load_bands('SM', boundless=boundless, fill_value_default=0)

    def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band.

        Args:
            boundless (bool, optional): Whether to read the SM band in boundless mode (areas of the window
                outside the data extent are padded). Defaults to True.

        Returns:
            geotensor.GeoTensor: boolean mask with the same shape as the image (True where the pixel is valid).
        """

        sm = self.load_sm(boundless=boundless)
        invalids = mask_only_sm(sm.values)
        # ~invalids is boolean while sm is integer; build a new GeoTensor rather
        # than assigning into .values (which forbids dtype changes in place).
        valids = geotensor.GeoTensor(~invalids, transform=sm.transform, crs=sm.crs,
                                     fill_value_default=False)

        return valids

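    def load_sm_cloud_mask(self, mask_undefined: bool = False, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Cloud mask derived from the `SM` band. Invalid pixels are set to 0; every other
        pixel holds the output of `sm_cloud_mask` shifted by one.
        """
        sm = self.load_sm(boundless=boundless)
        cloud_mask = sm_cloud_mask(sm.values, mask_undefined=mask_undefined)
        cloud_mask += 1
        invalids = mask_only_sm(sm.values)

        cloud_mask[invalids] = 0
        # Use the SM transform so the mask stays aligned when boundless=False
        return geotensor.GeoTensor(cloud_mask, transform=sm.transform, crs=sm.crs, fill_value_default=0)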

    @property
    def transform(self) -> Affine:
        return rasterio.windows.transform(self.window_focus, self.real_transform)

    @property
    def res(self) -> Tuple[float, float]:
        return window_utils.res(self.transform)

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return window_utils.window_bounds(self.window_focus, self.real_transform)

    def set_window(self, window:rasterio.windows.Window, relative: bool = True, boundless: bool = True):
        if relative:
            self.window_focus = rasterio.windows.Window(col_off=window.col_off + self.window_focus.col_off,
                                                        row_off=window.row_off + self.window_focus.row_off,
                                                        height=window.height, width=window.width)
        else:
            self.window_focus = window

        if not boundless:
            self.window_focus = rasterio.windows.intersection(self.window_data, self.window_focus)

    def __copy__(self) -> 'SpotVGT':
        return SpotVGT(self.hdf4_file, window=self.window_focus)

    def read_from_window(self, window: Optional[rasterio.windows.Window] = None, boundless: bool = True) -> 'SpotVGT':
        copy = self.__copy__()
        # With window=None, return a copy focused on the current window
        if window is not None:
            copy.set_window(window=window, boundless=boundless)

        return copy

    def __repr__(self) -> str:
        return f""" 
         File: {self.hdf4_file}
         Transform: {self.transform}
         Shape: {self.height}, {self.width}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         Level: {self.level_name}
         TOA/TOC: {self.toatoc}
         Resolution name : {self.res_name}
        """

`load_mask(boundless=True)` ¶

Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band.

Parameters:

Name	Type	Description	Default
`boundless`	`bool`	Whether to read the SM band in boundless mode (areas of the window outside the data extent are padded). Defaults to True.	`True`

Returns:

Type	Description
`GeoTensor`	Boolean mask with the same shape as the image (True where the pixel is valid).
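With `boundless=True`, `_get_window_pad` reads only the part of the focus window that intersects the data and pads the rest with `fill_value_default`. The pure-Python sketch below illustrates the idea on a small 2-D list. It is not `window_utils.get_slice_pad`; `pad_widths` and `read_boundless` are hypothetical helpers, and `pad_widths` assumes the window overlaps the data.

```python
from typing import List, Tuple

Grid = List[List[int]]


def pad_widths(row_off: int, col_off: int, height: int, width: int,
               data_height: int, data_width: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """(before, after) padding per axis in (y, x) order, the layout np.pad takes for a (rows, cols) array.

    Assumes the window overlaps the data extent [0, data_height) x [0, data_width).
    """
    top = max(0, -row_off)
    left = max(0, -col_off)
    bottom = max(0, row_off + height - data_height)
    right = max(0, col_off + width - data_width)
    return (top, bottom), (left, right)


def read_boundless(data: Grid, row_off: int, col_off: int,
                   height: int, width: int, fill: int = 0) -> Grid:
    """Read a window that may extend past the data, filling outside pixels with `fill`."""
    n_rows, n_cols = len(data), len(data[0])
    out = []
    for r in range(row_off, row_off + height):
        out.append([data[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else fill
                    for c in range(col_off, col_off + width)])
    return out


data = [[1, 2],
        [3, 4]]
# Window starting one row above the data: its first row is padding.
print(read_boundless(data, row_off=-1, col_off=0, height=3, width=2))  # [[0, 0], [1, 2], [3, 4]]
print(pad_widths(-1, 0, 3, 2, 2, 2))  # ((1, 0), (0, 0))
```

With `boundless=False` the reader instead keeps only the intersection, so the returned array is smaller than the requested window and its transform is computed from the intersected window.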

`load_sm(boundless=True)` ¶

Load the SM (status map) band. Reference values of its flags, from the user manual, page 46.

The three lowest bits encode the pixel status:

Clear -> 000
Shadow -> 001
Undefined -> 010
Cloud -> 011
Ice -> 100

The remaining bits are single flags:

2**3 sea/land
2**4 quality SWIR (0 bad, 1 good)
2**5 quality NIR
2**6 quality red
2**7 quality blue
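The flag layout above can be decoded with plain bit operations. This is a minimal sketch assuming, as the list implies, that the status code occupies the three lowest bits and that each of bits 3 to 7 is a single flag. Check the user manual for which value of bit 3 means land. `decode_sm` is a hypothetical helper; the reader's own `mask_only_sm` and `sm_cloud_mask` are not shown here.

```python
from typing import Dict

# Status codes stored in the three lowest bits of SM.
STATUS = {0b000: "clear", 0b001: "shadow", 0b010: "undefined",
          0b011: "cloud", 0b100: "ice"}
# Bit position -> band whose radiometric quality it flags (1 = good).
QUALITY_BITS = {4: "swir", 5: "nir", 6: "red", 7: "blue"}


def decode_sm(value: int) -> Dict[str, object]:
    """Decode one SM pixel value into status, sea/land bit and per-band quality."""
    return {
        "status": STATUS.get(value & 0b111, "unknown"),
        "sea_land_bit": (value >> 3) & 1,
        "quality_ok": {band: bool((value >> bit) & 1) for bit, band in QUALITY_BITS.items()},
    }


# 0b11110011: cloud status, bit 3 unset, all four quality bits set.
print(decode_sm(0b11110011))
```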

PRISMA Reader¶

The PRISMA reader handles data from the Italian Space Agency's hyperspectral mission, specifically Level 1 (L1) top-of-atmosphere radiance data (not atmospherically corrected). PRISMA images the 400-2500 nm spectral range with a spectral resolution (FWHM) of about 10-12 nm.

Key features:

Reading L1B hyperspectral radiance data from HDF5 format files
Handling the separate VNIR (400-1010 nm) and SWIR (920-2500 nm) spectral ranges, which overlap between 920 and 1010 nm
Georeferencing functionality for non-orthorectified data using provided latitude/longitude coordinates
On-demand conversion from radiance (mW/m²/sr/nm) to top-of-atmosphere reflectance
Spectral response function integration for accurate band simulation
Extraction of RGB previews from specific wavelengths
Access to satellite and solar geometry information for radiometric calculations
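Radiance-to-reflectance conversion is usually done with the standard top-of-atmosphere reflectance formula, rho = pi * L * d^2 / (E0 * cos(theta_s)), where L is the radiance, E0 the exo-atmospheric solar irradiance for the band, d the Earth-Sun distance in astronomical units and theta_s the solar zenith angle. The sketch below implements that textbook formula in plain Python. We have not checked how `georeader.reflectance` arranges the terms (for example how `observation_date_correction_factor` folds in d), and the input numbers are illustrative only.

```python
import math


def toa_reflectance(radiance: float, solar_irradiance: float,
                    sza_deg: float, earth_sun_distance_au: float = 1.0) -> float:
    """Top-of-atmosphere reflectance from radiance.

    radiance: mW/(m^2 sr nm); solar_irradiance: mW/(m^2 nm), so the units cancel.
    sza_deg: solar zenith angle in degrees.
    """
    cos_sza = math.cos(math.radians(sza_deg))
    return math.pi * radiance * earth_sun_distance_au ** 2 / (solar_irradiance * cos_sza)


# Illustrative values, not taken from a real PRISMA scene.
print(round(toa_reflectance(100.0, 1500.0, sza_deg=0.0), 4))   # 0.2094
print(round(toa_reflectance(100.0, 1500.0, sza_deg=60.0), 4))  # 0.4189
```

The solar zenith angle matters: the same radiance at a 60 degree zenith angle corresponds to twice the reflectance it would at nadir sun.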

Tutorial examples:

API Reference¶

Module to read PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral images.

PRISMA is an Italian Space Agency (ASI) Earth observation satellite launched in 2019, carrying a hyperspectral imaging spectrometer that captures data in 239 spectral bands from 400 to 2500 nm with a 30m spatial resolution.

Data Format Overview¶

PRISMA data is distributed as HDF-EOS5 files (HDF5-based, HE5 extension) with the following structure:

PRISMA HDF5 File Structure:
┌──────────────────────────────────────────────────────────┐
│  / (root attributes: sun zenith angle, start/stop time,  │
│     band centers and flags, scale factors, offsets)      │
│                                                          │
│  /HDFEOS/SWATHS/PRS_L1_HCO/                              │
│  ├── Data Fields/                                        │
│  │   ├── VNIR_Cube: (rows, bands, columns)               │
│  │   │   └── 400-1010 nm, ~66 bands                      │
│  │   └── SWIR_Cube: (rows, bands, columns)               │
│  │       └── 920-2500 nm, ~173 bands                     │
│  └── Geolocation Fields/                                 │
│      ├── Latitude_SWIR, Longitude_SWIR                   │
│      └── Latitude_VNIR, Longitude_VNIR                   │
│                                                          │
│  /KDP_AUX/                                               │
│  ├── Cw_Vnir_Matrix, Cw_Swir_Matrix (wavelengths)        │
│  └── Fwhm_Vnir_Matrix, Fwhm_Swir_Matrix                  │
└──────────────────────────────────────────────────────────┘

Unlike EMIT, PRISMA data is NOT orthorectified. The geolocation arrays provide lat/lon coordinates for each pixel, requiring gridding for visualization.

Dual-Sensor Configuration¶

PRISMA uses two separate sensors for VNIR and SWIR:

VNIR Sensor                          SWIR Sensor
┌────────────────────┐               ┌────────────────────┐
│ 400 - 1010 nm      │               │ 920 - 2500 nm      │
│ ~66 bands          │               │ ~173 bands         │
│ ~10 nm sampling    │               │ ~10 nm sampling    │
│                    │               │                    │
│ Shared 30m GSD     │               │ Shared 30m GSD     │
└────────────────────┘               └────────────────────┘
          │                                    │
          └──────────── Overlap ───────────────┘
                     920-1010 nm

The VNIR and SWIR sensors have overlapping wavelength coverage in the 920-1010 nm region, which can be used for cross-calibration.

Radiometric Units¶

L1 Radiance: mW/(m²·sr·nm) - milliwatts per square meter per steradian per nanometer (equivalent to W/(m²·sr·μm))
Scale factors and offsets are applied during loading to convert from DN to radiance
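In `load_raw`, the stored cube values are converted with the file attributes `ScaleFactor_Vnir`/`ScaleFactor_Swir` and `Offset_Vnir`/`Offset_Swir` as radiance = DN / ScaleFactor - Offset. The snippet restates that line on its own and checks the unit equivalence stated above; the scale factor and offset values are made up for illustration.

```python
def dn_to_radiance(dn: float, scale_factor: float, offset: float) -> float:
    """Radiance in mW/(m^2 sr nm), mirroring `ltoa_img / sc_fac - of_fac` in load_raw.

    Works element-wise on NumPy arrays as well as on plain floats.
    """
    return dn / scale_factor - offset


# Hypothetical attribute values; read the real ones from the file attributes.
print(dn_to_radiance(12000, scale_factor=100.0, offset=0.0))  # 120.0

# 1 mW/(m^2 sr nm) = 1e-3 W / (m^2 sr * 1e-3 um) = 1 W/(m^2 sr um)
milliwatt_in_watts = 1e-3
nanometre_in_micrometres = 1e-3
print(milliwatt_in_watts / nanometre_in_micrometres)  # 1.0
```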

Spectral Characteristics¶

Total bands: ~239 (~66 VNIR + ~173 SWIR), of which a few are flagged as invalid and dropped when loading
Spectral sampling: ~10 nm (varies slightly)
FWHM: ~10-12 nm
SNR: >200 for VNIR, >100 for SWIR
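The FWHM values are what make band simulation possible: a common approach treats each band's spectral response as a Gaussian and averages the hyperspectral samples weighted by that response. The sketch below assumes Gaussian responses with sigma = FWHM / (2 sqrt(2 ln 2)). The reader's own spectral-response integration may instead use tabulated response functions, and the helper names and spectrum are hypothetical.

```python
import math
from typing import List, Sequence


def fwhm_to_sigma(fwhm: float) -> float:
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_srf(wavelengths: Sequence[float], center: float, fwhm: float) -> List[float]:
    """Relative spectral response (peak 1) of a Gaussian band at each wavelength."""
    sigma = fwhm_to_sigma(fwhm)
    return [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]


def simulate_band(wavelengths: Sequence[float], values: Sequence[float],
                  center: float, fwhm: float) -> float:
    """Response-weighted average of a sampled spectrum for a broader target band."""
    weights = gaussian_srf(wavelengths, center, fwhm)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical spectrum sampled every 10 nm (605-725 nm), rising linearly with wavelength.
wls = [float(w) for w in range(605, 735, 10)]
refl = [0.001 * (w - 600) for w in wls]
# Symmetric sampling around 665 nm, so a linear spectrum averages to its value there.
print(round(simulate_band(wls, refl, center=665.0, fwhm=30.0), 4))  # 0.065
```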

Examples¶

Basic usage::

from georeader.readers.prisma import PRISMA

# Load PRISMA image
prisma = PRISMA('/path/to/PRS_L1_STD_*.he5')

# Load specific wavelengths as reflectance
bands = prisma.load_wavelengths([850, 1600, 2200], as_reflectance=True)

# Load RGB composite
rgb = prisma.load_rgb(as_reflectance=True)

# Get georeferenced output (reprojected to UTM)
rgb_geo = prisma.load_rgb(as_reflectance=True, raw=False)

References¶

ASI PRISMA Mission: https://www.asi.it/en/earth-science/prisma/
PRISMA User Guide: https://prisma.asi.it/

`PRISMA` ¶

Reader for PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral images.

This class provides comprehensive functionality to read and manipulate PRISMA satellite imagery products from the Italian Space Agency (ASI). It handles the dual-sensor (VNIR + SWIR) data format, supporting operations like:

Loading radiance or reflectance data at specific wavelengths
Automatic handling of VNIR/SWIR sensor selection based on wavelength
Converting radiance to reflectance using solar irradiance
Georeferencing raw data to projected coordinate systems

PRISMA Data Model¶

PRISMA stores data in sensor coordinates with separate lat/lon arrays for geolocation. Unlike EMIT's GLT approach, PRISMA requires gridding/interpolation for orthorectification:

Sensor Grid (raw)                  Geographic Grid (output)
┌─────────────────────┐            ┌─────────────────────┐
│ pushbroom scan      │            │ regular grid        │
│ ┌───┬───┬───┬───┐  │  gridding  │ ┌───┬───┬───┬───┐  │
│ │ a │ b │ c │ d │  │  ───────→  │ │ a'│ b'│ c'│ d'│  │
│ ├───┼───┼───┼───┤  │            │ ├───┼───┼───┼───┤  │
│ │ e │ f │ g │ h │  │            │ │ e'│ f'│ g'│ h'│  │
│ └───┴───┴───┴───┘  │            │ └───┴───┴───┴───┘  │
│ + lat/lon per pixel│            │ + affine transform  │
└─────────────────────┘            └─────────────────────┘

Raw methods (raw=True) return sensor coordinates; georeferenced methods (raw=False) apply gridding to regular geographic coordinates.
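To make the raw-to-grid step concrete, here is a deliberately simplified forward-binning sketch: each sensor pixel is dropped into the output cell that contains its coordinates, and the grid's upper-left corner plus the cell size define the affine transform. This is not what `georeader.griddata` does internally (it may interpolate, and georeferenced output is typically in a projected CRS such as UTM); `grid_by_binning` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Grid = List[List[Optional[float]]]


def grid_by_binning(xs: Sequence[float], ys: Sequence[float], values: Sequence[float],
                    res: float) -> Tuple[Grid, Tuple[float, float, float]]:
    """Bin scattered samples onto a regular north-up grid.

    Returns the grid (None where no sample fell) and (west, north, res), i.e. the
    upper-left corner and cell size of the affine transform x = west + col * res,
    y = north - row * res.
    """
    west, north = min(xs), max(ys)
    n_cols = math.floor((max(xs) - west) / res) + 1
    n_rows = math.floor((north - min(ys)) / res) + 1
    grid: Grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, v in zip(xs, ys, values):
        row = math.floor((north - y) / res)
        col = math.floor((x - west) / res)
        grid[row][col] = v  # last sample wins if several land in one cell
    return grid, (west, north, res)


# Four samples of a slightly rotated 2x2 scan (hypothetical coordinates in metres).
xs = [0.0, 30.0, 5.0, 35.0]
ys = [30.0, 35.0, 0.0, 5.0]
grid, (west, north, res) = grid_by_binning(xs, ys, [1.0, 2.0, 3.0, 4.0], res=30.0)
print(grid)  # [[1.0, 2.0], [3.0, 4.0]]
```

Plain binning leaves holes (None) wherever no sample falls in a cell, which is one reason real gridding routines interpolate instead.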

Dual Sensor Architecture¶

PRISMA has separate VNIR and SWIR sensors with overlapping coverage:

Wavelength Range:
├──────────────────────────────────────────────────────────────┤
400nm              1000nm                                 2500nm
├───────── VNIR ──────────┤
                  ├────────────────── SWIR ───────────────────┤
                  └─ overlap ─┘
                  920-1010nm

The class automatically selects the appropriate sensor based on requested wavelengths.
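The exact rule the reader applies in the 920-1010 nm overlap is not shown in this section. As an illustration only, the sketch below assumes a nearest-band-center policy across both sensors, with ties going to VNIR. `pick_band` and the band-center lists are hypothetical; real centers come from `List_Cw_Vnir`/`List_Cw_Swir` or the `wavelength_*` matrices.

```python
from typing import Sequence, Tuple


def nearest(centers: Sequence[float], wavelength: float) -> Tuple[int, float]:
    """Index of the band center closest to `wavelength` and its distance in nm."""
    idx = min(range(len(centers)), key=lambda i: abs(centers[i] - wavelength))
    return idx, abs(centers[idx] - wavelength)


def pick_band(wavelength: float, vnir_centers: Sequence[float],
              swir_centers: Sequence[float]) -> Tuple[str, int]:
    """Return ("VNIR" or "SWIR", band index) for the closest band; ties favor VNIR."""
    ivn, dvn = nearest(vnir_centers, wavelength)
    isw, dsw = nearest(swir_centers, wavelength)
    return ("VNIR", ivn) if dvn <= dsw else ("SWIR", isw)


# Toy band centers in nm, with 950/1000 (VNIR) and 960 (SWIR) inside the overlap.
vnir = [665.0, 865.0, 950.0, 1000.0]
swir = [960.0, 1600.0, 2200.0]
print(pick_band(665.0, vnir, swir))  # ('VNIR', 0)
print(pick_band(958.0, vnir, swir))  # ('SWIR', 0)
```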

Attributes¶

filename (str): Path to the PRISMA HE5 file.
lats (np.ndarray): Latitude values (H, W) for each pixel in sensor coordinates.
lons (np.ndarray): Longitude values (H, W) for each pixel in sensor coordinates.
attributes_prisma (Dict): Dictionary of PRISMA metadata attributes from the HDF5 root.
nbands_vnir (int): Number of valid VNIR bands (excluding flagged bands).
vnir_range (Tuple[float, float]): Wavelength range (min, max) of the VNIR sensor in nm.
nbands_swir (int): Number of valid SWIR bands (excluding flagged bands).
swir_range (Tuple[float, float]): Wavelength range (min, max) of the SWIR sensor in nm.
time_coverage_start (datetime): UTC datetime of acquisition start.
time_coverage_end (datetime): UTC datetime of acquisition end.
units (str): Radiance units: 'mW/m2/sr/nm'.
sza_swir (float): Solar zenith angle (degrees) for the SWIR sensor.
sza_vnir (float): Solar zenith angle (degrees) for the VNIR sensor.
vza_swir (float): View zenith angle (degrees) for the SWIR sensor.
vza_vnir (float): View zenith angle (degrees) for the VNIR sensor.

Lazy-Loaded Attributes¶

ltoa_swir (np.ndarray): SWIR radiance data (H, W, B), loaded by load_raw(swir_flag=True).
ltoa_vnir (np.ndarray): VNIR radiance data (H, W, B), loaded by load_raw(swir_flag=False).
wavelength_swir (np.ndarray): SWIR wavelengths (H, B); these vary slightly across track.
wavelength_vnir (np.ndarray): VNIR wavelengths (H, B); these vary slightly across track.
fwhm_swir (np.ndarray): SWIR FWHM values (H, B); these vary slightly across track.
fwhm_vnir (np.ndarray): VNIR FWHM values (H, B); these vary slightly across track.

Examples¶

Basic loading::

>>> from georeader.readers.prisma import PRISMA
>>> 
>>> prisma = PRISMA('/path/to/PRS_L1_STD_*.he5')
>>> print(prisma)  # View metadata summary
>>> print(f"VNIR: {prisma.vnir_range}, SWIR: {prisma.swir_range}")

Loading specific wavelengths::

>>> # Load NDVI bands (Red at 665nm, NIR at 865nm)
>>> bands = prisma.load_wavelengths([665, 865], as_reflectance=True)
>>> print(bands.shape)  # (2, H, W) in sensor coordinates
>>> 
>>> # Load and georeference to UTM
>>> bands_geo = prisma.load_wavelengths([665, 865], as_reflectance=True, 
...                                       raw=False, resolution_dst=30)
>>> print(type(bands_geo))  # GeoTensor with transform and CRS

Loading RGB composite::

>>> # Raw sensor coordinates
>>> rgb_raw = prisma.load_rgb(as_reflectance=True, raw=True)
>>> 
>>> # Georeferenced output
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> rgb_geo = prisma.load_rgb(as_reflectance=True, raw=False)
>>> plt.imshow(np.clip(rgb_geo.values.transpose(1, 2, 0), 0, 0.3) / 0.3)

Working with raw data::

>>> # Load all SWIR bands
>>> prisma.load_raw(swir_flag=True)
>>> print(prisma.ltoa_swir.shape)  # (H, W, ~173)
>>> print(prisma.wavelength_swir.shape)  # (H, ~173) - wavelengths vary across track
>>> 
>>> # Load all VNIR bands
>>> prisma.load_raw(swir_flag=False)
>>> print(prisma.ltoa_vnir.shape)  # (H, W, ~66)

References¶

ASI PRISMA Mission: https://www.asi.it/en/earth-science/prisma/
PRISMA Data Products: https://prisma.asi.it/

Source code in georeader/readers/prisma.py

class PRISMA:
    """
    Reader for PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral images.

    This class provides comprehensive functionality to read and manipulate PRISMA satellite 
    imagery products from the Italian Space Agency (ASI). It handles the dual-sensor
    (VNIR + SWIR) data format, supporting operations like:

    - Loading radiance or reflectance data at specific wavelengths
    - Automatic handling of VNIR/SWIR sensor selection based on wavelength
    - Converting radiance to reflectance using solar irradiance
    - Georeferencing raw data to projected coordinate systems

    PRISMA Data Model
    -----------------
    PRISMA stores data in sensor coordinates with separate lat/lon arrays for geolocation.
    Unlike EMIT's GLT approach, PRISMA requires gridding/interpolation for orthorectification:

        Sensor Grid (raw)                  Geographic Grid (output)
        ┌─────────────────────┐            ┌─────────────────────┐
        │ pushbroom scan      │            │ regular grid        │
        │ ┌───┬───┬───┬───┐  │  gridding  │ ┌───┬───┬───┬───┐  │
        │ │ a │ b │ c │ d │  │  ───────→  │ │ a'│ b'│ c'│ d'│  │
        │ ├───┼───┼───┼───┤  │            │ ├───┼───┼───┼───┤  │
        │ │ e │ f │ g │ h │  │            │ │ e'│ f'│ g'│ h'│  │
        │ └───┴───┴───┴───┘  │            │ └───┴───┴───┴───┘  │
        │ + lat/lon per pixel│            │ + affine transform  │
        └─────────────────────┘            └─────────────────────┘

    Raw methods (raw=True) return sensor coordinates; georeferenced methods
    (raw=False) apply gridding to regular geographic coordinates.

    Dual Sensor Architecture
    ------------------------
    PRISMA has separate VNIR and SWIR sensors with overlapping coverage:

        Wavelength Range:
        ├──────────────────────────────────────────────────────────────┤
        400nm              1000nm                                 2500nm
        ├───────── VNIR ──────────┤
                          ├────────────────── SWIR ───────────────────┤
                          └─ overlap ─┘
                          920-1010nm

    The class automatically selects the appropriate sensor based on requested wavelengths.

    Attributes
    ----------
    filename : str
        Path to the PRISMA HE5 file.
    lats : np.ndarray
        Latitude values (H, W) for each pixel in sensor coordinates.
    lons : np.ndarray
        Longitude values (H, W) for each pixel in sensor coordinates.
    attributes_prisma : Dict
        Dictionary of PRISMA metadata attributes from HDF5 root.
    nbands_vnir : int
        Number of valid VNIR bands (excluding flagged bands).
    vnir_range : Tuple[float, float]
        Wavelength range (min, max) of VNIR sensor in nm.
    nbands_swir : int
        Number of valid SWIR bands (excluding flagged bands).
    swir_range : Tuple[float, float]
        Wavelength range (min, max) of SWIR sensor in nm.
    time_coverage_start : datetime
        UTC datetime of acquisition start.
    time_coverage_end : datetime
        UTC datetime of acquisition end.
    units : str
        Radiance units: 'mW/m2/sr/nm'.
    sza_swir : float
        Solar zenith angle (degrees) for SWIR sensor.
    sza_vnir : float
        Solar zenith angle (degrees) for VNIR sensor.
    vza_swir : float
        View zenith angle (degrees) for SWIR sensor.
    vza_vnir : float
        View zenith angle (degrees) for VNIR sensor.

    Lazy-Loaded Attributes
    ----------------------
    ltoa_swir : np.ndarray
        SWIR radiance data (H, W, B), loaded by `load_raw(swir_flag=True)`.
    ltoa_vnir : np.ndarray
        VNIR radiance data (H, W, B), loaded by `load_raw(swir_flag=False)`.
    wavelength_swir : np.ndarray
        SWIR wavelengths (H, B) - varies slightly across track.
    wavelength_vnir : np.ndarray
        VNIR wavelengths (H, B) - varies slightly across track.
    fwhm_swir : np.ndarray
        SWIR FWHM values (H, B) - varies slightly across track.
    fwhm_vnir : np.ndarray
        VNIR FWHM values (H, B) - varies slightly across track.

    Examples
    --------
    Basic loading::

        >>> from georeader.readers.prisma import PRISMA
        >>> 
        >>> prisma = PRISMA('/path/to/PRS_L1_STD_*.he5')
        >>> print(prisma)  # View metadata summary
        >>> print(f"VNIR: {prisma.vnir_range}, SWIR: {prisma.swir_range}")

    Loading specific wavelengths::

        >>> # Load NDVI bands (Red at 665nm, NIR at 865nm)
        >>> bands = prisma.load_wavelengths([665, 865], as_reflectance=True)
        >>> print(bands.shape)  # (2, H, W) in sensor coordinates
        >>> 
        >>> # Load and georeference to UTM
        >>> bands_geo = prisma.load_wavelengths([665, 865], as_reflectance=True, 
        ...                                       raw=False, resolution_dst=30)
        >>> print(type(bands_geo))  # GeoTensor with transform and CRS

    Loading RGB composite::

        >>> # Raw sensor coordinates
        >>> rgb_raw = prisma.load_rgb(as_reflectance=True, raw=True)
        >>> 
        >>> # Georeferenced output
        >>> import matplotlib.pyplot as plt
        >>> import numpy as np
        >>> rgb_geo = prisma.load_rgb(as_reflectance=True, raw=False)
        >>> plt.imshow(np.clip(rgb_geo.values.transpose(1, 2, 0), 0, 0.3) / 0.3)

    Working with raw data::

        >>> # Load all SWIR bands
        >>> prisma.load_raw(swir_flag=True)
        >>> print(prisma.ltoa_swir.shape)  # (H, W, ~173)
        >>> print(prisma.wavelength_swir.shape)  # (H, ~173) - wavelengths vary across track
        >>> 
        >>> # Load all VNIR bands
        >>> prisma.load_raw(swir_flag=False)
        >>> print(prisma.ltoa_vnir.shape)  # (H, W, ~66)

    See Also
    --------
    georeader.readers.emit.EMITImage : EMIT hyperspectral reader
    georeader.readers.enmap.EnMAP : EnMAP hyperspectral reader
    georeader.griddata : Gridding utilities for non-orthorectified data
    georeader.reflectance : Radiometric conversion utilities

    References
    ----------
    - ASI PRISMA Mission: https://www.asi.it/en/earth-science/prisma/
    - PRISMA Data Products: https://prisma.asi.it/
    """

    def __init__(self, filename: str) -> None:
        if not os.path.exists(filename):
            raise FileNotFoundError(f"File {filename} not found")
        self.filename = filename
        self.swir_cube_dat = SWIR_FLAG["swir_cube_dat"][True]
        self.vni_cube_dat = SWIR_FLAG["swir_cube_dat"][False]

        with h5py.File(filename, mode="r") as f:
            dset = f[HE5_COORDS["swir_lat"]]
            self.lats = np.flip(dset[:, :], axis=0)
            dset = f[HE5_COORDS["swir_lon"]]
            self.lons = np.flip(dset[:, :], axis=0)
            self.attributes_prisma = dict(f.attrs)
            sza = f.attrs["Sun_zenith_angle"]

        arr = self.attributes_prisma["List_Cw_Vnir"][
            self.attributes_prisma["List_Cw_Vnir"] > 0
        ]
        self.nbands_vnir = len(arr)
        self.vnir_range = arr.min(), arr.max()
        arr = self.attributes_prisma["List_Cw_Swir"][
            self.attributes_prisma["List_Cw_Swir"] > 0
        ]
        self.swir_range = arr.min(), arr.max()
        self.nbands_swir = len(arr)

        self.ltoa_swir: Optional[NDArray] = None
        self.ltoa_vnir: Optional[NDArray] = None
        self.wavelength_swir: Optional[NDArray] = None
        self.fwhm_swir: Optional[NDArray] = None
        self.wavelength_vnir: Optional[NDArray] = None
        self.fwhm_vnir: Optional[NDArray] = None
        self.vza_swir: float = 0
        self.vza_vnir: float = 0
        self.sza_swir: float = sza
        self.sza_vnir: float = sza

        # self.time_coverage_start = self.attributes_prisma['Product_StartTime']
        self.time_coverage_start = datetime.fromisoformat(
            self.attributes_prisma["Product_StartTime"].decode("utf-8")
        ).replace(tzinfo=timezone.utc)
        self.time_coverage_end = datetime.fromisoformat(
            self.attributes_prisma["Product_StopTime"].decode("utf-8")
        ).replace(tzinfo=timezone.utc)
        self.units = "mW/m2/sr/nm"  # same as W/m^2/SR/um

        self._footprint = griddata.footprint(self.lons, self.lats)
        self._observation_date_correction_factor: Optional[float] = None

    def footprint(self, crs: Optional[str] = None) -> "Polygon":  # shapely Polygon, not a GeoTensor
        if (crs is None) or compare_crs("EPSG:4326", crs):
            return self._footprint

        return window_utils.polygon_to_crs(
            self._footprint, crs_polygon="EPSG:4326", crs_dst=crs
        )

    @property
    def observation_date_correction_factor(self) -> float:
        if self._observation_date_correction_factor is None:
            self._observation_date_correction_factor = (
                reflectance.observation_date_correction_factor(
                    date_of_acquisition=self.time_coverage_start,
                    center_coords=self.footprint("EPSG:4326").centroid.coords[0],
                )
            )
        return self._observation_date_correction_factor

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return self._footprint.bounds

    def load_raw(self, swir_flag: bool) -> NDArray:
        """
        Load all the data from all the wavelengths of the VNIR or SWIR range.
        This function caches the data, wavelengths and FWHM in the attributes of the class:
            * `ltoa_swir`, `wavelength_swir`, `fwhm_swir`, `vza_swir`, `sza_swir` if `swir_flag` is True
            * `ltoa_vnir`, `wavelength_vnir`, `fwhm_vnir`, `vza_vnir`, `sza_vnir` if `swir_flag` is False

        Args:
            swir_flag (bool): if True it will load the SWIR range, otherwise it will load the VNIR range

        Returns:
            NDArray: 3D array with the TOA radiance values (H, W, B),
                where H and W are the dimensions of the image and B is the number of bands.
        """

        if swir_flag:
            if all(
                x is not None
                for x in [
                    self.ltoa_swir,
                    self.wavelength_swir,
                    self.fwhm_swir,
                    self.vza_swir,
                    self.sza_swir,
                ]
            ):
                return self.ltoa_swir
        else:
            if all(
                x is not None
                for x in [
                    self.ltoa_vnir,
                    self.wavelength_vnir,
                    self.fwhm_vnir,
                    self.vza_vnir,
                    self.sza_vnir,
                ]
            ):
                return self.ltoa_vnir

        swir_cube_dat = SWIR_FLAG["swir_cube_dat"][swir_flag]
        swir_lab = SWIR_FLAG["swir_lab"][swir_flag]  # True: "Swir", False: "Vnir"

        with h5py.File(self.filename, "r") as f:
            dset = f[swir_cube_dat]

            ltoa_img = np.flip(np.transpose(dset[:, :, :], axes=[0, 2, 1]), axis=0)

            dset = f["/KDP_AUX/Cw_" + swir_lab + "_Matrix"]
            wvl_mat_ini = dset[:, :]

            dset = f["/KDP_AUX/Fwhm_" + swir_lab + "_Matrix"]
            fwhm_mat_ini = dset[:, :]

            wvl_cntr = f.attrs["List_Cw_" + swir_lab]
            wvl_flag = f.attrs["List_Cw_" + swir_lab + "_Flags"]

            sc_fac = f.attrs["ScaleFactor_" + swir_lab]

            of_fac = f.attrs["Offset_" + swir_lab]

            vza = 0.0
            sza = f.attrs["Sun_zenith_angle"]

            ltoa_img = ltoa_img / sc_fac - of_fac

        # Lambda
        wvl_mat_ini = np.flip(wvl_mat_ini, axis=1)
        li_no0 = np.where(wvl_mat_ini[100, :] > 0)[0]
        wvl_mat = np.copy(wvl_mat_ini[:, li_no0])
        wl_center_ini = np.mean(wvl_mat, axis=0)

        # FWHM
        fwhm_mat_ini = np.flip(fwhm_mat_ini, axis=1)
        fwhm_mat = np.copy(fwhm_mat_ini[:, li_no0])

        M, N, B_tot = ltoa_img.shape

        if swir_flag:
            if B_tot == len(wl_center_ini):
                ltoa_img = np.flip(ltoa_img, axis=2)
            else:
                # ltoa_img = np.flip(ltoa_img[:, :, :-2], axis=2)
                non0_bands = np.where(wvl_flag == 1)[0]
                ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

        else:
            if B_tot == len(wl_center_ini):
                ltoa_img = np.flip(ltoa_img, axis=2)
            else:
                # ltoa_img = np.flip(ltoa_img[:, :, 3:], axis=2)  # Review this (not sure)
                non0_bands = np.where(wvl_flag == 1)[0]
                ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

        ltoa_img = np.transpose(ltoa_img, (1, 0, 2))
        if swir_flag:
            self.ltoa_swir = ltoa_img
            self.wavelength_swir = wvl_mat
            self.fwhm_swir = fwhm_mat
            self.vza_swir = vza
            self.sza_swir = sza
        else:
            self.ltoa_vnir = ltoa_img
            self.wavelength_vnir = wvl_mat
            self.fwhm_vnir = fwhm_mat
            self.vza_vnir = vza
            self.sza_vnir = sza

        return ltoa_img

    def load_wavelengths(
        self,
        wavelengths: Union[float, List[float], NDArray],
        as_reflectance: bool = True,
        raw: bool = True,
        resolution_dst=30,
        dst_crs: Optional[Any] = None,
        fill_value_default: float = -1,
    ) -> Union[GeoTensor, NDArray]:
        """
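        Load the TOA reflectance (or radiance if `as_reflectance` is False) at the given wavelengths,
        using the band whose center wavelength is nearest to each requested one.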

        Args:
            wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load
            as_reflectance (bool, optional): return the values as reflectance rather than radiance. Defaults to True.
                If False values will have units of W/m^2/SR/um (`self.units`)
            raw (bool, optional): if True it will return the raw values,
                if False it will return the values reprojected to the specified CRS and resolution. Defaults to True.
            resolution_dst (int, optional): if raw is False, it will reproject the values to this resolution. Defaults to 30.
            dst_crs (Optional[Any], optional): if None it will use the corresponding UTM zone.
            fill_value_default (float, optional): fill value. Defaults to -1.

        Returns:
            Union[GeoTensor, NDArray]: if raw is True it will return a NDArray with the values, otherwise it will return a GeoTensor
                with the reprojected values in its `.values` attribute.
        """

        if isinstance(wavelengths, Number):
            wavelengths = np.array([wavelengths])
        else:
            wavelengths = np.array(wavelengths)

        load_swir = any(
            [
                wvl >= self.swir_range[0] and wvl < self.swir_range[1]
                for wvl in wavelengths
            ]
        )
        load_vnir = any(
            [
                wvl >= self.vnir_range[0] and wvl < self.vnir_range[1]
                for wvl in wavelengths
            ]
        )
        if load_swir:
            self.load_raw(swir_flag=True)
            wavelength_swir_mean = np.mean(self.wavelength_swir, axis=0)
            fwhm_swir_mean = np.mean(self.fwhm_swir, axis=0)
        if load_vnir:
            self.load_raw(swir_flag=False)
            wavelength_vnir_mean = np.mean(self.wavelength_vnir, axis=0)
            fwhm_vnir_mean = np.mean(self.fwhm_vnir, axis=0)

        ltoa_img = []
        fwhm = []
        for b in range(len(wavelengths)):
            if (
                wavelengths[b] >= self.swir_range[0]
                and wavelengths[b] < self.swir_range[1]
            ):
                index_band = np.argmin(np.abs(wavelengths[b] - wavelength_swir_mean))
                fwhm.append(fwhm_swir_mean[index_band])
                img = self.ltoa_swir[..., index_band]
            else:
                index_band = np.argmin(np.abs(wavelengths[b] - wavelength_vnir_mean))
                fwhm.append(fwhm_vnir_mean[index_band])
                img = self.ltoa_vnir[..., index_band]

            ltoa_img.append(img)

        # Transpose to row major
        ltoa_img = np.transpose(np.stack(ltoa_img, axis=0), (0, 2, 1))

        if as_reflectance:
            thuiller = reflectance.load_thuillier_irradiance()
            response = reflectance.srf(wavelengths, fwhm, thuiller["Nanometer"].values)

            solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(
                response
            )  # mW/m$^2$/nm
            solar_irradiance_norm /= 1_000  # W/m$^2$/nm

            ltoa_img = reflectance.radiance_to_reflectance(
                ltoa_img,
                solar_irradiance_norm,
                units=self.units,
                observation_date_corr_factor=self.observation_date_correction_factor,
            )

        if raw:
            return ltoa_img

        return griddata.read_to_crs(
            np.transpose(ltoa_img, (1, 2, 0)),
            lons=self.lons,
            lats=self.lats,
            resolution_dst=resolution_dst,
            dst_crs=dst_crs,
            fill_value_default=fill_value_default,
        )

    def load_rgb(
        self, as_reflectance: bool = True, raw: bool = True
    ) -> Union[GeoTensor, NDArray]:
        return self.load_wavelengths(
            wavelengths=WAVELENGTHS_RGB, as_reflectance=as_reflectance, raw=raw
        )

    def __repr__(self) -> str:
        return f"""
        File: {self.filename}
        Bounds: {self.bounds}
        Time: {self.time_coverage_start}
        VNIR Range: {self.vnir_range} {self.nbands_vnir} bands
        SWIR Range: {self.swir_range} {self.nbands_swir} bands
        """

`load_raw(swir_flag)` ¶

Load all the data from all the wavelengths for the VNIR or SWIR range. This function caches the radiance, wavelengths, FWHM and viewing/solar zenith angles in the following class attributes:

`ltoa_swir`, `wavelength_swir`, `fwhm_swir`, `vza_swir`, `sza_swir` if `swir_flag` is True
`ltoa_vnir`, `wavelength_vnir`, `fwhm_vnir`, `vza_vnir`, `sza_vnir` if `swir_flag` is False

Parameters:

Name	Type	Description	Default
`swir_flag`	`bool`	if True it will load the SWIR range, otherwise it will load the VNIR range	required

Returns:

Name	Type	Description
`NDArray`	`NDArray`	3D array (H, W, B) with the TOA radiance values, where the first two axes are the spatial dimensions of the image and B is the number of bands.
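The radiometric core of `load_raw` can be illustrated on synthetic arrays. This minimal sketch assumes a cube of digital numbers has already been read from the HDF5 file and reproduces the three steps `load_raw` applies: DN to radiance with `DN / ScaleFactor - Offset`, keeping only bands whose wavelength flag is 1, and reversing the band order (matching the `np.flip` applied to the wavelength matrix). The helper `decode_prisma_cube` is hypothetical and not part of georeader.

```python
import numpy as np


def decode_prisma_cube(dn: np.ndarray, scale_factor: float, offset: float,
                       band_flags: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the radiometric steps of ``load_raw``.

    dn: (rows, cols, bands) array of digital numbers.
    band_flags: (bands,) array where 1 marks a valid band.
    """
    # DN -> TOA radiance, as in ``ltoa_img / sc_fac - of_fac``
    radiance = dn.astype(np.float64) / scale_factor - offset
    # Keep only the bands flagged as valid (flag == 1)
    valid = np.where(band_flags == 1)[0]
    radiance = radiance[:, :, valid]
    # Reverse the band order along the last axis
    return np.flip(radiance, axis=2)


dn = np.arange(1, 17, dtype=float).reshape(2, 2, 4)
print(decode_prisma_cube(dn, 2.0, 0.25, np.array([1, 0, 1, 1]))[0, 0])
```

In the real reader the scale factor, offset and flags come from the file attributes (`ScaleFactor_*`, `Offset_*`, `List_Cw_*_Flags`), and the flag filtering is only applied when the band count does not match the number of valid wavelengths.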

Source code in georeader/readers/prisma.py

def load_raw(self, swir_flag: bool) -> NDArray:
    """
    Load all the data from all the wavelengths for the VNIR or SWIR range.
    This function caches the radiance, wavelengths, FWHM and viewing/solar zenith angles in the attributes of the class:
        * `ltoa_swir`, `wavelength_swir`, `fwhm_swir`, `vza_swir`, `sza_swir` if `swir_flag` is True
        * `ltoa_vnir`, `wavelength_vnir`, `fwhm_vnir`, `vza_vnir`, `sza_vnir` if `swir_flag` is False

    Args:
        swir_flag (bool): if True it will load the SWIR range, otherwise it will load the VNIR range

    Returns:
        NDArray: 3D array (H, W, B) with the TOA radiance values,
            where the first two axes are the spatial dimensions of the image and B is the number of bands.
    """

    if swir_flag:
        if all(
            x is not None
            for x in [
                self.ltoa_swir,
                self.wavelength_swir,
                self.fwhm_swir,
                self.vza_swir,
                self.sza_swir,
            ]
        ):
            return self.ltoa_swir
    else:
        if all(
            x is not None
            for x in [
                self.ltoa_vnir,
                self.wavelength_vnir,
                self.fwhm_vnir,
                self.vza_vnir,
                self.sza_vnir,
            ]
        ):
            return self.ltoa_vnir

    swir_cube_dat = SWIR_FLAG["swir_cube_dat"][swir_flag]
    swir_lab = SWIR_FLAG["swir_lab"][swir_flag]  # True: "Swir", False: "Vnir"

    with h5py.File(self.filename, "r") as f:
        dset = f[swir_cube_dat]

        ltoa_img = np.flip(np.transpose(dset[:, :, :], axes=[0, 2, 1]), axis=0)

        dset = f["/KDP_AUX/Cw_" + swir_lab + "_Matrix"]
        wvl_mat_ini = dset[:, :]

        dset = f["/KDP_AUX/Fwhm_" + swir_lab + "_Matrix"]
        fwhm_mat_ini = dset[:, :]

        wvl_cntr = f.attrs["List_Cw_" + swir_lab]
        wvl_flag = f.attrs["List_Cw_" + swir_lab + "_Flags"]

        sc_fac = f.attrs["ScaleFactor_" + swir_lab]

        of_fac = f.attrs["Offset_" + swir_lab]

        vza = 0.0
        sza = f.attrs["Sun_zenith_angle"]

        ltoa_img = ltoa_img / sc_fac - of_fac

    # Lambda
    wvl_mat_ini = np.flip(wvl_mat_ini, axis=1)
    li_no0 = np.where(wvl_mat_ini[100, :] > 0)[0]
    wvl_mat = np.copy(wvl_mat_ini[:, li_no0])
    wl_center_ini = np.mean(wvl_mat, axis=0)

    # FWHM
    fwhm_mat_ini = np.flip(fwhm_mat_ini, axis=1)
    fwhm_mat = np.copy(fwhm_mat_ini[:, li_no0])

    M, N, B_tot = ltoa_img.shape

    if swir_flag:
        if B_tot == len(wl_center_ini):
            ltoa_img = np.flip(ltoa_img, axis=2)
        else:
            # ltoa_img = np.flip(ltoa_img[:, :, :-2], axis=2)
            non0_bands = np.where(wvl_flag == 1)[0]
            ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

    else:
        if B_tot == len(wl_center_ini):
            ltoa_img = np.flip(ltoa_img, axis=2)
        else:
            # ltoa_img = np.flip(ltoa_img[:, :, 3:], axis=2)  # Review this (not sure)
            non0_bands = np.where(wvl_flag == 1)[0]
            ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

    ltoa_img = np.transpose(ltoa_img, (1, 0, 2))
    if swir_flag:
        self.ltoa_swir = ltoa_img
        self.wavelength_swir = wvl_mat
        self.fwhm_swir = fwhm_mat
        self.vza_swir = vza
        self.sza_swir = sza
    else:
        self.ltoa_vnir = ltoa_img
        self.wavelength_vnir = wvl_mat
        self.fwhm_vnir = fwhm_mat
        self.vza_vnir = vza
        self.sza_vnir = sza

    return ltoa_img

`load_wavelengths(wavelengths, as_reflectance=True, raw=True, resolution_dst=30, dst_crs=None, fill_value_default=-1)` ¶

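Load the TOA reflectance (or radiance if `as_reflectance` is False) at the given wavelengths, using the band whose center wavelength is nearest to each requested one.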

Parameters:

Name	Type	Description	Default
`wavelengths`	`Union[float, List[float], NDArray]`	List of wavelengths to load	required
`as_reflectance`	`bool`	return the values as reflectance rather than radiance. Defaults to True. If False values will have units of W/m^2/SR/um (`self.units`)	`True`
`raw`	`bool`	if True it will return the raw values, if False it will return the values reprojected to the specified CRS and resolution. Defaults to True.	`True`
`resolution_dst`	`int`	if raw is False, it will reproject the values to this resolution. Defaults to 30.	`30`
`dst_crs`	`Optional[Any]`	if None it will use the corresponding UTM zone.	`None`
`fill_value_default`	`float`	fill value. Defaults to -1.	`-1`

Returns:

Type	Description
`Union[GeoTensor, NDArray]`	If raw is True, an NDArray with the values; otherwise a GeoTensor with the reprojected values in its `.values` attribute.
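With `as_reflectance=True`, `load_wavelengths` picks, for each request, the band whose mean center wavelength is closest, then builds a band-averaged solar irradiance by weighting the Thuillier spectrum with a spectral response function derived from the band FWHM. The sketch below reproduces that idea under stated assumptions: the response is taken to be a Gaussian normalized to unit sum, which may differ from what `reflectance.srf` computes internally, and all function names are hypothetical.

```python
import numpy as np


def nearest_band(requested_nm: float, band_centers_nm: np.ndarray) -> int:
    """Index of the band whose center wavelength is closest to the request."""
    return int(np.argmin(np.abs(requested_nm - band_centers_nm)))


def gaussian_srf(center_nm: float, fwhm_nm: float, grid_nm: np.ndarray) -> np.ndarray:
    """Gaussian spectral response on grid_nm, normalized to sum to 1 (assumption)."""
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    response = np.exp(-0.5 * ((grid_nm - center_nm) / sigma) ** 2)
    return response / response.sum()


def band_irradiance(solar_grid_nm: np.ndarray, solar_irradiance: np.ndarray,
                    center_nm: float, fwhm_nm: float) -> float:
    """SRF-weighted mean of a solar irradiance spectrum for one band."""
    weights = gaussian_srf(center_nm, fwhm_nm, solar_grid_nm)
    return float(solar_irradiance.dot(weights))


centers = np.array([450.0, 550.0, 650.0])
print(nearest_band(560.0, centers))  # closest center is 550 nm
```

Normalizing the response to unit sum makes the dot product a weighted average, so a flat spectrum returns its own value; the real reader then divides by 1,000 to go from mW to W.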

Source code in georeader/readers/prisma.py

def load_wavelengths(
    self,
    wavelengths: Union[float, List[float], NDArray],
    as_reflectance: bool = True,
    raw: bool = True,
    resolution_dst=30,
    dst_crs: Optional[Any] = None,
    fill_value_default: float = -1,
) -> Union[GeoTensor, NDArray]:
    """
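    Load the TOA reflectance (or radiance if `as_reflectance` is False) at the given wavelengths,
    using the band whose center wavelength is nearest to each requested one.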

    Args:
        wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load
        as_reflectance (bool, optional): return the values as reflectance rather than radiance. Defaults to True.
            If False values will have units of W/m^2/SR/um (`self.units`)
        raw (bool, optional): if True it will return the raw values,
            if False it will return the values reprojected to the specified CRS and resolution. Defaults to True.
        resolution_dst (int, optional): if raw is False, it will reproject the values to this resolution. Defaults to 30.
        dst_crs (Optional[Any], optional): if None it will use the corresponding UTM zone.
        fill_value_default (float, optional): fill value. Defaults to -1.

    Returns:
        Union[GeoTensor, NDArray]: if raw is True it will return a NDArray with the values, otherwise it will return a GeoTensor
            with the reprojected values in its `.values` attribute.
    """

    if isinstance(wavelengths, Number):
        wavelengths = np.array([wavelengths])
    else:
        wavelengths = np.array(wavelengths)

    load_swir = any(
        [
            wvl >= self.swir_range[0] and wvl < self.swir_range[1]
            for wvl in wavelengths
        ]
    )
    load_vnir = any(
        [
            wvl >= self.vnir_range[0] and wvl < self.vnir_range[1]
            for wvl in wavelengths
        ]
    )
    if load_swir:
        self.load_raw(swir_flag=True)
        wavelength_swir_mean = np.mean(self.wavelength_swir, axis=0)
        fwhm_swir_mean = np.mean(self.fwhm_swir, axis=0)
    if load_vnir:
        self.load_raw(swir_flag=False)
        wavelength_vnir_mean = np.mean(self.wavelength_vnir, axis=0)
        fwhm_vnir_mean = np.mean(self.fwhm_vnir, axis=0)

    ltoa_img = []
    fwhm = []
    for b in range(len(wavelengths)):
        if (
            wavelengths[b] >= self.swir_range[0]
            and wavelengths[b] < self.swir_range[1]
        ):
            index_band = np.argmin(np.abs(wavelengths[b] - wavelength_swir_mean))
            fwhm.append(fwhm_swir_mean[index_band])
            img = self.ltoa_swir[..., index_band]
        else:
            index_band = np.argmin(np.abs(wavelengths[b] - wavelength_vnir_mean))
            fwhm.append(fwhm_vnir_mean[index_band])
            img = self.ltoa_vnir[..., index_band]

        ltoa_img.append(img)

    # Transpose to row major
    ltoa_img = np.transpose(np.stack(ltoa_img, axis=0), (0, 2, 1))

    if as_reflectance:
        thuiller = reflectance.load_thuillier_irradiance()
        response = reflectance.srf(wavelengths, fwhm, thuiller["Nanometer"].values)

        solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(
            response
        )  # mW/m$^2$/nm
        solar_irradiance_norm /= 1_000  # W/m$^2$/nm

        ltoa_img = reflectance.radiance_to_reflectance(
            ltoa_img,
            solar_irradiance_norm,
            units=self.units,
            observation_date_corr_factor=self.observation_date_correction_factor,
        )

    if raw:
        return ltoa_img

    return griddata.read_to_crs(
        np.transpose(ltoa_img, (1, 2, 0)),
        lons=self.lons,
        lats=self.lats,
        resolution_dst=resolution_dst,
        dst_crs=dst_crs,
        fill_value_default=fill_value_default,
    )

EMIT Reader¶

The EMIT (Earth Surface Mineral Dust Source Investigation) reader provides access to NASA's imaging spectrometer data from the International Space Station. This reader works with Level 1B calibrated radiance data (not atmospherically corrected).

Key features:

Reading L1B hyperspectral radiance data from NetCDF4 format files
Working with the 380-2500 nm spectral range with 7.4 nm sampling
Irregular grid georeferencing through GLT (Geographic Lookup Table)
Support for the observation geometry information (solar and viewing angles)
Integration with L2A mask products for cloud and shadow detection
Quality-aware analysis with cloud, cirrus, and spacecraft flag masks
Conversion from radiance (μW/cm²/sr/nm) to top-of-atmosphere reflectance
Support for downloading data from NASA DAAC portals
Automatic detection and use of appropriate UTM projection

Tutorial example:

Working with EMIT images

API Reference¶

Module to read EMIT (Earth Surface Mineral Dust Source Investigation) hyperspectral images.

EMIT is a NASA imaging spectrometer aboard the International Space Station that measures reflected solar radiation from Earth's surface in 285 spectral bands from 380 to 2500 nm. This module provides tools to read, georeference, and process EMIT L1B radiance data.

Data Format Overview¶

EMIT data is distributed in NetCDF format with a unique storage layout:

Raw Data Structure (NetCDF file):
┌─────────────────────────────────────────────┐
│  radiance: (downtrack, crosstrack, bands)   │
│  └── Shape: (~1280, ~1242, 285)             │
│                                             │
│  location/glt_x: (rows, cols)               │
│  location/glt_y: (rows, cols)               │
│  └── Geographic Lookup Table (GLT)          │
└─────────────────────────────────────────────┘

The raw data is stored in sensor coordinates (pushbroom scan lines), NOT in geographic coordinates. The GLT provides a mapping from geographic (orthorectified) coordinates back to raw sensor coordinates.

GLT Orthorectification Process¶

The GLT (Geographic Lookup Table) is key to understanding EMIT data:

Geographic Grid (Output)          Sensor Grid (Raw Data)
┌─────────────────────┐           ┌─────────────────────┐
│ (0,0)               │           │ radiance array      │
│   ┌───┬───┬───┐     │   GLT     │ ┌───────────────┐   │
│   │ a │ b │ c │     │ ──────→   │ │ (5,2) (5,3)   │   │
│   ├───┼───┼───┤     │ lookup    │ │ (6,1) (6,2)   │   │
│   │ d │ e │ f │     │           │ │ ...           │   │
│   └───┴───┴───┘     │           │ └───────────────┘   │
│               (H,W) │           │                     │
└─────────────────────┘           └─────────────────────┘

For pixel (row=1, col=2) in geographic grid:
    glt_x[1,2] = 5  →  raw_col = 5 - 1 = 4
    glt_y[1,2] = 2  →  raw_row = 2 - 1 = 1
    value = radiance[1, 4, :]  (all bands)

GLT indices are 1-based; GLT values of 0 indicate invalid/no-data pixels

This approach allows:

1. Efficient storage (no wasted pixels from orthorectification padding)
2. Preservation of original radiometric values (no resampling)
3. Flexible reprojection to any target CRS
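The GLT lookup described above fits in a short NumPy function. This is a minimal sketch, assuming 1-based GLT indices with 0 as the no-data value and a radiance array in (downtrack, crosstrack, bands) order; `apply_glt_sketch` is a hypothetical name and not georeader's internal implementation.

```python
import numpy as np


def apply_glt_sketch(radiance: np.ndarray, glt_x: np.ndarray, glt_y: np.ndarray,
                     fill_value: float = -9999.0) -> np.ndarray:
    """Orthorectify a (downtrack, crosstrack, bands) cube using a GLT.

    glt_x / glt_y: (H, W) arrays of 1-based column / row indices into ``radiance``;
    0 marks geographic pixels with no data. Returns an (H, W, bands) array.
    """
    n_bands = radiance.shape[2]
    out = np.full(glt_x.shape + (n_bands,), fill_value, dtype=np.float64)
    valid = (glt_x != 0) & (glt_y != 0)
    # Convert 1-based GLT entries to 0-based array indices
    rows = glt_y[valid] - 1
    cols = glt_x[valid] - 1
    # Copy each referenced sensor pixel (all bands) as-is: no resampling
    out[valid] = radiance[rows, cols, :]
    return out


radiance = np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2)
glt_x = np.array([[0, 2], [4, 1]])
glt_y = np.array([[0, 1], [3, 2]])
print(apply_glt_sketch(radiance, glt_x, glt_y))
```

Because every output pixel is a direct copy of a sensor pixel, the spectra keep their original values, which is the property the list above refers to.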

Radiometric Units¶

L1B Radiance: μW/(cm²·sr·nm) - microwatts per square centimeter per steradian per nanometer
FWHM: Full Width at Half Maximum of spectral response in nm
Wavelengths: Center wavelengths in nm (380-2500 nm range)
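EMIT radiance is expressed in μW/(cm²·sr·nm), while the PRISMA reader documents W/m^2/SR/um, so a conversion factor is handy when comparing sensors. Working through the prefixes: 1 μW = 10⁻⁶ W, 1 cm² = 10⁻⁴ m² and 1 nm = 10⁻³ µm, so 1 μW/(cm²·sr·nm) = 10⁻⁶ / (10⁻⁴ · 10⁻³) W/(m²·sr·µm) = 10 W/(m²·sr·µm). A small helper, hypothetical and not part of georeader:

```python
import numpy as np

# 1 uW/(cm^2 sr nm) = 1e-6 W / (1e-4 m^2 * 1e-3 um) = 10 W/(m^2 sr um)
UW_CM2_SR_NM_TO_W_M2_SR_UM = 1e-6 / (1e-4 * 1e-3)


def emit_to_w_m2_sr_um(radiance_uw_cm2_sr_nm):
    """Convert EMIT L1B radiance from uW/(cm^2 sr nm) to W/(m^2 sr um)."""
    return np.asarray(radiance_uw_cm2_sr_nm, dtype=np.float64) * UW_CM2_SR_NM_TO_W_M2_SR_UM


print(emit_to_w_m2_sr_um([1.0, 2.5]))
```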

Key Classes and Functions¶

EMITImage: Main class for reading and processing EMIT data
download_product: Download EMIT products from NASA Earthdata
get_radiance_link, get_obs_link: Generate download URLs

Requirements¶

Requires xarray: pip install xarray

Authentication for downloads requires NASA Earthdata credentials stored in: ~/.georeader/auth_emit.json with format: {"user": "...", "password": "..."}

Examples¶

Basic usage::

from georeader.readers.emit import EMITImage, download_product

# Download and open EMIT image
link = 'https://data.lpdaac.earthdatacloud.nasa.gov/...'
filepath = download_product(link)
emit = EMITImage(filepath)

# Reproject to UTM (recommended for analysis)
emit_utm = emit.to_crs("UTM")

# Load as reflectance (applies solar irradiance correction)
reflectance = emit_utm.load(as_reflectance=True)

# Load RGB composite
rgb = emit_utm.load_rgb(as_reflectance=True)

# Get valid (cloud-free) mask
valid_mask = emit.validmask()

References¶

NASA EMIT Mission: https://earth.jpl.nasa.gov/emit/
EMIT Data Resources: https://github.com/nasa/EMIT-Data-Resources
EMIT Utils: https://github.com/emit-sds/emit-utils/
LP DAAC Data Access: https://lpdaac.usgs.gov/products/emitl1bradv001/

`EMITImage` ¶

Reader for EMIT L1B (Earth Surface Mineral Dust Source Investigation) hyperspectral images.

This class provides comprehensive functionality to read and manipulate EMIT satellite imagery products from NASA's imaging spectrometer aboard the ISS. It handles the unique GLT-based (Geographic Lookup Table) storage format, supporting operations like:

Loading radiometry data with automatic orthorectification
Converting radiance to reflectance using solar irradiance
Accessing cloud and quality masks
Extracting viewing and solar geometry angles
Reprojecting to different coordinate reference systems

EMIT Data Model¶

EMIT stores data in sensor coordinates, not geographic coordinates. The GLT provides a lookup table mapping geographic pixels to sensor pixels:

GLT Orthorectification:
┌────────────────────────────┐      ┌──────────────────────────┐
│    Geographic Grid         │      │   Sensor Grid (raw)      │
│  (orthorectified space)    │      │  (pushbroom scan)        │
│  ┌───┬───┬───┬───┐         │      │  ┌───┬───┬───┬───┐       │
│  │ · │ a │ b │ · │         │  GLT │  │ e │ a │ b │ · │       │
│  ├───┼───┼───┼───┤         │  ──→ │  ├───┼───┼───┼───┤       │
│  │ c │ d │ e │ f │         │      │  │ f │ c │ d │ · │       │
│  └───┴───┴───┴───┘         │      │  └───┴───┴───┴───┘       │
│  (pixels with data)        │      │  (original acquisition)  │
└────────────────────────────┘      └──────────────────────────┘

· = no data (GLT value = 0)

For geographic pixel (row, col), with 1-based GLT indices:
    raw_x = glt_x[row, col] - 1
    raw_y = glt_y[row, col] - 1
    value = radiance[raw_y, raw_x, :]

This approach preserves original radiometric values without interpolation artifacts.

Spectral Characteristics¶

Wavelength range: 380-2500 nm (VNIR + SWIR)
Number of bands: 285
Spectral sampling: ~7.4 nm
Spatial resolution: 60m at nadir
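To relate band indices to wavelengths, the sketch below approximates EMIT band centers with a uniform grid starting at 380 nm with 7.4 nm spacing. That grid is an assumption for illustration only: real centers vary slightly and should be taken from the file's wavelength metadata (exposed as `wavelengths`). Under this approximation, the bands nearest to 640, 550 and 460 nm are 35, 23 and 11, the indices used in the band-selection example below. The helper name is hypothetical.

```python
import numpy as np

# Assumption: 285 uniform band centers, 380 nm start, 7.4 nm spacing.
NOMINAL_CENTERS_NM = 380.0 + 7.4 * np.arange(285)


def nearest_band_indices(targets_nm, centers_nm=NOMINAL_CENTERS_NM):
    """For each target wavelength, return the index of the closest band center."""
    targets = np.asarray(targets_nm, dtype=np.float64)[:, None]
    return np.argmin(np.abs(targets - centers_nm[None, :]), axis=1)


print(nearest_band_indices([640.0, 550.0, 460.0]))
```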

Attributes¶

filename : str
    Path to the EMIT NetCDF file.
nc_ds : xr.Dataset
    xarray Dataset handle for the main radiance file.
glt : GeoTensor
    Geographic Lookup Table as a GeoTensor with shape (2, H, W).
    - glt.values[0]: x-indices into raw radiance (1-based)
    - glt.values[1]: y-indices into raw radiance (1-based)
valid_glt : np.ndarray
    Boolean mask (H, W) indicating valid GLT entries (data coverage).
glt_relative : GeoTensor
    GLT with indices relative to the data window (0-based).
window_raw : rasterio.windows.Window
    Window defining the subset of raw data to read (optimizes I/O).
real_transform : rasterio.Affine
    Affine transform for the orthorectified (geographic) grid.
time_coverage_start : datetime
    UTC datetime of acquisition start.
time_coverage_end : datetime
    UTC datetime of acquisition end.
wavelengths : np.ndarray
    Center wavelengths (nm) for selected bands.
fwhm : np.ndarray
    Full Width at Half Maximum (nm) for selected bands.
band_selection : Union[int, Tuple[int, ...], slice]
    Current band subset selection.
units : str
    Radiance units from file metadata (typically 'uW/(cm^2 sr nm)').
fill_value_default : float
    No-data value for radiance data.
dims : Tuple[str]
    Dimension names ("band", "y", "x").
dtype : np.dtype
    Data type of radiance values.
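The `window_raw` and `glt_relative` attributes follow from the GLT: only the bounding box of sensor pixels referenced by valid GLT entries needs to be read, and the GLT can then be re-expressed relative to that box. The sketch below shows one plausible way to derive both, assuming 1-based GLT values with 0 as no-data. It returns plain offsets instead of a `rasterio.windows.Window`, marks invalid entries with -1 (an assumption of this sketch), and the function name is hypothetical.

```python
import numpy as np


def raw_window_and_relative_glt(glt_x: np.ndarray, glt_y: np.ndarray):
    """Bounding window of referenced sensor pixels and a 0-based GLT relative to it.

    glt_x / glt_y: (H, W) 1-based indices into the raw cube, 0 = no data.
    Returns (row_off, col_off, height, width) and an int array of shape (2, H, W)
    with x/y indices relative to the window (-1 where the GLT is invalid).
    """
    valid = (glt_x != 0) & (glt_y != 0)
    if not valid.any():
        raise ValueError("GLT has no valid entries")
    cols = glt_x[valid] - 1  # 0-based sensor columns
    rows = glt_y[valid] - 1  # 0-based sensor rows
    row_off, col_off = int(rows.min()), int(cols.min())
    height = int(rows.max()) - row_off + 1
    width = int(cols.max()) - col_off + 1
    relative = np.full((2,) + glt_x.shape, -1, dtype=np.int64)
    relative[0][valid] = cols - col_off
    relative[1][valid] = rows - row_off
    return (row_off, col_off, height, width), relative


glt_x = np.array([[0, 3], [5, 4]])
glt_y = np.array([[0, 2], [4, 3]])
window, relative = raw_window_and_relative_glt(glt_x, glt_y)
print(window)
```

Reading only the window instead of the full cube is where the I/O saving mentioned for `window_raw` comes from.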

Lazy-Loaded Properties¶

nc_ds_obs : xr.Dataset
    Observation data (viewing/solar angles, path length, elevation). Auto-downloaded from NASA Earthdata if not present locally.
nc_ds_l2amask : xr.Dataset
    L2A quality mask data (clouds, cirrus, water, aggregate flags). Auto-downloaded from NASA Earthdata if not present locally.
mean_sza : float
    Mean solar zenith angle (degrees) across the scene.
mean_vza : float
    Mean view zenith angle (degrees) across the scene.
observation_date_correction_factor : float
    Earth-Sun distance correction factor for the acquisition date.
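Quantities such as the solar zenith angle and the Earth-Sun distance correction enter the standard top-of-atmosphere reflectance formula ρ = π · L · d² / (E_sun · cos θs), with d the Earth-Sun distance in astronomical units. The sketch below uses a common first-order approximation of d from the day of year. It is an illustration under that assumption: georeader's `reflectance` module may compute the correction differently (including whether it stores d² or 1/d²), and the function names are hypothetical. Radiance and irradiance must share units, for example W/(m²·sr·nm) and W/(m²·nm).

```python
import numpy as np


def earth_sun_distance_au(day_of_year: int) -> float:
    """First-order approximation of the Earth-Sun distance in AU (perihelion near day 4)."""
    return float(1.0 - 0.01672 * np.cos(np.deg2rad(0.9856 * (day_of_year - 4))))


def toa_reflectance(radiance, solar_irradiance, sza_deg, day_of_year):
    """rho = pi * L * d^2 / (E_sun * cos(sza)).

    radiance: W/(m^2 sr nm); solar_irradiance: W/(m^2 nm) at 1 AU; sza_deg in degrees.
    """
    d = earth_sun_distance_au(day_of_year)
    cos_sza = np.cos(np.deg2rad(sza_deg))
    return np.pi * np.asarray(radiance) * d ** 2 / (np.asarray(solar_irradiance) * cos_sza)


print(earth_sun_distance_au(4), earth_sun_distance_au(186))
```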

Examples¶

Basic loading and reprojection::

>>> from georeader.readers.emit import EMITImage, download_product
>>> 
>>> # Download from NASA Earthdata
>>> link = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/...'
>>> filepath = download_product(link)
>>> 
>>> # Open and reproject to UTM
>>> emit = EMITImage(filepath)
>>> emit_utm = emit.to_crs("UTM", resolution_dst_crs=60)
>>> 
>>> # Load as reflectance
>>> refl = emit_utm.load(as_reflectance=True)
>>> print(refl.shape)  # (285, H, W)

Working with specific wavelengths::

>>> # Select RGB-like bands (640, 550, 460 nm)
>>> emit.set_band_selection([35, 23, 11])
>>> print(emit.wavelengths)  # [641.2, 553.1, 462.3]
>>> rgb = emit.load(as_reflectance=True)
>>> 
>>> # Or use the convenience method
>>> rgb = emit.load_rgb(as_reflectance=True)

Accessing masks and quality data::

>>> # Get valid (cloud-free) mask
>>> valid_mask = emit.validmask()
>>> print(f"Clear pixels: {emit.percentage_clear:.1f}%")
>>> 
>>> # Get specific mask layers
>>> cloud_mask = emit.mask("Cloud flag")
>>> water_mask = emit.water_mask()

Working with viewing geometry::

>>> # Get solar zenith angle
>>> sza = emit.sza()  # GeoTensor with SZA values
>>> 
>>> # Get mean angles for quick reference
>>> print(f"Mean SZA: {emit.mean_sza:.1f}°")
>>> print(f"Mean VZA: {emit.mean_vza:.1f}°")

Spatial subsetting::

>>> import rasterio.windows
>>> 
>>> # Read a spatial window
>>> window = rasterio.windows.Window(col_off=100, row_off=200, width=500, height=500)
>>> emit_subset = emit.read_from_window(window)
>>> data = emit_subset.load()

References¶

EMIT L1B Product Guide: https://lpdaac.usgs.gov/products/emitl1bradv001/
EMIT Data Resources: https://github.com/nasa/EMIT-Data-Resources
EMIT Algorithms: Green et al. (2020) doi:10.1029/2020JD033451

Source code in georeader/readers/emit.py

class EMITImage:
    """
    Reader for EMIT L1B (Earth Surface Mineral Dust Source Investigation) hyperspectral images.

    This class provides comprehensive functionality to read and manipulate EMIT satellite 
    imagery products from NASA's imaging spectrometer aboard the ISS. It handles the 
    unique GLT-based (Geographic Lookup Table) storage format, supporting operations like:

    - Loading radiometry data with automatic orthorectification
    - Converting radiance to reflectance using solar irradiance
    - Accessing cloud and quality masks
    - Extracting viewing and solar geometry angles
    - Reprojecting to different coordinate reference systems

    EMIT Data Model
    ---------------
    EMIT stores data in sensor coordinates, not geographic coordinates. The GLT provides
    a lookup table mapping geographic pixels to sensor pixels:

        GLT Orthorectification:
        ┌────────────────────────────┐      ┌──────────────────────────┐
        │    Geographic Grid         │      │   Sensor Grid (raw)      │
        │  (orthorectified space)    │      │  (pushbroom scan)        │
        │  ┌───┬───┬───┬───┐         │      │  ┌───┬───┬───┬───┐       │
        │  │ · │ a │ b │ · │         │  GLT │  │ e │ a │ b │ · │       │
        │  ├───┼───┼───┼───┤         │  ──→ │  ├───┼───┼───┼───┤       │
        │  │ c │ d │ e │ f │         │      │  │ f │ c │ d │ · │       │
        │  └───┴───┴───┴───┘         │      │  └───┴───┴───┴───┘       │
        │  (pixels with data)        │      │  (original acquisition)  │
        └────────────────────────────┘      └──────────────────────────┘

        · = no data (GLT value = 0)

        For geographic pixel (row, col), with 1-based GLT indices:
            raw_x = glt_x[row, col] - 1
            raw_y = glt_y[row, col] - 1
            value = radiance[raw_y, raw_x, :]

    This approach preserves original radiometric values without interpolation artifacts.

    Spectral Characteristics
    ------------------------
    - Wavelength range: 380-2500 nm (VNIR + SWIR)
    - Number of bands: 285
    - Spectral sampling: ~7.4 nm
    - Spatial resolution: 60m at nadir

    Attributes
    ----------
    filename : str
        Path to the EMIT NetCDF file.
    nc_ds : xr.Dataset
        xarray Dataset handle for the main radiance file.
    glt : GeoTensor
        Geographic Lookup Table as a GeoTensor with shape (2, H, W).
        - glt.values[0]: x-indices into raw radiance (1-based)
        - glt.values[1]: y-indices into raw radiance (1-based)
    valid_glt : np.ndarray
        Boolean mask (H, W) indicating valid GLT entries (data coverage).
    glt_relative : GeoTensor
        GLT with indices relative to the data window (0-based).
    window_raw : rasterio.windows.Window
        Window defining the subset of raw data to read (optimizes I/O).
    real_transform : rasterio.Affine
        Affine transform for the orthorectified (geographic) grid.
    time_coverage_start : datetime
        UTC datetime of acquisition start.
    time_coverage_end : datetime
        UTC datetime of acquisition end.
    wavelengths : np.ndarray
        Center wavelengths (nm) for selected bands.
    fwhm : np.ndarray
        Full Width at Half Maximum (nm) for selected bands.
    band_selection : Union[int, Tuple[int, ...], slice]
        Current band subset selection.
    units : str
        Radiance units from file metadata (typically 'uW/(cm^2 sr nm)').
    fill_value_default : float
        No-data value for radiance data.
    dims : Tuple[str]
        Dimension names ("band", "y", "x").
    dtype : np.dtype
        Data type of radiance values.

    Lazy-Loaded Properties
    ----------------------
    nc_ds_obs : xr.Dataset
        Observation data (viewing/solar angles, path length, elevation).
        Auto-downloaded from NASA Earthdata if not present locally.
    nc_ds_l2amask : xr.Dataset  
        L2A quality mask data (clouds, cirrus, water, aggregate flags).
        Auto-downloaded from NASA Earthdata if not present locally.
    mean_sza : float
        Mean solar zenith angle (degrees) across the scene.
    mean_vza : float
        Mean view zenith angle (degrees) across the scene.
    observation_date_correction_factor : float
        Earth-Sun distance correction factor for the acquisition date.

    Examples
    --------
    Basic loading and reprojection::

        >>> from georeader.readers.emit import EMITImage, download_product
        >>> 
        >>> # Download from NASA Earthdata
        >>> link = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/...'
        >>> filepath = download_product(link)
        >>> 
        >>> # Open and reproject to UTM
        >>> emit = EMITImage(filepath)
        >>> emit_utm = emit.to_crs("UTM", resolution_dst_crs=60)
        >>> 
        >>> # Load as reflectance
        >>> refl = emit_utm.load(as_reflectance=True)
        >>> print(refl.shape)  # (285, H, W)

    Working with specific wavelengths::

        >>> # Select RGB-like bands (640, 550, 460 nm)
        >>> emit.set_band_selection([35, 23, 11])
        >>> print(emit.wavelengths)  # [641.2, 553.1, 462.3]
        >>> rgb = emit.load(as_reflectance=True)
        >>> 
        >>> # Or use the convenience method
        >>> rgb = emit.load_rgb(as_reflectance=True)

    Accessing masks and quality data::

        >>> # Get valid (cloud-free) mask
        >>> valid_mask = emit.validmask()
        >>> print(f"Clear pixels: {emit.percentage_clear:.1f}%")
        >>> 
        >>> # Get specific mask layers
        >>> cloud_mask = emit.mask("Cloud flag")
        >>> water_mask = emit.water_mask()

    Working with viewing geometry::

        >>> # Get solar zenith angle
        >>> sza = emit.sza()  # GeoTensor with SZA values
        >>> 
        >>> # Get mean angles for quick reference
        >>> print(f"Mean SZA: {emit.mean_sza:.1f}°")
        >>> print(f"Mean VZA: {emit.mean_vza:.1f}°")

    Spatial subsetting::

        >>> import rasterio.windows
        >>> 
        >>> # Read a spatial window
        >>> window = rasterio.windows.Window(col_off=100, row_off=200, width=500, height=500)
        >>> emit_subset = emit.read_from_window(window)
        >>> data = emit_subset.load()

    See Also
    --------
    georeader.readers.prisma.PRISMA : PRISMA hyperspectral reader
    georeader.readers.enmap.EnMAP : EnMAP hyperspectral reader
    georeader.reflectance : Radiometric conversion utilities

    References
    ----------
    - EMIT L1B Product Guide: https://lpdaac.usgs.gov/products/emitl1bradv001/
    - EMIT Data Resources: https://github.com/nasa/EMIT-Data-Resources
    - EMIT Algorithms: Green et al. (2020) doi:10.1029/2020JD033451
    """
    attributes_set_if_exists = ["_nc_ds_obs", "_mean_sza", "_mean_vza",
                                "_observation_bands", "_nc_ds_l2amask", "_mask_bands",
                                "obs_file", "l2amaskfile",
                                # Opt-in radiance cache. ``_cache`` is a mutable
                                # dict shared by reference across all clones built
                                # from the same parent, which is what makes the
                                # cache visible end-to-end. ``cache_radiance`` is
                                # the opt-in flag (rebinding it on clone is fine;
                                # it is not toggled per clone).
                                "_cache", "cache_radiance"]

    # Key under which the full-spectrum windowed radiance is stored in ``_cache``.
    _CACHE_KEY_RADIANCE = "radiance_window"

    def __init__(self, filename:str, glt:Optional[GeoTensor]=None,
                 band_selection:Optional[Union[int, Tuple[int, ...],slice]]=slice(None),
                 cache_radiance:bool=False,
                 reuse_handles_from:Optional['EMITImage']=None):
        if not HAS_XARRAY:
            raise ImportError("xarray is required to read EMIT images. Please install it with: pip install xarray")

        self.filename = filename
        if reuse_handles_from is not None:
            if reuse_handles_from.filename != self.filename:
                raise ValueError("reuse_handles_from must reference the same EMIT file")
            # Clone constructor path: reuse parent handles to avoid opening
            # throwaway datasets that would immediately be overwritten.
            self.nc_ds = reuse_handles_from.nc_ds
        else:
            self.nc_ds = safe_open_netcdf(self.filename, cache=False, load=False)
        self._nc_ds_obs = None
        self._nc_ds_l2amask = None
        self._observation_bands = None
        self._mask_bands = None
        self._sensor_band_params = None
        # Opt-in radiance cache. Default off — the dict is created either way so the
        # ``_cache is parent._cache`` invariant holds for clones even when caching
        # is disabled.
        self.cache_radiance:bool = cache_radiance
        self._cache:Dict[str, Any] = {}
        # self.real_shape = (self.nc_ds['radiance'].shape[-1],) + self.nc_ds['radiance'].shape[:-1]

        self._mean_sza = None
        self._mean_vza = None
        self.obs_file:Optional[str] = None
        self.l2amaskfile:Optional[str] = None

        geotransform = self.nc_ds.attrs['geotransform']
        self.real_transform = rasterio.Affine(geotransform[1], geotransform[2], geotransform[0],
                                              geotransform[4], geotransform[5], geotransform[3])

        self.time_coverage_start = datetime.strptime(self.nc_ds.attrs['time_coverage_start'], "%Y-%m-%dT%H:%M:%S%z")
        self.time_coverage_end = datetime.strptime(self.nc_ds.attrs['time_coverage_end'], "%Y-%m-%dT%H:%M:%S%z")

        self.dtype = self.nc_ds['radiance'].dtype
        self.dims = ("band", "y", "x")
        self.fill_value_default = self.nc_ds['radiance'].attrs.get('_FillValue', -9999)
        self.nodata = self.fill_value_default
        self.units = self.nc_ds["radiance"].attrs.get('units', '')

        if glt is None:
            # Open the location group to access glt_x and glt_y
            location_ds = safe_open_netcdf(self.filename, cache=False, load=False, group='location')
            glt_x = np.nan_to_num(location_ds['glt_x'].values, nan=0).astype(np.int32)
            glt_y = np.nan_to_num(location_ds['glt_y'].values, nan=0).astype(np.int32)
            location_ds.close()

            glt_arr = np.zeros((2,) + glt_x.shape, dtype=np.int32)
            glt_arr[0] = glt_x
            glt_arr[1] = glt_y
            # glt_arr -= 1 # account for 1-based indexing

            # https://rasterio.readthedocs.io/en/stable/api/rasterio.crs.html
            self.glt = GeoTensor(glt_arr, transform=self.real_transform, 
                                 crs=rasterio.crs.CRS.from_wkt(self.nc_ds.attrs['spatial_ref']),
                                 fill_value_default=0)
        else:
            self.glt = glt

        self.valid_glt = np.all(self.glt.values != self.glt.fill_value_default, axis=0)
        xmin, ymin, xmax, ymax = self._bounds_indexes_raw() # values are 1-based!

        # glt has the absolute indexes of the netCDF object
        # glt_relative has the relative indexes
        self.glt_relative = self.glt.copy()
        self.glt_relative.values[0, self.valid_glt] -= xmin
        self.glt_relative.values[1, self.valid_glt] -= ymin

        self.window_raw = rasterio.windows.Window(col_off=xmin-1, row_off=ymin-1, 
                                                  width=xmax-xmin+1, height=ymax-ymin+1)

        # Load sensor_band_parameters from its group, unless we're cloning from
        # an existing instance and can reuse the already-open handle.
        if reuse_handles_from is not None:
            self._sensor_band_params = reuse_handles_from._sensor_band_params
            self.bandname_dimension = reuse_handles_from.bandname_dimension
        else:
            self._sensor_band_params = safe_open_netcdf(self.filename, cache=False, load=False, group='sensor_band_parameters')
            if "wavelengths" in self._sensor_band_params:
                self.bandname_dimension = "wavelengths"
            elif "radiance_wl" in self._sensor_band_params:
                self.bandname_dimension = "radiance_wl"
            else:
                raise ValueError(f"Neither 'wavelengths' nor 'radiance_wl' found in the sensor_band_parameters group of {self.filename}")

        self.band_selection = band_selection
        self.wavelengths = self._sensor_band_params[self.bandname_dimension].values[self.band_selection]
        self.fwhm = self._sensor_band_params['fwhm'].values[self.band_selection]
        self._observation_date_correction_factor:Optional[float] = None

    @property
    def observation_date_correction_factor(self) -> float:
        if self._observation_date_correction_factor is None:
            self._observation_date_correction_factor = reflectance.observation_date_correction_factor(date_of_acquisition=self.time_coverage_start,
                                                                                                      center_coords=self.footprint("EPSG:4326").centroid.coords[0])
        return self._observation_date_correction_factor

    @property
    def crs(self) -> Any:
        return self.glt.crs

    @property
    def shape(self) -> Tuple:
        try:
            n_bands = len(self.wavelengths)
            return (n_bands,) + self.glt.shape[1:]
        except Exception:
            return self.glt.shape

    @property
    def width(self) -> int:
        return self.shape[-1]

    @property
    def height(self) -> int:
        return self.shape[-2]

    @property
    def transform(self) -> rasterio.Affine:
        return self.glt.transform

    @property
    def res(self) -> Tuple[float, float]:
        return self.glt.res

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return self.glt.bounds

    def footprint(self, crs:Optional[str]=None) -> Polygon:
        """
        Get the footprint of the image in the given CRS. If no CRS is given, the footprint is returned in the native CRS.
        This function takes into account the valid_glt mask to compute the footprint.

        Args:
            crs (Optional[str], optional): The CRS to return the footprint in. Defaults to None. 
                If None, the footprint is returned in the native CRS.

        Returns:
            Polygon: The footprint of the image in the given CRS.
        """
        if not hasattr(self, '_pol'):
            from georeader.vectorize import get_polygons
            pols = get_polygons(self.valid_glt, transform=self.transform)
            self._pol = unary_union(pols)
        if crs is not None:
            pol_crs = window_utils.polygon_to_crs(self._pol, self.crs, crs)
        else:
            pol_crs = self._pol

        pol_glt = self.glt.footprint(crs=crs)

        return pol_crs.intersection(pol_glt)

    def set_band_selection(self, band_selection:Optional[Union[int, Tuple[int, ...],slice]]=None):
        """
        Set the band selection. Band selection is absolute w.r.t self.nc_ds['radiance']

        Args:
            band_selection (Optional[Union[int, Tuple[int, ...],slice]], optional): slicing or selection of the bands. Defaults to None.

        Example:
            >>> emit_image.set_band_selection(slice(0, 3)) # only the first three bands will be loaded
            >>> emit_image.wavelengths # returns only the wavelengths of the first three bands
            >>> emit_image.load() # loads only the first three bands
        """
        if band_selection is None:
            band_selection = slice(None)
        self.band_selection = band_selection
        self.wavelengths = self._sensor_band_params[self.bandname_dimension].values[self.band_selection]
        self.fwhm = self._sensor_band_params['fwhm'].values[self.band_selection]

    @property
    def nc_ds_obs(self, obs_file:Optional[str]=None):
        """
        Loads the observation file. This file contains information about the solar and
        viewing angles, elevation, and illumination based on elevation and path length.

        If the observation file is not found next to ``self.filename``, it is downloaded
        from the JPL portal.

        The opened dataset is cached in the object (``self._nc_ds_obs``).

        Args:
            obs_file (Optional[str], optional): Path to the observation file.
                Defaults to None. If None, the observation file is downloaded
                from the EMIT server. Because this is a property, the argument
                cannot be passed at access time and always takes its default.
        """
        if self._nc_ds_obs is not None:
            return self._nc_ds_obs

        if obs_file is None:
            link_obs_file = get_obs_link(self.filename)
            obs_file = os.path.join(os.path.dirname(self.filename), os.path.basename(link_obs_file))
            if not os.path.exists(obs_file):
                download_product(link_obs_file, obs_file)

        self.obs_file = obs_file
        self._nc_ds_obs = safe_open_netcdf(obs_file, cache=False, load=False)
        # Load observation_bands from sensor_band_parameters group
        sensor_params = safe_open_netcdf(obs_file, cache=False, load=False, group='sensor_band_parameters')
        self._observation_bands = sensor_params['observation_bands'].values
        sensor_params.close()
        return self._nc_ds_obs

    @property
    def nc_ds_l2amask(self, l2amaskfile:Optional[str]=None) -> 'xr.Dataset':
        """
        Loads the L2A mask file, which contains the cloud mask and other per-pixel mask layers.

        If the L2A mask file is not found next to ``self.filename``, it is downloaded
        from the JPL portal.

        The opened dataset is cached in the object (``self._nc_ds_l2amask``).

        See https://lpdaac.usgs.gov/products/emitl2arflv001/ for info about the L2A mask file.

        Args:
            l2amaskfile (Optional[str], optional): Path to the L2A mask file.
                Defaults to None. If None, the L2A mask file is downloaded
                from the EMIT server. Because this is a property, the argument
                cannot be passed at access time and always takes its default.
        """
        if self._nc_ds_l2amask is not None:
            return self._nc_ds_l2amask

        if l2amaskfile is None:
            link_l2amaskfile = get_l2amask_link(self.filename)
            l2amaskfile = os.path.join(os.path.dirname(self.filename), os.path.basename(link_l2amaskfile))
            if not os.path.exists(l2amaskfile):
                download_product(link_l2amaskfile, l2amaskfile)

        self.l2amaskfile = l2amaskfile
        self._nc_ds_l2amask = safe_open_netcdf(l2amaskfile, cache=False, load=False)
        # Load mask_bands from sensor_band_parameters group
        sensor_params = safe_open_netcdf(l2amaskfile, cache=False, load=False, 
                                         group='sensor_band_parameters')
        self._mask_bands = sensor_params["mask_bands"].values
        sensor_params.close()
        return self._nc_ds_l2amask

    @property
    def mask_bands(self) -> np.ndarray:
        """ Returns the mask bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag',
        'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag'] """
        self.nc_ds_l2amask  # accessing the property loads the file and sets self._mask_bands
        return self._mask_bands

    def validmask(self, with_buffer:bool=True) -> GeoTensor:
        """
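        Return the georeferenced valid-pixel mask (the negation of ``invalid_mask_raw``).

        Args:
            with_buffer (bool, optional): If True, pixels flagged by the Dilated Cloud Flag
                are also treated as invalid. Defaults to True.

        Returns:
            GeoTensor: bool mask. True means that the pixel is valid.
        """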

        validmask = ~self.invalid_mask_raw(with_buffer=with_buffer)

        return self.georreference(validmask,
                                  fill_value_default=False)

    def invalid_mask_raw(self, with_buffer:bool=True) -> NDArray:
        """
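        Returns the non-georeferenced quality mask. True means that the pixel is not valid.

        The Cloud flag, Cirrus flag and Spacecraft Flag are summed and a pixel is invalid
        if the sum is at least 1; if ``with_buffer`` is True, the Dilated Cloud Flag is
        included in the sum as well.

        From: https://github.com/nasa/EMIT-Data-Resources/blob/main/python/how-tos/How_to_use_EMIT_Quality_data.ipynb
        and https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py#L277

        Args:
            with_buffer (bool, optional): Include the Dilated Cloud Flag. Defaults to True.

        Returns:
            NDArray: (H, W) boolean mask in sensor geometry, over ``self.window_raw``.
        """
        # Indices into mask_bands: 0 Cloud, 1 Cirrus, 3 Spacecraft (2 is Water, not used)
        band_index = [0, 1, 3]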
        if with_buffer:
            band_index.append(4)

        slice_y, slice_x = self.window_raw.toslices()
        mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
        mask_arr = np.sum(mask_arr, axis=-1)
        mask_arr = (mask_arr >= 1)
        return mask_arr

    @property
    def percentage_clear(self) -> float:
        """
        Return the percentage of clear pixels in the image

        Returns:
            float: percentage of clear pixels
        """

        invalids = self.invalid_mask_raw(with_buffer=False)
        return 100 * (1 - np.sum(invalids) / np.prod(invalids.shape))


    def mask(self, mask_name:str="Cloud flag") -> GeoTensor:
        """
        Return the mask layer with the given name.

        ``mask_name`` must be one of ``self.mask_bands``: ['Cloud flag', 'Cirrus flag',
        'Water flag', 'Spacecraft Flag', 'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)',
        'Aggregate Flag'].

        Args:
            mask_name (str, optional): Name of the mask. Defaults to "Cloud flag".

        Returns:
            GeoTensor: mask

        Raises:
            ValueError: if ``mask_name`` is not in ``self.mask_bands``.
        """
        band_index = self.mask_bands.tolist().index(mask_name)
        slice_y, slice_x = self.window_raw.toslices()
        mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
        return self.georreference(mask_arr,
                                  fill_value_default=self.nc_ds_l2amask['mask'].attrs.get('_FillValue', -9999))

    def water_mask(self) -> GeoTensor:
        """ Returns the water mask """
        return self.mask("Water flag")

    @property
    def observation_bands(self) -> np.ndarray:
        """ Returns the observation bands """
        self.nc_ds_obs  # accessing the property loads the file and sets self._observation_bands
        return self._observation_bands

    def observation(self, name:str) -> GeoTensor:
        """ Returns the observation with the given name """
        band_index = self.observation_bands.tolist().index(name)
        slice_y, slice_x = self.window_raw.toslices()
        # The obs file stores obs data in root group, not in a subgroup
        obs_arr = self.nc_ds_obs['obs'].values[slice_y, slice_x, band_index]
        return self.georreference(obs_arr, 
                                  fill_value_default=self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999))

    def sza(self) -> GeoTensor:
        """ Return the solar zenith angle as a GeoTensor """
        return self.observation('To-sun zenith (0 to 90 degrees from zenith)')

    def vza(self) -> GeoTensor:
        """ Return the view zenith angle as a GeoTensor """
        return self.observation('To-sensor zenith (0 to 90 degrees from zenith)')

    def elevation(self) -> GeoTensor:
        """ Return the elevation layer (``elev``) of the ``location`` group as a GeoTensor """
        location_ds = safe_open_netcdf(self.filename, cache=False, load=False, group='location')
        obs_arr = location_ds["elev"]
        slice_y, slice_x = self.window_raw.toslices()
        elev_data = obs_arr.values[slice_y, slice_x]
        fill_val = obs_arr.attrs.get('_FillValue', -9999)
        location_ds.close()
        return self.georreference(elev_data, fill_value_default=fill_val)

    @property
    def mean_sza(self) -> float:
        """ Return the mean solar zenith angle """
        if self._mean_sza is not None:
            return self._mean_sza

        band_index = self.observation_bands.tolist().index('To-sun zenith (0 to 90 degrees from zenith)')
        sza_arr = self.nc_ds_obs['obs'].values[..., band_index]
        fill_val = self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999)
        self._mean_sza = float(np.mean(sza_arr[sza_arr != fill_val]))
        return self._mean_sza

    @property
    def mean_vza(self) -> float:
        """ Return the mean view zenith angle """
        if self._mean_vza is not None:
            return self._mean_vza
        band_index = self.observation_bands.tolist().index('To-sensor zenith (0 to 90 degrees from zenith)')
        vza_arr = self.nc_ds_obs['obs'].values[..., band_index]
        fill_val = self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999)
        self._mean_vza = float(np.mean(vza_arr[vza_arr != fill_val]))
        return self._mean_vza

    def __copy__(self) -> '__class__':
        out = EMITImage(
            self.filename,
            glt=self.glt.copy(),
            band_selection=self.band_selection,
            reuse_handles_from=self,
        )

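        # Propagate eagerly-set and lazily-loaded attributes (NetCDF handles,
        # observation/mask bands, mean angles, radiance cache, ...) from the parent.
        for attrname in self.attributes_set_if_exists:
            if hasattr(self, attrname):
                setattr(out, attrname, getattr(self, attrname))

        return out

    def copy(self) -> '__class__':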
        return self.__copy__()

    def to_crs(self, crs:Any="UTM", 
               resolution_dst_crs:Optional[Union[float, Tuple[float, float]]]=60) -> '__class__':
        """
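        Reproject the image to a new CRS.

        Only the GLT is reprojected (with nearest-neighbour resampling); radiance data
        is read and georeferenced on demand when ``load`` is called.

        Args:
            crs (Any): Destination CRS. If "UTM", a UTM CRS is selected from the image
                footprint with ``get_utm_epsg``. Defaults to "UTM".
            resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
                Output resolution in units of the destination CRS. Defaults to 60.

        Returns:
            EMITImage: EMIT image in the new CRS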

        Example:
            >>> emit_image = EMITImage("path/to/emit_image.nc")
            >>> emit_image_utm = emit_image.to_crs(crs="UTM")
        """
        if crs == "UTM":
            footprint = self.glt.footprint("EPSG:4326")
            crs = get_utm_epsg(footprint)

        glt = read.read_to_crs(self.glt, crs, resampling=rasterio.warp.Resampling.nearest, 
                               resolution_dst_crs=resolution_dst_crs)

        out = EMITImage(
            self.filename,
            glt=glt,
            band_selection=self.band_selection,
            reuse_handles_from=self,
        )

        # Propagate eagerly-set and lazily-loaded attributes from the parent so
        # the new instance shares the parent's NetCDF handles, sensor params,
        # observation bands, mean angles, etc. without re-opening anything.
        for attrname in self.attributes_set_if_exists:
            if hasattr(self, attrname):
                setattr(out, attrname, getattr(self, attrname))

        # _pol is not in attributes_set_if_exists because it's CRS-dependent —
        # it must be reprojected to the new CRS.
        if hasattr(self, '_pol'):
            setattr(out, '_pol', window_utils.polygon_to_crs(self._pol, self.crs, crs))

        return out


    def read_from_window(self, window:Optional[rasterio.windows.Window]=None, boundless:bool=True) -> '__class__':
        glt_window = self.glt.read_from_window(window, boundless=boundless)
        out = EMITImage(
            self.filename,
            glt=glt_window,
            band_selection=self.band_selection,
            reuse_handles_from=self,
        )

        # Propagate eagerly-set and lazily-loaded attributes from the parent.
        for attrname in self.attributes_set_if_exists:
            if hasattr(self, attrname):
                setattr(out, attrname, getattr(self, attrname))

        return out

    def read_from_bands(self, bands:Union[int, Tuple[int, ...], slice]) -> '__class__':
        """ Return a copy restricted to ``bands`` (indices absolute w.r.t. ``self.nc_ds['radiance']``) """
        copy = self.__copy__()
        copy.set_band_selection(bands)
        return copy

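    def load(self, boundless:bool=True, as_reflectance:bool=False)-> GeoTensor:
        """
        Load the selected bands and georeference them onto the GLT grid.

        Args:
            boundless (bool, optional): Accepted for interface compatibility; currently unused.
            as_reflectance (bool, optional): If True, convert radiance to TOA reflectance
                using the Thuillier solar irradiance integrated against each band's
                spectral response function. Defaults to False.

        Returns:
            GeoTensor: (C, H', W') or (H', W') array.
        """
        data = self.load_raw() # (C, H, W) or (H, W)
        if as_reflectance:
            invalids = np.isnan(data) | (data == self.fill_value_default)
            thuillier = reflectance.load_thuillier_irradiance()
            response = reflectance.srf(self.wavelengths, self.fwhm, thuillier["Nanometer"].values)
            solar_irradiance_norm = thuillier["Radiance(mW/m2/nm)"].values.dot(response) / 1_000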
            data = reflectance.radiance_to_reflectance(data, solar_irradiance_norm,
                                                       units=self.units,
                                                       observation_date_corr_factor=self.observation_date_correction_factor)
            data[invalids] = self.fill_value_default
        return self.georreference(data, fill_value_default=self.fill_value_default)

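    def load_rgb(self, as_reflectance:bool=True) -> GeoTensor:
        # read_from_bands expects absolute band indices (w.r.t. nc_ds['radiance']),
        # so match WAVELENGTHS_RGB against the full wavelength table rather than
        # the current (possibly already subset) band selection.
        all_wavelengths = self._sensor_band_params[self.bandname_dimension].values
        bands_read = np.argmin(np.abs(WAVELENGTHS_RGB[:, np.newaxis] - all_wavelengths), axis=1).tolist()
        ei_rgb = self.read_from_bands(bands_read)
        return ei_rgb.load(boundless=True, as_reflectance=as_reflectance)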

    @property
    def shape_raw(self) -> Tuple[int, int, int]:
        """ Return the shape of the raw data in (C, H, W) format """
        return (len(self.wavelengths),) + rasterio.windows.shape(self.window_raw)

    def _bounds_indexes_raw(self) -> Tuple[int, int, int, int]:
        """ Return the bounds of the raw data: (min_x, min_y, max_x, max_y) """
        return _bounds_indexes_raw(self.glt.values, self.valid_glt)


    def load_raw(self, transpose:bool=True) -> np.ndarray:
        """
        Load the raw data in sensor geometry, without orthorectification.

        Args:
            transpose (bool, optional): If True and the data has 3 dimensions, transpose it
                to (C, H, W). If False, return (H, W, C). Defaults to True.

        Returns:
            np.ndarray: raw data (C, H, W) or (H, W)
        """

        slice_y, slice_x = self.window_raw.toslices()

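        if self.cache_radiance:
            # Opt-in: cache the full-spectrum windowed radiance so that
            # subsequent loads of band subsets become pure in-memory slices.
            # ``self._cache`` is a mutable dict shared with all clones built from
            # this instance (via ``attributes_set_if_exists``), so a single
            # decompression serves every downstream algorithm. Clones made by
            # ``read_from_window`` (or ``to_crs``) may have a different
            # ``window_raw``, so the bounds of the cached window are stored too
            # and the array is reused only when they match.
            bounds_key = self._CACHE_KEY_RADIANCE + "_bounds"
            window_bounds = (slice_y.start, slice_y.stop, slice_x.start, slice_x.stop)
            cached = self._cache.get(self._CACHE_KEY_RADIANCE)
            if cached is None or self._cache.get(bounds_key) != window_bounds:
                radiance = self.nc_ds['radiance']
                dims = radiance.dims
                cached = radiance.isel({dims[0]: slice_y, dims[1]: slice_x}).values
                self._cache[self._CACHE_KEY_RADIANCE] = cached
                self._cache[bounds_key] = window_bounds
            data = cached[..., self.band_selection]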
        else:
            # Default path: push the spatial (and, when possible, spectral) slice
            # into the NetCDF read via xarray .isel(). Avoids materialising the
            # full radiance variable in RAM, but re-reads from disk each call.
            radiance = self.nc_ds['radiance']
            dims = radiance.dims  # typically ('downtrack', 'crosstrack', 'bands')
            radiance = radiance.isel({dims[0]: slice_y, dims[1]: slice_x})

            if isinstance(self.band_selection, slice):
                radiance = radiance.isel({dims[2]: self.band_selection})
                data = radiance.values
            else:
                # Fancy indexing (list / array of indices) — push as far as we can
                # into the read (spatial), then numpy-slice the band axis.
                data = radiance.values[..., self.band_selection]

        # transpose to (C, H, W)
        if transpose and (len(data.shape) == 3):
            data = np.transpose(data, axes=(2, 0, 1))

        return data

    def clear_radiance_cache(self) -> None:
        """Drop the cached radiance window if present.

        After this call, the next ``load_raw()`` will re-read from disk. The
        ``_cache`` dict object itself is not replaced — clones built via
        ``__copy__`` / ``read_from_bands`` / ``to_crs`` / ``read_from_window``
        share the same dict by reference, so clearing through any clone is
        visible to all of them. Intended to be called from ``EmitProcessor.process``
        after all per-scene products are computed, to release the ~1.5 GB
        radiance array before the next scene is processed.
        """
        self._cache.pop(self._CACHE_KEY_RADIANCE, None)


    def georreference(self, data:np.ndarray, 
                      fill_value_default:Optional[Union[int,float]]=None) -> GeoTensor:
        """
        Georeference an image in sensor coordinates onto the grid of the current
        georeferenced object. If you process the raw data yourself, you can use this
        function to georeference the raw output.

        Args:
            data (np.ndarray): raw data (C, H, W) or (H, W).
            fill_value_default (Optional[Union[int, float]], optional): fill value of the
                returned GeoTensor. Defaults to None.

        Returns:
            GeoTensor: georeferenced version of data (C, H', W') or (H', W')

        Example:
            >>> emit_image = EMITImage("path/to/emit_image.nc")
            >>> emit_image_rgb = emit_image.read_from_bands([35, 23, 11])
            >>> data_rgb = emit_image_rgb.load_raw() # (3, H, W)
            >>> data_rgb_ortho = emit_image.georreference(data_rgb) # (3, H', W')
        """
        return georreference(self.glt_relative, data, self.valid_glt, 
                             fill_value_default=fill_value_default)


    @property
    def values(self) -> np.array:
        # return np.zeros(self.shape, dtype=self.dtype)
        return self.load(boundless=True).values

    def __repr__(self)->str:
        return f""" 
         File: {self.filename}
         Transform: {self.transform}
         Shape: {self.shape}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         units: {self.units}
        """

`mask_bands` `property` ¶

Returns the mask bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag', 'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

`mean_sza` `property` ¶

Return the mean solar zenith angle

`mean_vza` `property` ¶

Return the mean view zenith angle
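Both properties reduce one layer of the observation file to a single number, skipping pixels equal to the variable's `_FillValue` (the source falls back to -9999). They average over the whole observation array, not only the current window. The NumPy sketch below shows that reduction on toy values; `mean_ignoring_fill` is a hypothetical helper, not part of georeader.

```python
import numpy as np


def mean_ignoring_fill(values: np.ndarray, fill_value: float = -9999) -> float:
    """Mean of ``values`` after dropping pixels equal to ``fill_value``."""
    valid = values[values != fill_value]   # boolean indexing flattens to 1-D
    return float(np.mean(valid))            # NaN (with a warning) if nothing is valid


# Toy 2x3 solar zenith layer (degrees) with two fill pixels.
sza = np.array([[30.0, 32.0, -9999.0],
                [34.0, -9999.0, 36.0]])
print(mean_ignoring_fill(sza))  # 33.0
```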

`nc_ds_l2amask` `property` ¶

Loads the L2A mask file, which contains the cloud mask and other per-pixel mask layers.

If the L2A mask file is not found next to the L1B file, it is downloaded from the JPL portal.

The opened dataset is cached in the object (self._nc_ds_l2amask).

See https://lpdaac.usgs.gov/products/emitl2arflv001/ for info about the L2A mask file.

Parameters:

Name	Type	Description	Default
`l2amaskfile`	`Optional[str]`	Path to the L2A mask file. If None, the L2A mask file is downloaded from the EMIT server. Because this is a property, the argument cannot be passed at access time.	`None`
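The resolve, download-if-missing, open-once pattern behind `nc_ds_l2amask` (and `nc_ds_obs`) can be sketched in plain Python. `LazyCompanionFile` is a hypothetical class; the `opener` and `downloader` callables stand in for georeader's `safe_open_netcdf` and `download_product`, and the file names and URL in the usage are placeholders.

```python
import os
import tempfile
from typing import Any, Callable, Optional


class LazyCompanionFile:
    """Hypothetical sketch: resolve a companion file next to ``filename``,
    download it if missing, open it once and reuse the handle."""

    def __init__(self, filename: str, link: str,
                 opener: Callable[[str], Any],
                 downloader: Callable[[str, str], None]) -> None:
        self.filename = filename
        self.link = link
        self._opener = opener          # stands in for safe_open_netcdf
        self._downloader = downloader  # stands in for download_product
        self._ds: Optional[Any] = None
        self.path: Optional[str] = None

    @property
    def ds(self) -> Any:
        if self._ds is not None:       # already opened: reuse the cached handle
            return self._ds
        # Local path = directory of the main granule + file name of the remote link.
        path = os.path.join(os.path.dirname(self.filename), os.path.basename(self.link))
        if not os.path.exists(path):
            self._downloader(self.link, path)
        self.path = path
        self._ds = self._opener(path)
        return self._ds


# Usage with a fake opener/downloader (names and URL are placeholders).
downloads = []


def fake_download(link: str, dst: str) -> None:
    downloads.append(dst)
    open(dst, "w").close()             # empty file in place of a real download


with tempfile.TemporaryDirectory() as tmp:
    lazy = LazyCompanionFile(os.path.join(tmp, "granule_RAD.nc"),
                             "https://example.com/data/granule_MASK.nc",
                             opener=lambda p: {"path": p},
                             downloader=fake_download)
    first, second = lazy.ds, lazy.ds   # second access hits the cached handle
    resolved_path = lazy.path
    expected_path = os.path.join(tmp, "granule_MASK.nc")
```

As in the reader, the argument-free property is what makes caching easy, and it is also why a custom path cannot be passed at access time.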

`nc_ds_obs` `property` ¶

Loads the observation file. This file contains information about the solar and viewing angles, elevation, and illumination based on elevation and path length.

If the observation file is not found next to the L1B file, it is downloaded from the JPL portal.

The opened dataset is cached in the object (self._nc_ds_obs).

Parameters:

Name	Type	Description	Default
`obs_file`	`Optional[str]`	Path to the observation file. If None, the observation file is downloaded from the EMIT server. Because this is a property, the argument cannot be passed at access time.	`None`

`observation_bands` `property` ¶

Returns the observation bands

`percentage_clear` `property` ¶

Return the percentage of clear pixels in the image

Returns:

Name	Type	Description
`float`	`float`	percentage of clear pixels
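The percentage is the share of pixels in the raw (sensor-geometry) window that are not flagged by `invalid_mask_raw(with_buffer=False)`. A minimal sketch of the arithmetic on a hypothetical boolean mask (True = invalid):

```python
import numpy as np


def percentage_clear(invalid: np.ndarray) -> float:
    """Percentage of pixels that are not flagged invalid (True = invalid)."""
    return float(100 * (1 - np.sum(invalid) / np.prod(invalid.shape)))


invalid = np.array([[True, False, False, False],
                    [False, False, True, False]])
print(percentage_clear(invalid))  # 75.0
```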

`shape_raw` `property` ¶

Return the shape of the raw data in (C, H, W) format

`clear_radiance_cache()` ¶

Drop the cached radiance window if present.

After this call, the next load_raw() will re-read from disk. The _cache dict object itself is not replaced — clones built via __copy__ / read_from_bands / to_crs / read_from_window share the same dict by reference, so clearing through any clone is visible to all of them. Intended to be called from EmitProcessor.process after all per-scene products are computed, to release the ~1.5 GB radiance array before the next scene is processed.
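The cache is visible across clones because they receive the same dict object, not a copy of it. The sketch below reproduces that plumbing with a hypothetical `Scene` class (not georeader code): a clone is served the parent's cached value without a new read, and popping the key through the clone clears it for both.

```python
import copy
from typing import Any, Dict, List


class Scene:
    """Hypothetical stand-in for the cache plumbing of EMITImage."""
    _CACHE_KEY = "radiance_window"

    def __init__(self) -> None:
        self._cache: Dict[str, Any] = {}
        self.reads = 0                 # counts simulated disk reads

    def __copy__(self) -> "Scene":
        out = Scene()
        out._cache = self._cache       # share the dict object itself, not a copy
        return out

    def load(self) -> List[float]:
        if self._CACHE_KEY not in self._cache:
            self.reads += 1            # expensive read happens only on a cache miss
            self._cache[self._CACHE_KEY] = [1.0, 2.0, 3.0]
        return self._cache[self._CACHE_KEY]

    def clear_cache(self) -> None:
        self._cache.pop(self._CACHE_KEY, None)


parent = Scene()
clone = copy.copy(parent)              # copy.copy dispatches to __copy__
parent.load()
hit = clone.load()                     # served from the shared dict, no new read
clone.clear_cache()                    # clears the entry for parent and clone
cleared = Scene._CACHE_KEY not in parent._cache
```

Because the dict is shared, anything that varies per clone, such as the spatial window, has to be part of the cache key or checked on lookup.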

`footprint(crs=None)` ¶

Get the footprint of the image in the given CRS. If no CRS is given, the footprint is returned in the native CRS. This function takes into account the valid_glt mask to compute the footprint.

Parameters:

Name	Type	Description	Default
`crs`	`Optional[str]`	The CRS to return the footprint in. Defaults to None. If None, the footprint is returned in the native CRS.	`None`

Returns:

Name	Type	Description
`Polygon`	`Polygon`	The footprint of the image in the given CRS.

`georreference(data, fill_value_default=None)` ¶

Georeference an image in sensor coordinates onto the grid of the current georeferenced object. If you process the raw data yourself, you can use this function to georeference the raw output.

Parameters:

Name	Type	Description	Default
`data`	`array`	raw data (C, H, W) or (H, W).	required
`fill_value_default`	`Optional[Union[int, float]]`	fill value of the returned GeoTensor.	`None`

Returns:

Name	Type	Description
`GeoTensor`	`GeoTensor`	georeferenced version of data (C, H', W') or (H', W')

Example

>>> emit_image = EMITImage("path/to/emit_image.nc")
>>> emit_image_rgb = emit_image.read_from_bands([35, 23, 11])
>>> data_rgb = emit_image_rgb.load_raw() # (3, H, W)
>>> data_rgb_ortho = emit_image.georreference(data_rgb) # (3, H', W')
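Georeferencing with a geometry lookup table (GLT) is a gather: each output pixel stores the (column, row) of the sensor pixel that lands on it. The NumPy sketch below assumes the layout of `glt_relative` in this reader (band 0 = column, band 1 = row, 0-based relative to `window_raw`) and a boolean `valid_glt`; `glt_georeference` is a hypothetical stand-in, not the library's `georreference` helper.

```python
import numpy as np


def glt_georeference(glt_relative: np.ndarray, data: np.ndarray,
                     valid_glt: np.ndarray, fill_value=0) -> np.ndarray:
    """Gather sensor-geometry ``data`` onto the output grid described by a GLT.

    glt_relative: (2, H', W') ints; [0] = column, [1] = row into ``data``.
    data:         (C, H, W) or (H, W) array in sensor geometry.
    valid_glt:    (H', W') bool; False where no sensor pixel maps to the output.
    """
    out = np.full(data.shape[:-2] + glt_relative.shape[1:], fill_value, dtype=data.dtype)
    cols = glt_relative[0][valid_glt]      # (N,) source columns of valid output pixels
    rows = glt_relative[1][valid_glt]      # (N,) source rows
    out[..., valid_glt] = data[..., rows, cols]
    return out


data = np.arange(6, dtype=float).reshape(1, 2, 3)   # one band: [[0, 1, 2], [3, 4, 5]]
glt_rel = np.array([[[2, 0], [1, 0]],                # columns
                    [[0, 1], [1, 0]]])               # rows
valid = np.array([[True, True], [True, False]])
out = glt_georeference(glt_rel, data, valid, fill_value=-1.0)
print(out.tolist())  # [[[2.0, 3.0], [4.0, -1.0]]]
```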

Source code in georeader/readers/emit.py

def georreference(self, data:np.array, 
                  fill_value_default:Optional[Union[int,float]]=None) -> GeoTensor:
    """
    Georreference an image in sensor coordinates to coordinates of the current 
    georreferenced object. If you do some processing with the raw data, you can 
    georreference the raw output with this function.

    Args:
        data (np.array): raw data (C, H, W) or (H, W). 

    Returns:
        GeoTensor: georreferenced version of data (C, H', W') or (H', W')

    Example:
        >>> emit_image = EMITImage("path/to/emit_image.nc")
        >>> emit_image_rgb = emit_image.read_from_bands([35, 23, 11])
        >>> data_rgb = emit_image_rgb.load_raw() # (3, H, W)
        >>> data_rgb_ortho = emit_image.georreference(data_rgb) # (3, H', W')
    """
    return georreference(self.glt_relative, data, self.valid_glt, 
                         fill_value_default=fill_value_default)
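At its core, georeferencing through a geometry lookup table (GLT) is a gather: each output pixel stores the column and row of the raw sensor pixel that falls on it. The NumPy sketch below shows that nearest-neighbour lookup. It is not the library's `georreference` function; it assumes, as in `valid_mask` further down, that `glt_relative[0]` holds column (x) indices and `glt_relative[1]` row (y) indices into the raw array, and `glt_lookup` is a hypothetical name.

```python
import numpy as np


def glt_lookup(glt_relative, data, valid_glt, fill_value=0):
    """Nearest-neighbour resampling of raw sensor data through a GLT.

    glt_relative: (2, H', W') int array; [0] = column (x), [1] = row (y)
        indices into the raw array (assumed layout).
    data: raw array, (C, H, W) or (H, W).
    valid_glt: (H', W') bool array, True where the GLT points to a raw pixel.
    """
    out_shape = data.shape[:-2] + valid_glt.shape
    out = np.full(out_shape, fill_value, dtype=data.dtype)
    cols = glt_relative[0][valid_glt]
    rows = glt_relative[1][valid_glt]
    # Gather raw pixels; the leading band axis, if present, is carried along.
    out[..., valid_glt] = data[..., rows, cols]
    return out
```

Output pixels where `valid_glt` is False keep the fill value, which is the role `fill_value_default` plays in the method above.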

`invalid_mask_raw(with_buffer=True)` ¶

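Return the quality mask in sensor geometry (not georeferenced). True means that the pixel is not valid.

A pixel is flagged as invalid if any of the Cloud flag, Cirrus flag or Spacecraft Flag layers is set, or, when with_buffer=True, the Dilated Cloud Flag layer as well (the selected flags are summed and the result is thresholded at 1).

Parameters:

Name	Type	Description	Default
`with_buffer`	`bool`	If True, pixels set in the Dilated Cloud Flag layer are also treated as invalid.	`True`

Based on: https://github.com/nasa/EMIT-Data-Resources/blob/main/python/how-tos/How_to_use_EMIT_Quality_data.ipynb and https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py#L277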

Source code in georeader/readers/emit.py

def invalid_mask_raw(self, with_buffer:bool=True) -> NDArray:
    """
    Returns the non georreferenced quality mask. True means that the pixel is not valid.

    This mask is computed as the sum of the Cloud flag, Cirrus flag, Spacecraft flag and Dilated Cloud Flag.
    True means that the pixel is not valid.

    From: https://github.com/nasa/EMIT-Data-Resources/blob/main/python/how-tos/How_to_use_EMIT_Quality_data.ipynb
    and https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py#L277


    """
    band_index =  [0,1,3]
    if with_buffer:
        band_index.append(4)

    slice_y, slice_x = self.window_raw.toslices()
    mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
    mask_arr = np.sum(mask_arr, axis=-1)
    mask_arr = (mask_arr >= 1)
    return mask_arr
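The flag logic of `invalid_mask_raw` can be reproduced on an in-memory array. This sketch assumes the flag layers are stacked on the last axis in the order of `mask_bands` (Cloud flag, Cirrus flag, Water flag, Spacecraft Flag, Dilated Cloud Flag, ...), which is what the indices `[0, 1, 3]` and `4` in the code imply; `invalid_from_flags` is a hypothetical helper.

```python
import numpy as np

# Positions of the flag layers in mask_bands (assumed from the list in `mask`).
CLOUD, CIRRUS, WATER, SPACECRAFT, DILATED_CLOUD = 0, 1, 2, 3, 4


def invalid_from_flags(mask_arr, with_buffer=True):
    """Combine EMIT L2A flag layers (H, W, n_layers) into an invalid mask.

    A pixel is invalid if any selected flag is set: summing the 0/1 flags
    and testing >= 1 is a logical OR. The water flag is not used.
    """
    band_index = [CLOUD, CIRRUS, SPACECRAFT]
    if with_buffer:
        band_index.append(DILATED_CLOUD)
    return mask_arr[:, :, band_index].sum(axis=-1) >= 1
```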

`load_raw(transpose=True)` ¶

Load the raw data, without orthorectification.

Parameters:

Name	Type	Description	Default
`transpose`	`bool`	If True and the data has three dimensions, transpose it to (C, H, W); if False, return (H, W, C). Defaults to True.	`True`

Returns:

Type	Description
`array`	np.array: raw data, (C, H, W) or (H, W).

Source code in georeader/readers/emit.py

def load_raw(self, transpose:bool=True) -> np.array:
    """
    Load the raw data, without orthorectification

    Args:
        transpose (bool, optional): Transpose the data if it has 3 dimensions to (C, H, W).
            Defaults to True. If False, return (H, W, C).

    Returns:
        np.array: raw data (C, H, W) or (H, W)
    """

    slice_y, slice_x = self.window_raw.toslices()

    if self.cache_radiance:
        # Option B (opt-in): cache the full-spectrum windowed radiance so that
        # subsequent loads of band subsets become pure in-memory slices.
        # ``self._cache`` is a mutable dict shared with all clones built from
        # this instance (via ``attributes_set_if_exists``), so a single
        # decompression services every algorithm downstream.
        cached = self._cache.get(self._CACHE_KEY_RADIANCE)
        if cached is None:
            radiance = self.nc_ds['radiance']
            dims = radiance.dims
            cached = radiance.isel({dims[0]: slice_y, dims[1]: slice_x}).values
            self._cache[self._CACHE_KEY_RADIANCE] = cached
        data = cached[..., self.band_selection]
    else:
        # Default path: push the spatial (and, when possible, spectral) slice
        # into the NetCDF read via xarray .isel(). Avoids materialising the
        # full radiance variable in RAM, but re-reads from disk each call.
        radiance = self.nc_ds['radiance']
        dims = radiance.dims  # typically ('downtrack', 'crosstrack', 'bands')
        radiance = radiance.isel({dims[0]: slice_y, dims[1]: slice_x})

        if isinstance(self.band_selection, slice):
            radiance = radiance.isel({dims[2]: self.band_selection})
            data = radiance.values
        else:
            # Fancy indexing (list / array of indices) — push as far as we can
            # into the read (spatial), then numpy-slice the band axis.
            data = radiance.values[..., self.band_selection]

    # transpose to (C, H, W)
    if transpose and (len(data.shape) == 3):
        data = np.transpose(data, axes=(2, 0, 1))

    return data
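The indexing done by `load_raw` (spatial window first, then band selection, then an optional transpose to (C, H, W)) is sketched below on an in-memory array. It assumes the radiance variable is laid out as (downtrack, crosstrack, bands), as the comment in the code indicates, and it leaves out the NetCDF access and the optional radiance cache; `select_and_transpose` is a hypothetical name.

```python
import numpy as np


def select_and_transpose(cube_hwb, window_slices, band_selection, transpose=True):
    """Mimic load_raw's indexing on an in-memory (downtrack, crosstrack, bands) cube.

    window_slices: (slice_y, slice_x) for the spatial window.
    band_selection: an int, a slice, or a list of band indices.
    """
    slice_y, slice_x = window_slices
    data = cube_hwb[slice_y, slice_x][..., band_selection]
    if transpose and data.ndim == 3:
        data = np.transpose(data, axes=(2, 0, 1))  # (H, W, C) -> (C, H, W)
    return data
```

A slice and the equivalent list of indices select the same bands, so both branches of `load_raw` return the same values; per the code comments, the slice branch additionally lets xarray restrict the read to those bands.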

`mask(mask_name='cloud_mask')` ¶

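Return the georeferenced mask layer with the given name. mask_name must be one of self.mask_bands: ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag', 'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag'].

Parameters:

Name	Type	Description	Default
`mask_name`	`str`	Name of the mask layer. The default, "cloud_mask", does not appear in the list above, so pass one of those names explicitly (e.g. "Cloud flag").	`'cloud_mask'`

Returns:

Name	Type	Description
`GeoTensor`	`GeoTensor`	The mask layer.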

Source code in georeader/readers/emit.py

def mask(self, mask_name:str="cloud_mask") -> GeoTensor:
    """
    Return the mask layer with the given name.
    Mask shall be one of self.mask_bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag',
   'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

    Args:
        mask_name (str, optional): Name of the mask. Defaults to "cloud_mask".

    Returns:
        GeoTensor: mask
    """
    band_index = self.mask_bands.tolist().index(mask_name)
    slice_y, slice_x = self.window_raw.toslices()
    mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
    return self.georreference(mask_arr,
                              fill_value_default=self.nc_ds_l2amask['mask'].attrs.get('_FillValue', -9999))

`observation(name)` ¶

Return the observation layer with the given name, georeferenced as a GeoTensor. name must be one of self.observation_bands.

Source code in georeader/readers/emit.py

def observation(self, name:str) -> GeoTensor:
    """ Returns the observation with the given name """
    band_index = self.observation_bands.tolist().index(name)
    slice_y, slice_x = self.window_raw.toslices()
    # The obs file stores obs data in root group, not in a subgroup
    obs_arr = self.nc_ds_obs['obs'].values[slice_y, slice_x, band_index]
    return self.georreference(obs_arr, 
                              fill_value_default=self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999))

`set_band_selection(band_selection=None)` ¶

Set the band selection. Band indices are absolute with respect to the bands of self.nc_ds['radiance']. The wavelengths and fwhm attributes are updated to match the selection.

Parameters:

Name	Type	Description	Default
`band_selection`	`Optional[Union[int, Tuple[int, ...], slice]]`	Slice or selection of the bands. If None, all bands are selected. Defaults to None.	`None`

Example

>>> emit_image.set_band_selection(slice(0, 3))  # only the first three bands will be loaded
>>> emit_image.wavelengths  # wavelengths of the first three bands only
>>> emit_image.load()  # loads only the first three bands

Source code in georeader/readers/emit.py

def set_band_selection(self, band_selection:Optional[Union[int, Tuple[int, ...],slice]]=None):
    """
    Set the band selection. Band selection is absolute w.r.t self.nc_ds['radiance']

    Args:
        band_selection (Optional[Union[int, Tuple[int, ...],slice]], optional): slicing or selection of the bands. Defaults to None.

    Example:
        >>> emit_image.set_band_selection(slice(0, 3)) # will only load the three first bands
        >>> emit_image.wavelengths # will only return the wavelengths of the three first bands
        >>> emit_image.load() # will only load the three first bands
    """
    if band_selection is None:
        band_selection = slice(None)
    self.band_selection = band_selection
    self.wavelengths = self._sensor_band_params[self.bandname_dimension].values[self.band_selection]
    self.fwhm = self._sensor_band_params['fwhm'].values[self.band_selection]
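`set_band_selection` applies the selection directly to 1-D per-band arrays such as the wavelengths. The sketch below, using an illustrative wavelength array, shows how a slice and a list behave. With the code as shown, a tuple such as `(35, 23, 11)`, although allowed by the type hint, would be read by NumPy as one index per axis and raise an `IndexError` on a 1-D array, so a list or a slice is the safer choice. `select_bands` is a hypothetical helper.

```python
import numpy as np


def select_bands(values, band_selection=None):
    """Apply a band selection to a 1-D per-band array (e.g. wavelengths).

    None selects every band, mirroring set_band_selection.
    """
    if band_selection is None:
        band_selection = slice(None)
    return values[band_selection]


wavelengths = np.linspace(380.0, 2500.0, 285)  # illustrative values only

first_three = select_bands(wavelengths, slice(0, 3))  # slice: contiguous bands
rgb_like = select_bands(wavelengths, [35, 23, 11])    # list: fancy indexing
# select_bands(wavelengths, (35, 23, 11)) raises IndexError: NumPy reads a
# tuple as one index per axis, and this array has only one axis.
```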

`sza()` ¶

Return the solar zenith angle as a GeoTensor

Source code in georeader/readers/emit.py

def sza(self) -> GeoTensor:
    """ Return the solar zenith angle as a GeoTensor """
    return self.observation('To-sun zenith (0 to 90 degrees from zenith)')

`to_crs(crs='UTM', resolution_dst_crs=60)` ¶

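Reproject the image to a new CRS. Only the geometry lookup table (GLT) is reprojected, with nearest-neighbour resampling; the returned image reuses the parent's open NetCDF handles.

Parameters:

Name	Type	Description	Default
`crs`	`Any`	Target CRS. If "UTM", the UTM zone of the image footprint is used.	`'UTM'`
`resolution_dst_crs`	`Optional[Union[float, Tuple[float, float]]]`	Output resolution, in units of the target CRS.	`60`

Returns:

Name	Type	Description
`EmitImage`	`__class__`	EMIT image in the new CRS

Example

>>> emit_image = EMITImage("path/to/emit_image.nc")
>>> emit_image_utm = emit_image.to_crs(crs="UTM")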

Source code in georeader/readers/emit.py

def to_crs(self, crs:Any="UTM", 
           resolution_dst_crs:Optional[Union[float, Tuple[float, float]]]=60) -> '__class__':
    """
    Reproject the image to a new crs

    Args:
        crs (Any): CRS. 

    Returns:
        EmitImage: EMIT image in the new CRS

    Example:
        >>> emit_image = EMITImage("path/to/emit_image.nc")
        >>> emit_image_utm = emit_image.to_crs(crs="UTM")
    """
    if crs == "UTM":
        footprint = self.glt.footprint("EPSG:4326")
        crs = get_utm_epsg(footprint)

    glt = read.read_to_crs(self.glt, crs, resampling=rasterio.warp.Resampling.nearest, 
                           resolution_dst_crs=resolution_dst_crs)

    out = EMITImage(
        self.filename,
        glt=glt,
        band_selection=self.band_selection,
        reuse_handles_from=self,
    )

    # Propagate eagerly-set and lazily-loaded attributes from the parent so
    # the new instance shares the parent's NetCDF handles, sensor params,
    # observation bands, mean angles, etc. without re-opening anything.
    for attrname in self.attributes_set_if_exists:
        if hasattr(self, attrname):
            setattr(out, attrname, getattr(self, attrname))

    # _pol is not in attributes_set_if_exists because it's CRS-dependent —
    # it must be reprojected to the new CRS.
    if hasattr(self, '_pol'):
        setattr(out, '_pol', window_utils.polygon_to_crs(self._pol, self.crs, crs))

    return out
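`to_crs(crs="UTM")` resolves the zone with `get_utm_epsg` applied to the GLT footprint in EPSG:4326. The standard 6-degree zone arithmetic for a single longitude/latitude is sketched below. Which point of the footprint `get_utm_epsg` uses, and whether it handles the Norway/Svalbard exceptions, is not shown in this source, so treat `utm_epsg_from_lonlat` as a hypothetical illustration rather than that function.

```python
import math


def utm_epsg_from_lonlat(lon: float, lat: float) -> str:
    """WGS 84 / UTM EPSG code for a point, using the regular 6-degree zones.

    The Norway and Svalbard zone exceptions are ignored.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1  # zones 1..60
    base = 32600 if lat >= 0 else 32700  # 326xx north, 327xx south
    return f"EPSG:{base + zone}"
```

For example, a footprint centred near Madrid (-3.7, 40.4) falls in zone 30 north, i.e. EPSG:32630.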

`validmask(with_buffer=True)` ¶

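Return the georeferenced valid mask (the negation of invalid_mask_raw).

Parameters:

Name	Type	Description	Default
`with_buffer`	`bool`	If True, pixels set in the Dilated Cloud Flag layer are also treated as invalid.	`True`

Returns:

Name	Type	Description
`GeoTensor`	`GeoTensor`	Boolean mask. True means that the pixel is valid.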

Source code in georeader/readers/emit.py

def validmask(self, with_buffer:bool=True) -> GeoTensor:
    """
    Return the validmask mask


    Returns:
        GeoTensor: bool mask. True means that the pixel is valid.
    """

    validmask = ~self.invalid_mask_raw(with_buffer=with_buffer)

    return self.georreference(validmask,
                              fill_value_default=False)

`vza()` ¶

Return the view zenith angle as a GeoTensor

Source code in georeader/readers/emit.py

def vza(self) -> GeoTensor:
    """ Return the view zenith angle as a GeoTensor """
    return self.observation('To-sensor zenith (0 to 90 degrees from zenith)')

`water_mask()` ¶

Returns the water mask

Source code in georeader/readers/emit.py

def water_mask(self) -> GeoTensor:
    """ Returns the water mask """
    return self.mask("Water flag")

`download_product(link_down, filename=None, display_progress_bar=True, auth=None)` ¶

Download a product from the EMIT website (https://search.earthdata.nasa.gov/search). A NASA Earthdata account is required.

This code is based on this example: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name	Type	Description	Default
`link_down`	`str`	Link to the product.	required
`filename`	`Optional[str]`	Filename to save the product to.	`None`
`display_progress_bar`	`bool`	Display a tqdm progress bar.	`True`
`auth`	`Optional[Tuple[str, str]]`	Tuple with the user and password used to download the product. If None, the credentials depend on the module-level AUTH_METHOD: with "auth", the user and password are read from ~/.georeader/auth_emit.json; with "token", the TOKEN variable must be set and is sent in the request headers.	`None`

Returns:

Type	Description
`str`	Path of the downloaded file.

Example

>>> link_down = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220828T051941_2224004_006/EMIT_L1B_RAD_001_20220828T051941_2224004_006.nc'
>>> filename = download_product(link_down)

Source code in georeader/readers/emit.py

def download_product(link_down:str, filename:Optional[str]=None,
                     display_progress_bar:bool=True,
                     auth:Optional[Tuple[str, str]] = None) -> str:
    """
    Download a product from the EMIT website (https://search.earthdata.nasa.gov/search). 
    It requires that you have an account in the NASA Earthdata portal. 

    This code is based on this example: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        link_down: link to the product
        filename: filename to save the product
        display_progress_bar: display tqdm progress bar
        auth: tuple with user and password to download the product. If None, it will try to read the user and password from ~/.georeader/auth_emit.json 

    Example:
        >>> link_down = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220828T051941_2224004_006/EMIT_L1B_RAD_001_20220828T051941_2224004_006.nc'
        >>> filename = download_product(link_down)
    """
    headers = None
    if auth is None:
        if AUTH_METHOD == "auth":
            auth = get_auth()
        elif AUTH_METHOD == "token":
            assert TOKEN is not None, "You need to set the TOKEN variable to download EMIT images"
            headers = get_headers()

    return download_product_base(link_down, filename=filename, auth=auth,
                                 headers=headers,
                                 display_progress_bar=display_progress_bar, 
                                 verify=False)

`get_radiance_link(product_path)` ¶

Get the download link of the EMIT L1B radiance (RAD) product. See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name	Type	Description	Default
`product_path`	`str`	Path to the product, filename of the product, or product name, with or without extension, e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'.	required

Example

>>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
>>> get_radiance_link(product_path)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

Source code in georeader/readers/emit.py

def get_radiance_link(product_path:str) -> str:
    """
    Get the link to download a product from the EMIT website.
    See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        product_path: path to the product or filename of the product or product name with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Example:
        >>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_radiance_link(product_path)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
    """
    "EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc"
    namefile = os.path.splitext(os.path.basename(product_path))[0]
    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L1B"
    content_id[2] = "RAD"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)
    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/{product_id}/{product_id}.nc"
    return link

`get_obs_link(product_path)` ¶

Get the download link of the EMIT L1B observation (OBS) file that accompanies an L1B radiance product. See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name	Type	Description	Default
`product_path`	`str`	Path to the product or filename of the product, with or without extension, e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'.	required

Example

>>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
>>> get_obs_link(product_path)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_OBS_001_20220827T060753_2223904_013.nc'

Source code in georeader/readers/emit.py

def get_obs_link(product_path:str) -> str:
    """
    Get the link to download a product from the EMIT website.
    See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        product_path: path to the product or filename of the product with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Example:
        >>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_obs_link(product_path)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_OBS_001_20220827T060753_2223904_013.nc'
    """
    namefile = os.path.splitext(os.path.basename(product_path))[0]

    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L1B"
    content_id[2] = "RAD"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)

    content_id[2] = "OBS"
    namefile = "_".join(content_id)

    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/{product_id}/{namefile}.nc"
    return link

`get_ch4enhancement_link(tile)` ¶

Get the download link of the EMIT L2B methane enhancement (CH4ENH) GeoTIFF that corresponds to a given product. See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name	Type	Description	Default
`tile`	`str`	Path to the product or filename of the product, with or without extension, e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'.	required

Example

>>> tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
>>> get_ch4enhancement_link(tile)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BCH4ENH.001/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013.tif'

Source code in georeader/readers/emit.py

def get_ch4enhancement_link(tile:str) -> str:
    """
    Get the link to download a product from the EMIT website.
    See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        tile (str): path to the product or filename of the product with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Example:
        >>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_ch4enhancement_link(product_path)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BCH4ENH.001/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013.tif'
    """
    namefile = os.path.splitext(os.path.basename(tile))[0]

    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L2B"
    content_id[2] = "CH4ENH"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)
    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BCH4ENH.001/{product_id}/{product_id}.tif"
    return link

`get_l2amask_link(tile)` ¶

Get the download link of the EMIT L2A mask product that corresponds to an L1B product (https://search.earthdata.nasa.gov/search).

Parameters:

Name	Type	Description	Default
`tile`	`str`	path to the product or filename of the L1B product with or without extension. e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'	required

Returns:

Name	Type	Description
`str`	`str`	link to the L2A mask product

Example

>>> tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
>>> get_l2amask_link(tile)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220827T060753_2223904_013/EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc'

Source code in georeader/readers/emit.py

def get_l2amask_link(tile: str) -> str:
    """
    Get the link to download a product from the EMIT website (https://search.earthdata.nasa.gov/search)

    Args:
        tile (str): path to the product or filename of the L1B product with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Returns:
        str: link to the L2A mask product

    Example:
        >>> tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_l2amask_link(tile)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220827T060753_2223904_013/EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc'
    """
    namefile = os.path.splitext(os.path.basename(tile))[0]
    namefile = namefile + ".nc"

    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L2A"
    content_id[2] = "RFL"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)

    content_id[2] = "MASK"
    namefilenew = "_".join(content_id) + ".nc"
    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/{product_id}/{namefilenew}"
    return link
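All four link helpers above follow the same pattern: they rewrite the level and product fields of the product name and place the result under an LP DAAC collection folder. A hypothetical generalization is sketched below. It reproduces the example links in this section, but the meaning of the trailing name fields (acquisition start time, orbit identifier, scene number) is an assumption, and `emit_link` is not part of georeader.

```python
import os

LPDAAC = "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected"


def emit_product_id(product_path, level, product):
    """Swap the level/product fields of an EMIT product name.

    'EMIT_L1B_RAD_001_<start>_<orbit>_<scene>' (trailing fields assumed)
    becomes 'EMIT_<level>_<product>_001_<start>_<orbit>_<scene>'.
    """
    stem = os.path.splitext(os.path.basename(product_path))[0]
    fields = stem.split("_")
    fields[1], fields[2] = level, product
    fields[3] = fields[3].replace("V", "")  # collection version, e.g. '001'
    return "_".join(fields)


def emit_link(product_path, collection, level, product, file_product=None, ext=".nc"):
    """Hypothetical generalization of the get_*_link helpers above."""
    granule = emit_product_id(product_path, level, product)
    filename = emit_product_id(product_path, level, file_product or product) + ext
    return f"{LPDAAC}/{collection}/{granule}/{filename}"
```

For instance, `emit_link(tile, "EMITL2ARFL.001", "L2A", "RFL", file_product="MASK")` yields the same URL as `get_l2amask_link(tile)`.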

`valid_mask(filename, with_buffer=False, dst_crs='UTM', resolution_dst_crs=60)` ¶

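Load the valid mask from an EMIT L2A mask file and georeference it.

Parameters:

Name	Type	Description	Default
`filename`	`str`	Path to the L2A mask file, e.g. EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc.	required
`with_buffer`	`bool`	If True, pixels set in the Dilated Cloud Flag (buffer) layer are also treated as invalid. Defaults to False.	`False`
`dst_crs`	`Optional[Any]`	Target CRS for the GLT. If "UTM", the UTM zone of the image footprint is used; if None, the native CRS of the file is kept.	`'UTM'`
`resolution_dst_crs`	`Optional[Union[float, Tuple[float, float]]]`	Output resolution, in units of the target CRS.	`60`

Returns:

Type	Description
`Tuple[GeoTensor, float]`	The georeferenced valid mask (True means that the pixel is valid) and the percentage of clear pixels, computed in sensor geometry.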

Source code in georeader/readers/emit.py

def valid_mask(filename:str, with_buffer:bool=False, 
               dst_crs:Optional[Any]="UTM", 
               resolution_dst_crs:Optional[Union[float, Tuple[float, float]]]=60) -> Tuple[GeoTensor, float]:
    """
    Loads the valid mask from the EMIT L2AMASK file.

    Args:
        filename (str): path to the L2AMASK file. e.g. EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc
        with_buffer (bool, optional): If True, the buffer band is used to compute the valid mask. Defaults to False.

    Returns:
        Tuple[GeoTensor, float]: georeferenced valid mask and percentage of clear (valid) pixels.
    """

    if not HAS_XARRAY:
        raise ImportError("xarray is required to read EMIT images. Please install it with: pip install xarray")

    nc_ds = safe_open_netcdf(filename, cache=False, load=False)

    geotransform = nc_ds.attrs['geotransform']
    real_transform = rasterio.Affine(geotransform[1], geotransform[2], geotransform[0],
                                     geotransform[4], geotransform[5], geotransform[3])

    # Open location group to access glt data
    location_ds = safe_open_netcdf(filename, cache=False, load=False, group='location')
    glt_x = location_ds['glt_x'].values
    glt_y = location_ds['glt_y'].values
    location_ds.close()

    glt_arr = np.zeros((2,) + glt_x.shape, dtype=np.int32)
    glt_arr[0] = glt_x
    glt_arr[1] = glt_y
    # glt_arr -= 1 # account for 1-based indexing

    # https://rasterio.readthedocs.io/en/stable/api/rasterio.crs.html
    glt = GeoTensor(glt_arr, transform=real_transform, 
                    crs=rasterio.crs.CRS.from_wkt(nc_ds.attrs['spatial_ref']),
                    fill_value_default=0)

    if dst_crs is not None:
        if dst_crs == "UTM":
            footprint = glt.footprint("EPSG:4326")
            dst_crs = get_utm_epsg(footprint)

        glt = read.read_to_crs(glt, dst_crs=dst_crs, 
                               resampling=rasterio.warp.Resampling.nearest, 
                               resolution_dst_crs=resolution_dst_crs)

    valid_glt = np.all(glt.values != glt.fill_value_default, axis=0)
    xmin = np.min(glt.values[0, valid_glt])
    ymin = np.min(glt.values[1, valid_glt])

    glt_relative = glt.copy()
    glt_relative.values[0, valid_glt] -= xmin
    glt_relative.values[1, valid_glt] -= ymin
    # mask_bands = nc_ds["sensor_band_parameters"]["mask_bands"][:]

    band_index =  [0,1,3]
    if with_buffer:
        band_index.append(4)

    mask_arr = nc_ds['mask'].values[:, :, band_index]
    invalidmask_raw = np.sum(mask_arr, axis=-1)
    invalidmask_raw = (invalidmask_raw >= 1)

    validmask = ~invalidmask_raw

    percentage_clear = 100 * (np.sum(validmask) / np.prod(validmask.shape))

    return georreference(glt_relative, validmask, valid_glt,
                         fill_value_default=False), percentage_clear
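`valid_mask` shifts the valid GLT entries so that the smallest referenced column and row become 0, then reports the share of clear pixels computed in sensor geometry. Both steps are sketched below with NumPy, assuming a GLT fill value of 0 as in the code; `to_relative_glt` and `percentage_clear` are hypothetical names.

```python
import numpy as np


def to_relative_glt(glt, fill_value=0):
    """Shift absolute GLT indices so the smallest valid x and y become 0.

    glt: (2, H', W') int array, [0] = x (column), [1] = y (row).
    Returns (glt_relative, valid_glt); invalid entries are left untouched.
    """
    valid_glt = np.all(glt != fill_value, axis=0)
    glt_relative = glt.copy()
    glt_relative[0, valid_glt] -= glt[0, valid_glt].min()
    glt_relative[1, valid_glt] -= glt[1, valid_glt].min()
    return glt_relative, valid_glt


def percentage_clear(validmask):
    """Share of valid pixels, in percent, of a sensor-geometry mask."""
    return 100.0 * np.count_nonzero(validmask) / validmask.size
```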

EnMAP Reader¶

The EnMAP (Environmental Mapping and Analysis Program) reader processes data from the German hyperspectral satellite mission. It works with Level 1B data, which is radiometrically calibrated but not atmospherically corrected; the reader converts the stored digital numbers to radiance in physical units using the gain and offset from the metadata.

Key features:

Reading L1B hyperspectral radiance data from GeoTIFF format with accompanying XML metadata
Working with separate VNIR (420-1000 nm) and SWIR (900-2450 nm) spectral ranges
Support for 224 spectral channels with 6.5 nm (VNIR) and 10 nm (SWIR) sampling
Integration with Rational Polynomial Coefficients (RPCs) for accurate geometric correction
Conversion from radiance (mW/m²/sr/nm) to top-of-atmosphere reflectance
Access to solar illumination and viewing geometry for radiometric calculations
Support for quality masks

Tutorial example:

Working with EnMAP and CloudSEN12

API Reference¶

Module to read EnMAP (Environmental Mapping and Analysis Program) hyperspectral images.

EnMAP is a German hyperspectral satellite mission operated by DLR (German Aerospace Center), launched in 2022. It provides high-spectral-resolution data in 224 bands from 420 to 2450 nm with a 30 m spatial resolution and a 30 km swath width.

Data Format Overview¶

EnMAP data is distributed as separate GeoTIFF files with an XML metadata file:

EnMAP Product Structure:
┌─────────────────────────────────────────────────────────────────────┐
│  ENMAP01-____L1B-DT0000000000_20220501T101523Z_001_V010110_...     │
│  ├── *-METADATA.XML           ← Main metadata file (input)         │
│  ├── *-SPECTRAL_IMAGE_VNIR.TIF   420-1000 nm, ~88 bands            │
│  ├── *-SPECTRAL_IMAGE_SWIR.TIF   900-2450 nm, ~136 bands           │
│  ├── *-QL_QUALITY_CLOUD.TIF      Cloud mask                        │
│  ├── *-QL_QUALITY_CIRRUS.TIF     Cirrus mask                       │
│  ├── *-QL_QUALITY_SNOW.TIF       Snow mask                         │
│  ├── *-QL_QUALITY_HAZE.TIF       Haze mask                         │
│  └── *-QL_PIXELMASK_*.TIF        Per-sensor pixel masks            │
└─────────────────────────────────────────────────────────────────────┘

Unlike EMIT and PRISMA, EnMAP L1B data is already orthorectified (map-projected) with Rational Polynomial Coefficients (RPCs) stored in the metadata for refined geolocation.

Dual-Sensor Architecture¶

EnMAP uses two pushbroom sensors with overlapping spectral coverage:

VNIR Detector                        SWIR Detector
┌────────────────────┐               ┌────────────────────┐
│ 420 - 1000 nm      │               │ 900 - 2450 nm      │
│ ~88 bands          │               │ ~136 bands         │
│ 6.5 nm sampling    │               │ 10 nm sampling     │
│ Si CCD             │               │ HgCdTe             │
└────────────────────┘               └────────────────────┘
          │                                    │
          └──────────── Overlap ───────────────┘
                     900-1000 nm

The spectral overlap enables cross-calibration between the two detectors.

Radiometric Processing¶

EnMAP L1B data requires conversion from Digital Numbers (DN) to radiance:

L_λ = DN × GAIN + OFFSET   [W/(m²·sr·nm)]

Note: DLR provides gains between 2000-10000 (multiplicative, not divisive)
The reader applies: L = (GAIN × DN + OFFSET) × 1000 to get mW/(m²·sr·nm)
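The DN-to-radiance step is a per-band affine transform broadcast over the image. A minimal NumPy sketch of L = (GAIN × DN + OFFSET) × 1000 follows; the gains, offsets and DNs below are illustrative, not real EnMAP calibration values, no-data handling is not shown, and `dn_to_radiance` is a hypothetical name.

```python
import numpy as np


def dn_to_radiance(dn, gain, offset):
    """Convert a (bands, H, W) DN cube to radiance in mW/(m2 sr nm).

    gain, offset: per-band sequences of length `bands`, giving radiance in
    W/(m2 sr nm); the factor 1000 converts W to mW.
    """
    dn = np.asarray(dn, dtype=np.float64)
    gain = np.asarray(gain, dtype=np.float64)[:, None, None]      # (bands, 1, 1)
    offset = np.asarray(offset, dtype=np.float64)[:, None, None]  # broadcast over H, W
    return (gain * dn + offset) * 1000.0


dn = np.array([[[100]], [[200]]], dtype=np.uint16)  # illustrative DNs, 2 bands
radiance = dn_to_radiance(dn, gain=[0.001, 0.002], offset=[0.0, -0.1])
```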

Rational Polynomial Coefficients (RPCs)¶

EnMAP includes RPCs for precise geolocation refinement:

Pixel (col, row) ──→ RPC Transform ──→ Geographic (lon, lat)

RPCs model:
- Satellite orbit and attitude
- Sensor geometry  
- Terrain elevation effects (when height_off is set appropriately)

The reader can apply RPCs during loading for refined geolocation.
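The RPC rational functions themselves run in the ground-to-image direction: normalized latitude, longitude and height feed ratios of cubic polynomials that give normalized row and column. The pixel-to-geographic mapping drawn above is obtained by inverting this model iteratively (GDAL's RPC transformer does this); the inversion is not shown. The NumPy sketch below evaluates the forward model with the 20-term RPC00B coefficient ordering. The `RPC` dataclass is a hypothetical stand-in whose field names are assumed to mirror `rasterio.rpc.RPC`, so an object such as `enmap.rpcs_vnir` may work in its place.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RPC:
    """Minimal RPC container; field names assumed to mirror rasterio.rpc.RPC."""
    line_off: float
    line_scale: float
    samp_off: float
    samp_scale: float
    lat_off: float
    lat_scale: float
    long_off: float
    long_scale: float
    height_off: float
    height_scale: float
    line_num_coeff: list
    line_den_coeff: list
    samp_num_coeff: list
    samp_den_coeff: list


def _terms(P, L, H):
    """The 20 cubic terms in RPC00B order (P = lat, L = lon, H = height)."""
    return np.stack([
        np.ones_like(P), L, P, H, L * P, L * H, P * H, L**2, P**2, H**2,
        P * L * H, L**3, L * P**2, L * H**2, L**2 * P, P**3, P * H**2,
        L**2 * H, P**2 * H, H**3,
    ])


def ground_to_image(rpc, lon, lat, height=0.0):
    """Forward RPC model: geographic (lon, lat, height) -> image (row, col)."""
    P = (np.asarray(lat, float) - rpc.lat_off) / rpc.lat_scale
    L = (np.asarray(lon, float) - rpc.long_off) / rpc.long_scale
    H = (np.asarray(height, float) - rpc.height_off) / rpc.height_scale
    t = _terms(*np.broadcast_arrays(P, L, H))
    # Ratios of cubic polynomials give normalized row/col in [-1, 1].
    row_n = np.tensordot(rpc.line_num_coeff, t, axes=1) / np.tensordot(rpc.line_den_coeff, t, axes=1)
    col_n = np.tensordot(rpc.samp_num_coeff, t, axes=1) / np.tensordot(rpc.samp_den_coeff, t, axes=1)
    return row_n * rpc.line_scale + rpc.line_off, col_n * rpc.samp_scale + rpc.samp_off
```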

Product Levels¶

L1B: At-sensor radiance, sensor geometry
L2A: Surface reflectance, atmospheric correction applied

This reader is designed for L1B products.

Examples¶

Basic usage::

from georeader.readers.enmap import EnMAP

# Load from metadata XML file
enmap = EnMAP('/path/to/*-METADATA.XML')

# Load specific wavelengths as reflectance
bands = enmap.load_wavelengths([665, 865, 1600], as_reflectance=True)

# Load RGB with RPC-refined geolocation
rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=True)

# Load quality masks
cloud_mask = enmap.load_product('QL_QUALITY_CLOUD')

References¶

DLR EnMAP Mission: https://www.enmap.org/
EnMAP Product Specification: https://www.enmap.org/data_access/
GFZ enpt Package: https://github.com/GFZ/enpt (metadata parsing reference)

`EnMAP` ¶

Reader for EnMAP (Environmental Mapping and Analysis Program) hyperspectral images.

This class provides comprehensive functionality to read and manipulate EnMAP satellite imagery products from DLR. It handles the multi-file product structure (separate VNIR/SWIR GeoTIFFs with XML metadata), supporting operations like:

Loading radiance or reflectance data at specific wavelengths
Automatic handling of VNIR/SWIR sensor selection based on wavelength
Converting DN to radiance using gain/offset from metadata
Converting radiance to reflectance using solar irradiance
Applying Rational Polynomial Coefficients (RPCs) for refined geolocation
Loading quality masks (cloud, cirrus, snow, haze)
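The radiance-to-reflectance conversion listed above is usually the top-of-atmosphere formula ρ = π · L · d² / (E_sun · cos θs). The sketch below implements that standard formula; the solar irradiance spectrum and the Earth-Sun distance the reader actually uses are not described here, so both are passed in as inputs (an assumption), and `radiance_to_toa_reflectance` is a hypothetical name.

```python
import numpy as np


def radiance_to_toa_reflectance(radiance, solar_irradiance, sza_deg, earth_sun_dist_au=1.0):
    """Top-of-atmosphere reflectance: rho = pi * L * d^2 / (E_sun * cos(sza)).

    radiance: (bands, H, W) in mW/(m2 sr nm).
    solar_irradiance: (bands,) band-averaged E_sun in mW/(m2 nm).
    sza_deg: solar zenith angle in degrees (a scalar such as EnMAP.sza).
    earth_sun_dist_au: Earth-Sun distance in astronomical units.
    """
    e_sun = np.asarray(solar_irradiance, dtype=np.float64)[:, None, None]
    cos_sza = np.cos(np.deg2rad(sza_deg))
    return np.pi * np.asarray(radiance) * earth_sun_dist_au**2 / (e_sun * cos_sza)
```

Radiance and irradiance must share the same power and wavelength units (here mW and nm) for ρ to be dimensionless.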

EnMAP Data Model¶

EnMAP L1B products are orthorectified (map-projected) GeoTIFFs with separate files for VNIR and SWIR bands:

File Structure:
┌────────────────────────────────────────────────────┐
│  METADATA.XML  ──→  wavelengths, FWHM, angles,    │
│                      gain/offset, RPCs             │
│                                                    │
│  SPECTRAL_IMAGE_VNIR.TIF  ──→  (88, H, W) bands   │
│  SPECTRAL_IMAGE_SWIR.TIF  ──→  (136, H, W) bands  │
│                                                    │
│  QL_QUALITY_*.TIF  ──→  quality masks              │
└────────────────────────────────────────────────────┘

Radiometric Conversion¶

DN to radiance conversion is automatic::

L_λ = (GAIN × DN + OFFSET) × 1000   [mW/(m²·sr·nm)]

Note: DLR gains are multiplicative (not divisive as in some sensors)

Spectral Configuration¶

EnMAP has two detectors with overlapping coverage:

Wavelength: 420nm ──── 1000nm ──── 2450nm
            ├── VNIR ────┤
                      ├──── SWIR ──────────┤
                      └ overlap┘
                      900-1000nm

VNIR: Silicon CCD, ~88 bands, 6.5nm sampling, SNR >500:1
SWIR: HgCdTe, ~136 bands, 10nm sampling, SNR >150:1
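How the reader chooses between VNIR and SWIR for a requested wavelength is not spelled out here. One plausible rule, sketched below, picks the band centre closest to the requested wavelength across both detectors and breaks ties in the 900-1000 nm overlap in favor of a preferred sensor. This tie-breaking rule and `pick_band` are assumptions, not the reader's documented behavior; `wl_center` follows the shape of the `wl_center` attribute.

```python
import numpy as np


def pick_band(wavelength, wl_center, prefer="vnir"):
    """Return (sensor, band_index) of the band centre closest to `wavelength`.

    wl_center: dict like EnMAP.wl_center, e.g. {'vnir': array, 'swir': array}.
    prefer: sensor checked first, so it wins exact ties (hypothetical rule).
    """
    best = None
    # Preferred sensor first: False sorts before True.
    for sensor in sorted(wl_center, key=lambda s: s != prefer):
        centres = np.asarray(wl_center[sensor], dtype=np.float64)
        idx = int(np.argmin(np.abs(centres - wavelength)))
        dist = abs(centres[idx] - wavelength)
        if best is None or dist < best[2]:
            best = (sensor, idx, dist)
    return best[0], best[1]
```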

Attributes¶

xml_file : str
    Path to the EnMAP XML metadata file.
by_folder : bool
    Whether files are organized by folder structure (alternative naming convention).
swir_file : str
    Path to the SWIR GeoTIFF file (derived from xml_file).
fs : fsspec.AbstractFileSystem
    Filesystem for file access (local or cloud storage).
vnir : RasterioReader
    Reader for VNIR spectral image.
swir : RasterioReader
    Reader for SWIR spectral image.
wl_center : Dict[str, np.ndarray]
    Center wavelengths per sensor: {'vnir': [...], 'swir': [...]}.
wl_fwhm : Dict[str, np.ndarray]
    FWHM per sensor: {'vnir': [...], 'swir': [...]}.
gain_arr : Dict[str, np.ndarray]
    Radiometric gains per sensor for DN→radiance conversion.
offs_arr : Dict[str, np.ndarray]
    Radiometric offsets per sensor for DN→radiance conversion.
vnir_range : Tuple[float, float]
    VNIR wavelength range (min, max) including FWHM margins.
swir_range : Tuple[float, float]
    SWIR wavelength range (min, max) including FWHM margins.
hsf : float
    Mean ground elevation (m) from scene metadata.
sza : float
    Solar zenith angle (degrees).
saa : float
    Solar azimuth angle (degrees).
vza : float
    View zenith angle (across-track off-nadir angle, degrees).
vaa : float
    View azimuth angle (scene azimuth, degrees).
rpcs_vnir : rasterio.rpc.RPC
    Rational Polynomial Coefficients for VNIR refined geolocation.
rpcs_swir : rasterio.rpc.RPC
    Rational Polynomial Coefficients for SWIR refined geolocation.
time_coverage_start : datetime
    UTC datetime of acquisition start.
time_coverage_end : datetime
    UTC datetime of acquisition end.
units : str
    Radiance units: 'mW/m2/sr/nm'.
cache_radiance : bool
    Opt-in flag (default False). When True, the raw SWIR/VNIR spectral cubes read by load_product are cached in memory so repeated calls skip the disk read. Call clear_radiance_cache() to release them.

Properties (from underlying readers)¶

Name	Type	Description
`shape`	`Tuple[int, int, int]`	Full shape (total_bands, height, width).
`transform`	`rasterio.Affine`	Affine geotransform from the SWIR file.
`crs`	`rasterio.crs.CRS`	Coordinate reference system from the SWIR file.
`bounds`	`Tuple[float, float, float, float]`	Geographic bounds (xmin, ymin, xmax, ymax).
`res`	`Tuple[float, float]`	Pixel resolution (x, y).

Examples¶

Basic loading::

>>> from georeader.readers.enmap import EnMAP
>>> 
>>> enmap = EnMAP('/data/ENMAP01-...-METADATA.XML')
>>> print(enmap)  # View metadata summary

Loading specific wavelengths::

>>> # Load NDVI bands as reflectance
>>> bands = enmap.load_wavelengths([665, 865], as_reflectance=True)
>>> print(bands.shape)  # (2, H, W)
>>> 
>>> # Compute NDVI
>>> red, nir = bands.values[0], bands.values[1]
>>> ndvi = (nir - red) / (nir + red + 1e-10)
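The one-liner above relies on a small epsilon and does not exclude pixels set to the GeoTensor's `fill_value_default`, so nodata areas yield meaningless NDVI values. A minimal NumPy variant that masks them explicitly; `masked_ndvi` is a hypothetical helper, and passing `bands.fill_value_default` assumes fill pixels keep that value after reflectance conversion.

```python
import numpy as np


def masked_ndvi(red: np.ndarray, nir: np.ndarray, fill_value: float) -> np.ndarray:
    """NDVI with NaN wherever either band is fill/NaN or red + nir == 0."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = nir + red
    # isfinite(denom) also catches NaN inputs (and a NaN fill value)
    valid = (red != fill_value) & (nir != fill_value) & np.isfinite(denom) & (denom != 0)
    ndvi = np.full(red.shape, np.nan)
    ndvi[valid] = (nir[valid] - red[valid]) / denom[valid]
    return ndvi


red = np.array([[0.1, -1.0, 0.0]])
nir = np.array([[0.5, 0.3, 0.0]])
print(masked_ndvi(red, nir, fill_value=-1.0))  # [[0.6667, nan, nan]]
```

With the reader, the call would look like `masked_ndvi(bands.values[0], bands.values[1], bands.fill_value_default)`.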

Loading RGB with RPC refinement::

>>> # Apply RPCs for better geolocation (recommended)
>>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=True)
>>> 
>>> # Without RPCs, keeping the original map projection (dst_crs=None)
>>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=False, dst_crs=None)

Loading quality masks::

>>> # Load cloud mask
>>> cloud = enmap.load_product('QL_QUALITY_CLOUD')
>>> 
>>> # Available products: 
>>> # 'QL_QUALITY_CLOUD', 'QL_QUALITY_CIRRUS', 'QL_QUALITY_SNOW',
>>> # 'QL_QUALITY_HAZE', 'QL_PIXELMASK_VNIR', 'QL_PIXELMASK_SWIR'

Spatial subsetting with window_focus::

>>> from rasterio.windows import Window
>>> 
>>> # Focus on a specific region
>>> window = Window(col_off=100, row_off=200, width=500, height=500)
>>> enmap_subset = EnMAP('/path/to/METADATA.XML', window_focus=window)
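With `window_focus`, the reader's `transform` and `bounds` come from the windowed SWIR `RasterioReader`, so they describe the window rather than the full scene. The arithmetic behind that is sketched below without rasterio, for a north-up grid. The 30 m UTM-like transform in the usage is illustrative, and both helpers are hypothetical; rasterio itself provides `rasterio.windows.transform` and `rasterio.windows.bounds` for this.

```python
# Geotransform as (a, b, c, d, e, f):
#   x = a*col + b*row + c,   y = d*col + e*row + f
def window_transform(transform, col_off, row_off):
    """Transform of a window: same pixel size, origin moved to the window corner."""
    a, b, c, d, e, f = transform
    return (a, b, a * col_off + b * row_off + c, d, e, d * col_off + e * row_off + f)


def window_bounds(transform, col_off, row_off, width, height):
    """(xmin, ymin, xmax, ymax) of a window, assuming north-up (b == d == 0, e < 0)."""
    a, _, c, _, e, f = window_transform(transform, col_off, row_off)
    return c, f + e * height, c + a * width, f


grid = (30.0, 0.0, 500000.0, 0.0, -30.0, 4000000.0)  # illustrative 30 m grid
print(window_bounds(grid, col_off=100, row_off=200, width=500, height=500))
# (503000.0, 3979000.0, 518000.0, 3994000.0)
```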

Cloud storage access::

>>> import gcsfs
>>> 
>>> fs = gcsfs.GCSFileSystem()
>>> enmap = EnMAP('gs://bucket/ENMAP-METADATA.XML', fs=fs)

References¶

DLR EnMAP Mission: https://www.enmap.org/
GFZ enpt Package: https://github.com/GFZ/enpt (metadata parser reference)
EnMAP Product Specification Document

Source code in georeader/readers/enmap.py

class EnMAP:
    """
    Reader for EnMAP (Environmental Mapping and Analysis Program) hyperspectral images.

    This class provides comprehensive functionality to read and manipulate EnMAP satellite
    imagery products from DLR. It handles the multi-file product structure (separate VNIR/SWIR
    GeoTIFFs with XML metadata), supporting operations like:

    - Loading radiance or reflectance data at specific wavelengths
    - Automatic handling of VNIR/SWIR sensor selection based on wavelength
    - Converting DN to radiance using gain/offset from metadata
    - Converting radiance to reflectance using solar irradiance
    - Applying Rational Polynomial Coefficients (RPCs) for refined geolocation
    - Loading quality masks (cloud, cirrus, snow, haze)

    EnMAP Data Model
    ----------------
    EnMAP L1B products are orthorectified (map-projected) GeoTIFFs with separate files
    for VNIR and SWIR bands:

        File Structure:
        ┌────────────────────────────────────────────────────┐
        │  METADATA.XML  ──→  wavelengths, FWHM, angles,    │
        │                      gain/offset, RPCs             │
        │                                                    │
        │  SPECTRAL_IMAGE_VNIR.TIF  ──→  (88, H, W) bands   │
        │  SPECTRAL_IMAGE_SWIR.TIF  ──→  (136, H, W) bands  │
        │                                                    │
        │  QL_QUALITY_*.TIF  ──→  quality masks              │
        └────────────────────────────────────────────────────┘

    Radiometric Conversion
    ----------------------
    DN to radiance conversion is automatic::

        L_λ = (GAIN × DN + OFFSET) × 1000   [mW/(m²·sr·nm)]

        Note: DLR gains are multiplicative (not divisive as in some sensors)

    Spectral Configuration
    ----------------------
    EnMAP has two detectors with overlapping coverage:

        Wavelength: 420nm ──── 1000nm ──── 2450nm
                    ├── VNIR ────┤
                              ├──── SWIR ──────────┤
                              └ overlap┘
                              900-1000nm

    - VNIR: Silicon CCD, ~88 bands, 6.5nm sampling, SNR >500:1
    - SWIR: HgCdTe, ~136 bands, 10nm sampling, SNR >150:1

    Attributes
    ----------
    xml_file : str
        Path to the EnMAP XML metadata file.
    by_folder : bool
        Whether files are organized by folder structure (alternative naming convention).
    swir_file : str
        Path to the SWIR GeoTIFF file (derived from xml_file).
    fs : fsspec.AbstractFileSystem
        Filesystem for file access (local or cloud storage).
    vnir : RasterioReader
        Reader for VNIR spectral image.
    swir : RasterioReader
        Reader for SWIR spectral image.
    wl_center : Dict[str, np.ndarray]
        Center wavelengths per sensor: {'vnir': [...], 'swir': [...]}.
    wl_fwhm : Dict[str, np.ndarray]
        FWHM per sensor: {'vnir': [...], 'swir': [...]}.
    gain_arr : Dict[str, np.ndarray]
        Radiometric gains per sensor for DN→radiance conversion.
    offs_arr : Dict[str, np.ndarray]
        Radiometric offsets per sensor for DN→radiance conversion.
    vnir_range : Tuple[float, float]
        VNIR wavelength range (min, max) including FWHM margins.
    swir_range : Tuple[float, float]
        SWIR wavelength range (min, max) including FWHM margins.
    hsf : float
        Mean ground elevation (m) from scene metadata.
    sza : float
        Solar zenith angle (degrees).
    saa : float
        Solar azimuth angle (degrees).
    vza : float
        View zenith angle (across-track off-nadir angle, degrees).
    vaa : float
        View azimuth angle (scene azimuth, degrees).
    rpcs_vnir : rasterio.rpc.RPC
        Rational Polynomial Coefficients for VNIR refined geolocation.
    rpcs_swir : rasterio.rpc.RPC
        Rational Polynomial Coefficients for SWIR refined geolocation.
    time_coverage_start : datetime
        UTC datetime of acquisition start.
    time_coverage_end : datetime
        UTC datetime of acquisition end.
    units : str
        Radiance units: 'mW/m2/sr/nm'.
    cache_radiance : bool
        Opt-in flag (default False). When True, the raw SWIR/VNIR spectral cubes
        read by ``load_product`` are cached in memory so repeated calls skip the
        disk read. Call ``clear_radiance_cache()`` to release them.

    Properties (from underlying readers)
    ------------------------------------
    shape : Tuple[int, int, int]
        Full shape (total_bands, height, width).
    transform : rasterio.Affine
        Affine geotransform from SWIR file.
    crs : rasterio.crs.CRS
        Coordinate reference system from SWIR file.
    bounds : Tuple[float, float, float, float]
        Geographic bounds (xmin, ymin, xmax, ymax).
    res : Tuple[float, float]
        Pixel resolution (x, y).

    Examples
    --------
    Basic loading::

        >>> from georeader.readers.enmap import EnMAP
        >>> 
        >>> enmap = EnMAP('/data/ENMAP01-...-METADATA.XML')
        >>> print(enmap)  # View metadata summary

    Loading specific wavelengths::

        >>> # Load NDVI bands as reflectance
        >>> bands = enmap.load_wavelengths([665, 865], as_reflectance=True)
        >>> print(bands.shape)  # (2, H, W)
        >>> 
        >>> # Compute NDVI
        >>> red, nir = bands.values[0], bands.values[1]
        >>> ndvi = (nir - red) / (nir + red + 1e-10)

    Loading RGB with RPC refinement::

        >>> # Apply RPCs for better geolocation (recommended)
        >>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=True)
        >>> 
        >>> # Without RPCs, keeping the original map projection (dst_crs=None)
        >>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=False, dst_crs=None)

    Loading quality masks::

        >>> # Load cloud mask
        >>> cloud = enmap.load_product('QL_QUALITY_CLOUD')
        >>> 
        >>> # Available products: 
        >>> # 'QL_QUALITY_CLOUD', 'QL_QUALITY_CIRRUS', 'QL_QUALITY_SNOW',
        >>> # 'QL_QUALITY_HAZE', 'QL_PIXELMASK_VNIR', 'QL_PIXELMASK_SWIR'

    Spatial subsetting with window_focus::

        >>> from rasterio.windows import Window
        >>> 
        >>> # Focus on a specific region
        >>> window = Window(col_off=100, row_off=200, width=500, height=500)
        >>> enmap_subset = EnMAP('/path/to/METADATA.XML', window_focus=window)

    Cloud storage access::

        >>> import gcsfs
        >>> 
        >>> fs = gcsfs.GCSFileSystem()
        >>> enmap = EnMAP('gs://bucket/ENMAP-METADATA.XML', fs=fs)

    See Also
    --------
    georeader.readers.emit.EMITImage : EMIT hyperspectral reader
    georeader.readers.prisma.PRISMA : PRISMA hyperspectral reader
    georeader.rasterio_reader.RasterioReader : Base reader for GeoTIFF
    georeader.read.read_rpcs : Apply RPC transformations

    References
    ----------
    - DLR EnMAP Mission: https://www.enmap.org/
    - GFZ enpt Package: https://github.com/GFZ/enpt (metadata parser reference)
    - EnMAP Product Specification Document
    """

    def __init__(
        self,
        xml_file: str,
        by_folder: bool = False,
        window_focus: Optional[Window] = None,
        fs: Optional[fsspec.AbstractFileSystem] = None,
        cache_radiance: bool = False,
    ) -> None:
        self.xml_file = xml_file
        self.by_folder = by_folder
        if not self.xml_file.endswith(".xml") and not self.xml_file.endswith(".XML"):
            raise ValueError(
                f"Invalid XML file path {self.xml_file}: must be an XML file"
            )

        if self.by_folder:
            assert (
                PRODUCT_FOLDERS["METADATA"] in self.xml_file
            ), f"Invalid XML file path {self.xml_file}: must contain {PRODUCT_FOLDERS['METADATA']} when by_folder=True"
            self.swir_file = (
                self.xml_file.replace(
                    PRODUCT_FOLDERS["METADATA"], PRODUCT_FOLDERS["SPECTRAL_IMAGE_SWIR"]
                )
                .replace(".XML", ".TIF")
                .replace(".xml", ".tif")
            )
        else:
            assert (
                "METADATA" in self.xml_file
            ), f"Invalid XML file path {self.xml_file}: must contain METADATA when by_folder=False"
            self.swir_file = (
                self.xml_file.replace("METADATA", "SPECTRAL_IMAGE_SWIR")
                .replace(".XML", ".TIF")
                .replace(".xml", ".tif")
            )

        if not self.swir_file.endswith(".tif") and not self.swir_file.endswith(".TIF"):
            raise ValueError(
                f"Invalid SWIR file path {self.swir_file} must be a TIF file"
            )

        if self.xml_file.startswith("gs://") or self.xml_file.startswith("az://"):
            assert fs is not None, "Filesystem must be provided if using cloud storage"
            self.fs = fs
            assert fs.exists(self.xml_file), f"File {self.xml_file} does not exist"
            assert fs.exists(self.swir_file), f"File {self.swir_file} does not exist"
        else:
            self.fs = fs or fsspec.filesystem("file")
            assert os.path.exists(self.xml_file), f"File {self.xml_file} does not exist"
            assert os.path.exists(
                self.swir_file
            ), f"File {self.swir_file} does not exist"

        self.swir = RasterioReader(self.swir_file, window_focus=window_focus)

        if self.by_folder:
            self.vnir = RasterioReader(
                self.swir_file.replace(
                    PRODUCT_FOLDERS["SPECTRAL_IMAGE_SWIR"],
                    PRODUCT_FOLDERS["SPECTRAL_IMAGE_VNIR"],
                ),
                window_focus=window_focus,
            )
        else:
            self.vnir = RasterioReader(
                self.swir_file.replace("SPECTRAL_IMAGE_SWIR", "SPECTRAL_IMAGE_VNIR"),
                window_focus=window_focus,
            )

        with self.fs.open(self.xml_file) as fh:
            (
                self.wl_center,
                self.wl_fwhm,
                self.hsf,
                self.sza,
                self.saa,
                self.vaa,
                self.vza,
                self.gain_arr,
                self.offs_arr,
                startTime,
                endTime,
                self.rpcs_vnir,
                self.rpcs_swir,
            ) = read_xml(fh)

        self.swir_range = (
            self.wl_center["swir"][0] - self.wl_fwhm["swir"][0],
            self.wl_center["swir"][-1] + self.wl_fwhm["swir"][-1],
        )
        self.vnir_range = (
            self.wl_center["vnir"][0] - self.wl_fwhm["vnir"][0],
            self.wl_center["vnir"][-1] + self.wl_fwhm["vnir"][-1],
        )

        self.units = "mW/m2/sr/nm"  # == W/m^2/SR/um
        self.time_coverage_start = startTime
        self.time_coverage_end = endTime
        self._observation_date_correction_factor: Optional[float] = None

        # Opt-in cache for the raw (DN) spectral cubes, keyed by sensor
        # ("swir" / "vnir"). Default off. See load_product / clear_radiance_cache.
        self.cache_radiance: bool = cache_radiance
        self._cache: Dict[str, GeoTensor] = {}

    @property
    def observation_date_correction_factor(self) -> float:
        if self._observation_date_correction_factor is None:
            self._observation_date_correction_factor = (
                reflectance.observation_date_correction_factor(
                    date_of_acquisition=self.time_coverage_start,
                    center_coords=self.footprint("EPSG:4326").centroid.coords[0],
                )
            )
        return self._observation_date_correction_factor

    @property
    def window_focus(self) -> Optional[Window]:
        return self.swir.window_focus

    @property
    def shape(self) -> tuple:
        return (
            len(self.wl_center["vnir"]) + len(self.wl_center["swir"]),
        ) + self.swir.shape[-2:]

    @property
    def transform(self):
        return self.swir.transform

    @property
    def crs(self):
        return self.swir.crs

    @property
    def res(self):
        return self.swir.res

    @property
    def width(self):
        return self.window_focus.width

    @property
    def height(self):
        return self.window_focus.height

    @property
    def bounds(self):
        return self.swir.bounds

    @property
    def fill_value_default(self):
        return self.swir.fill_value_default

    def footprint(self, crs: Optional[Any] = None) -> Any:
        return self.swir.footprint(crs=crs)

    def _load_spectral_image(self, name_coef: str) -> GeoTensor:
        """Load the raw (DN) spectral cube for ``"swir"`` or ``"vnir"``.

        Reuses the already-open ``self.swir`` / ``self.vnir`` readers (which point
        at the same files ``load_product`` would otherwise re-open). When
        ``cache_radiance`` is enabled the ``RasterioReader.load()`` result is
        cached so repeated ``load_product`` calls skip the disk read +
        decompression. The cached cube is the pre-conversion DN data; it is never
        returned to callers directly (``load_product`` always wraps it in a fresh
        converted GeoTensor), so caller-side mutations cannot corrupt the cache.

        Args:
            name_coef (str): ``"swir"`` or ``"vnir"``.

        Returns:
            GeoTensor: raw DN spectral cube.
        """
        if self.cache_radiance and name_coef in self._cache:
            return self._cache[name_coef]

        reader = self.swir if name_coef == "swir" else self.vnir
        raster = reader.load()

        if self.cache_radiance:
            self._cache[name_coef] = raster

        return raster

    def clear_radiance_cache(self) -> None:
        """Drop any cached raw spectral cubes.

        After this call the next ``load_product("SPECTRAL_IMAGE_SWIR"/"VNIR")``
        re-reads from disk. Intended to be called once all per-scene products are
        computed, to release the (hundreds of MB) spectral cube before the next
        scene is processed.
        """
        self._cache.clear()

    def load_product(self, product_name: str) -> GeoTensor:
        if product_name not in PRODUCT_FOLDERS:
            raise ValueError(f"Invalid product name: {product_name}")

        # Convert to radiance if SPECTRAL_IMAGE_SWIR or SPECTRAL_IMAGE_VNIR.
        # Spectral cubes reuse (and optionally cache) the persistent self.swir /
        # self.vnir readers; other products (quality masks) are read fresh.
        if product_name == "SPECTRAL_IMAGE_SWIR":
            name_coef = "swir"
            raster_product = self._load_spectral_image(name_coef)
        elif product_name == "SPECTRAL_IMAGE_VNIR":
            name_coef = "vnir"
            raster_product = self._load_spectral_image(name_coef)
        else:
            name_coef = None
            if self.by_folder:
                folder = PRODUCT_FOLDERS[product_name]
                product_path = self.swir_file.replace(
                    PRODUCT_FOLDERS["SPECTRAL_IMAGE_SWIR"], folder
                )
            else:
                product_path = self.swir_file.replace(
                    "SPECTRAL_IMAGE_SWIR", product_name
                )
            raster_product = RasterioReader(
                product_path, window_focus=self.window_focus
            ).load()

        # https://github.com/GFZ/enpt/blob/main/enpt/model/images/images_sensorgeo.py#L327
        # Lλ = QCAL * GAIN + OFFSET
        # NOTE: - Gains are applied multiplicatively below, matching the class docstring.
        #         (Gains in the 2000-10000 range, as some DLR products provide, would
        #         need to be divided by instead.)
        #       - DLR gains / offsets are provided in W/m2/sr/nm, so we multiply by
        #         SC_COEFF (1000) to get mW/m2/sr/nm as needed later.
        if name_coef is not None:
            gain = self.gain_arr[name_coef]
            offset = self.offs_arr[name_coef]
            invalids = raster_product.values == raster_product.fill_value_default
            # Arithmetic on a GeoTensor returns a new (float) GeoTensor preserving
            # georeferencing; we cannot mutate .values in place here because the
            # dtype changes from integer QCAL to float radiance.
            raster_product = (
                gain[:, np.newaxis, np.newaxis] * raster_product
                + offset[:, np.newaxis, np.newaxis]
            ) * SC_COEFF
            raster_product.values[invalids] = self.fill_value_default

        return raster_product

    def load_wavelengths(
        self,
        wavelengths: Union[float, List[float], NDArray],
        as_reflectance: bool = True,
    ) -> Union[GeoTensor, NDArray]:
        """
        Load radiance or TOA reflectance for the bands closest to the given wavelengths.

        For each requested wavelength, the SWIR detector is used if the wavelength falls
        within ``swir_range``; otherwise the nearest VNIR band is used.

        Args:
            wavelengths (Union[float, List[float], NDArray]): Wavelength(s) to load, in nm.
            as_reflectance (bool, optional): Return TOA reflectance rather than radiance.
                Defaults to True. If False, values have units of W/m^2/SR/um == mW/m2/sr/nm (`self.units`).

        Returns:
            Union[GeoTensor, NDArray]: GeoTensor of shape (len(wavelengths), H, W) with
            reflectance or radiance values.

        Raises:
            ValueError: If any wavelength is outside the sensor's range.
        """
        if isinstance(wavelengths, Number):
            wavelengths = np.array([wavelengths])
        else:
            wavelengths = np.array(wavelengths)

        # Check all wavelengths are within the range of the sensor
        if any(
            [
                wvl < self.vnir_range[0] or wvl > self.swir_range[1]
                for wvl in wavelengths
            ]
        ):
            raise ValueError(
                f"Invalid wavelength range, must be between {self.vnir_range[0]} and {self.swir_range[1]}"
            )

        wavelengths_loaded = []
        fwhm = []
        ltoa_img = []
        for b in range(len(wavelengths)):
            if (
                wavelengths[b] >= self.swir_range[0]
                and wavelengths[b] <= self.swir_range[1]
            ):
                index_band = np.argmin(np.abs(wavelengths[b] - self.wl_center["swir"]))
                fwhm.append(self.wl_fwhm["swir"][index_band])
                wavelengths_loaded.append(self.wl_center["swir"][index_band])
                rst = self.swir.isel({"band": [index_band]}).load().squeeze()
                invalids = (rst.values == rst.fill_value_default) | np.isnan(rst.values)

                # Convert to radiance
                gain = self.gain_arr["swir"][index_band]
                offset = self.offs_arr["swir"][index_band]
                img = (gain * rst.values + offset) * SC_COEFF
                img[invalids] = self.fill_value_default
            else:
                index_band = np.argmin(np.abs(wavelengths[b] - self.wl_center["vnir"]))
                fwhm.append(self.wl_fwhm["vnir"][index_band])
                wavelengths_loaded.append(self.wl_center["vnir"][index_band])
                rst = self.vnir.isel({"band": [index_band]}).load().squeeze()
                invalids = (rst.values == rst.fill_value_default) | np.isnan(rst.values)

                # Convert to radiance
                gain = self.gain_arr["vnir"][index_band]
                offset = self.offs_arr["vnir"][index_band]
                img = (gain * rst.values + offset) * SC_COEFF
                img[invalids] = self.fill_value_default

            ltoa_img.append(img)

        ltoa_img = GeoTensor(
            np.stack(ltoa_img, axis=0),
            transform=self.transform,
            crs=self.crs,
            fill_value_default=self.fill_value_default,
        )

        if as_reflectance:
            thuillier = reflectance.load_thuillier_irradiance()
            response = reflectance.srf(
                wavelengths_loaded, fwhm, thuillier["Nanometer"].values
            )

            solar_irradiance_norm = thuillier["Radiance(mW/m2/nm)"].values.dot(
                response
            )  # mW/m²/nm
            solar_irradiance_norm /= 1_000  # W/m²/nm

            # The radiance units are passed via `units`; ltoa_img is not
            # rescaled manually here.
            ltoa_img = reflectance.radiance_to_reflectance(
                ltoa_img,
                solar_irradiance_norm,
                units=self.units,
                observation_date_corr_factor=self.observation_date_correction_factor,
            )

        return ltoa_img

    def load_rgb(
        self,
        as_reflectance: bool = True,
        apply_rpcs: bool = True,
        dst_crs: str = "EPSG:4326",
        resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
    ) -> GeoTensor:
        """
        Load an RGB image from the VNIR bands. Radiance is converted to TOA reflectance if
        as_reflectance is True; otherwise radiance values are returned in
        W/m^2/SR/um == mW/m2/sr/nm (`self.units`).

        Args:
            as_reflectance (bool, optional): Convert radiance to TOA reflectance. Defaults to True.
            apply_rpcs (bool, optional): Apply RPCs to the image. Defaults to True.
            dst_crs (str, optional): Destination CRS. Defaults to "EPSG:4326".
                If apply_rpcs is False and dst_crs is None, the image is returned in its
                original map projection.
            resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
                Resolution of the destination CRS. Defaults to None.

        Returns:
            GeoTensor: The RGB image.
        """
        rgb = self.load_wavelengths(WAVELENGTHS_RGB, as_reflectance=as_reflectance)
        if apply_rpcs:
            return read.read_rpcs(
                rgb.values,
                rpcs=self.rpcs_vnir,
                dst_crs=dst_crs,
                resolution_dst_crs=resolution_dst_crs,
                fill_value_default=rgb.fill_value_default,
            )
        elif dst_crs is not None:
            return read.read_to_crs(
                rgb, resolution_dst_crs=resolution_dst_crs, dst_crs=dst_crs
            )

        return rgb

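    def load(self) -> GeoTensor:
        """Load the full SWIR spectral cube as radiance (`self.units`).

        The VNIR cube is not included; use ``load_product("SPECTRAL_IMAGE_VNIR")``
        to load it separately.
        """
        swir = self.load_product("SPECTRAL_IMAGE_SWIR")
        return swir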

    def __repr__(self) -> str:
        return f"""
        File: {self.xml_file}
        Bounds: {self.bounds}
        Time: {self.time_coverage_start}
        Spatial shape (height, width): {self.height, self.width}
        VNIR Range: {self.vnir_range} nbands: {len(self.wl_center['vnir'])} 
        SWIR Range: {self.swir_range} nbands: {len(self.wl_center['swir'])}
        """

`clear_radiance_cache()` ¶

Drop any cached raw spectral cubes.

After this call the next load_product("SPECTRAL_IMAGE_SWIR"/"VNIR") re-reads from disk. Intended to be called once all per-scene products are computed, to release the (hundreds of MB) spectral cube before the next scene is processed.
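The `cache_radiance` / `clear_radiance_cache()` pair follows a common opt-in memoization pattern: the raw DN cube is cached per sensor, callers only ever receive a freshly converted array, and the cache is dropped between scenes. A self-contained sketch of that pattern; `RawCubeCache`, `fake_loader`, and the scalar `gain` are hypothetical stand-ins for the reader's internals, not georeader API.

```python
from typing import Callable, Dict, List

import numpy as np


class RawCubeCache:
    """Hypothetical opt-in cache of raw (DN) cubes keyed by sensor name."""

    def __init__(self, loader: Callable[[str], np.ndarray], enabled: bool = False) -> None:
        self.loader = loader
        self.enabled = enabled
        self._cache: Dict[str, np.ndarray] = {}

    def get(self, sensor: str) -> np.ndarray:
        # Serve from memory only when caching is switched on
        if self.enabled and sensor in self._cache:
            return self._cache[sensor]
        cube = self.loader(sensor)
        if self.enabled:
            self._cache[sensor] = cube
        return cube

    def to_radiance(self, sensor: str, gain: float) -> np.ndarray:
        # Always builds a new float array, so callers never hold the cached DN cube
        return self.get(sensor) * gain

    def clear(self) -> None:
        self._cache.clear()


disk_reads: List[str] = []


def fake_loader(sensor: str) -> np.ndarray:
    disk_reads.append(sensor)  # stands in for the GeoTIFF read + decompression
    return np.ones((2, 3, 3), dtype=np.uint16)


cache = RawCubeCache(fake_loader, enabled=True)
first = cache.to_radiance("swir", gain=0.5)
second = cache.to_radiance("swir", gain=0.5)  # served from memory
cache.clear()                                  # e.g. after finishing a scene
third = cache.to_radiance("swir", gain=0.5)   # read again
print(disk_reads)  # ['swir', 'swir']
```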

`load_rgb(as_reflectance=True, apply_rpcs=True, dst_crs='EPSG:4326', resolution_dst_crs=None)` ¶

Load an RGB image from the VNIR bands. Radiance is converted to TOA reflectance if as_reflectance is True; otherwise radiance values are returned in W/m^2/SR/um == mW/m2/sr/nm (self.units).

Parameters:

Name	Type	Description	Default
`as_reflectance`	`bool`	Convert radiance to TOA reflectance. Defaults to True.	`True`
`apply_rpcs`	`bool`	Apply RPCs to the image. Defaults to True.	`True`
`dst_crs`	`str`	Destination CRS. Defaults to "EPSG:4326". If apply_rpcs is False and dst_crs is None, the image is returned in its original map projection.	`'EPSG:4326'`
`resolution_dst_crs`	`Optional[Union[float, Tuple[float, float]]]`	Resolution of the destination CRS. Defaults to None.	`None`

Returns: GeoTensor: The RGB image.

`load_wavelengths(wavelengths, as_reflectance=True)` ¶

Load radiance or TOA reflectance for the bands closest to the given wavelengths.

Parameters:

Name	Type	Description	Default
`wavelengths`	`Union[float, List[float], NDArray]`	Wavelength(s) to load, in nm.	required
`as_reflectance`	`bool`	Return TOA reflectance rather than radiance. Defaults to True. If False, values have units of W/m^2/SR/um == mW/m2/sr/nm (`self.units`).	`True`

Returns:

Type	Description
`Union[GeoTensor, NDArray]`	GeoTensor of shape (len(wavelengths), H, W) with reflectance or radiance values.

Raises:

Type	Description
`ValueError`	If any wavelength is outside the sensor's range.
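With `as_reflectance=True`, `load_wavelengths` integrates the Thuillier solar spectrum over each selected band's spectral response (built from the band center and FWHM via `reflectance.srf`) and passes the result to `reflectance.radiance_to_reflectance` together with an observation-date correction factor. The sketch below shows the textbook form of that computation, ρ = π L d² / (E_sun cos θ_s). It assumes a Gaussian SRF, a flat synthetic solar spectrum in place of Thuillier, and explicit Earth-Sun distance and solar-zenith inputs; georeader's own unit handling and correction factor may differ in detail.

```python
import numpy as np


def gaussian_srf(centers, fwhms, grid):
    """Gaussian spectral response per band, each column normalized to sum to 1.

    Returns an array of shape (len(grid), n_bands).
    """
    centers = np.asarray(centers, dtype=float)
    sigma = np.asarray(fwhms, dtype=float) / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    srf = np.exp(-0.5 * ((grid[:, None] - centers[None, :]) / sigma[None, :]) ** 2)
    return srf / srf.sum(axis=0, keepdims=True)


def toa_reflectance(radiance, band_irradiance, sza_deg, earth_sun_dist_au=1.0):
    """rho = pi * L * d^2 / (E_sun * cos(sza)); L and E_sun in matching units."""
    factor = np.pi * earth_sun_dist_au ** 2 / np.cos(np.deg2rad(sza_deg))
    return radiance * factor / band_irradiance[:, None, None]


grid = np.arange(400.0, 1001.0, 1.0)               # nm
srf = gaussian_srf([665.0, 865.0], [8.0, 8.0], grid)
solar = np.full(grid.shape, 1500.0)                 # synthetic flat spectrum, mW/m2/nm
band_irradiance = solar @ srf                       # (2,) band-averaged, mW/m2/nm
radiance = np.full((2, 4, 4), 1500.0 / np.pi)       # mW/m2/sr/nm
rho = toa_reflectance(radiance, band_irradiance, sza_deg=0.0)
print(rho[:, 0, 0])  # ~[1. 1.]
```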

Carbon Mapper Reader¶

The Carbon Mapper reader provides typed access to the Carbon Mapper STAC catalogue and plume API — atmospheric methane / carbon-dioxide retrievals from the Tanager-1, EMIT, AVIRIS, and GAO instruments. Carbon Mapper publishes:

L2B scenes (per-pixel CH4 column-matched-filter, RGB, uncertainty, artifact-mask) addressed by scene_id in the l2b-ch4-mfa-v3a STAC collection.
L3A per-plume rasters (alpha-banded delineated plume mask) addressed by plume_id in the l3a collection.
Source records — DBSCAN clusters of plumes detected at the same physical site, addressed by deterministic source_name.
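Source records group plumes by DBSCAN clustering of detections at the same physical site. A minimal pure-Python DBSCAN over plume coordinates illustrates the idea. The haversine metric, the 200 m radius and `min_samples=2` in the usage are illustrative assumptions, not Carbon Mapper's published parameters, and the deterministic `source_name` scheme is not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def haversine_m(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in metres between (lat, lon) points in degrees."""
    r = 6_371_000.0
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def dbscan(points: Sequence[Tuple[float, float]], eps_m: float, min_samples: int) -> List[int]:
    """Plain DBSCAN; returns a cluster label per point (-1 = noise)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    # Neighbourhoods include the point itself (distance 0)
    neighbours = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
    return [int(label) for label in labels]


plumes = [(31.0, -103.0), (31.0005, -103.0), (31.0, -103.0005),
          (32.0, -104.0), (32.0003, -104.0), (35.0, -100.0)]
print(dbscan(plumes, eps_m=200.0, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```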

Key features:

Token-aware HTTP client (obtain_token, refresh_token, download_asset) with file-based persistence (CarbonMapperConfig).
Typed query layer (CMTileItem, CMRawPlume, CMSource, exception hierarchy) — never returns raw dicts.
Lazy raster wrappers (CMImageRaster, CMPlumeRaster) backed by RasterioReader. CMPlumeRaster.polygon() extracts the authoritative plume polygon from the L3A plume_tif band-4 alpha mask — the upstream source of truth for plume geometry.
Cross-resolution helpers: get_tile_for_plume, get_source_for_plume, list_tiles_for_source, list_plumes_for_tile.
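CMPlumeRaster.polygon() extracts a polygon from the band-4 alpha mask of the L3A plume_tif; its implementation is not shown here. As a much simpler stand-in (a bounding box, not a polygon), the NumPy sketch below finds the extent of non-transparent pixels in an alpha band and maps it to map coordinates through a north-up transform. The function name and the (x0, y0, dx, dy) parameterisation are assumptions for illustration. In a (4, rows, cols) RGBA array, band 4 is index 3.

```python
import numpy as np


def alpha_mask_bounds(alpha, x0, y0, dx, dy):
    """Return (W, S, E, N) of pixels with alpha > 0, or None if none are set.

    x0, y0: map coordinates of the top-left corner of pixel (0, 0).
    dx, dy: pixel size; dy is negative for north-up rasters.
    """
    rows, cols = np.nonzero(np.asarray(alpha) > 0)
    if rows.size == 0:
        return None
    # Outer pixel edges: use max + 1 so the last pixel is fully included.
    xs = (x0 + cols.min() * dx, x0 + (cols.max() + 1) * dx)
    ys = (y0 + rows.min() * dy, y0 + (rows.max() + 1) * dy)
    return (float(min(xs)), float(min(ys)), float(max(xs)), float(max(ys)))


# A 4x4 raster with a 2x2 plume, 1-unit pixels, top-left corner at (0, 4).
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[3, 1:3, 1:3] = 255  # band 4 (index 3) is the alpha channel
print(alpha_mask_bounds(rgba[3], 0.0, 4.0, 1.0, -1.0))  # (1.0, 1.0, 3.0, 3.0)
```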

Optional install: the reader is gated behind the [carbonmapper] extra to keep the base install minimal:

pip install 'georeader-spaceml[carbonmapper]'

This pulls in pydantic (for CMRawPlume) and requests (for the API client). The Azure SDK is intentionally not included; downstream consumers can layer Key Vault-backed token loading on top of CarbonMapperConfig.
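Because pydantic and requests only arrive with the [carbonmapper] extra, a consumer may want to check for them before importing the reader. A minimal standard-library sketch follows; the helper name is hypothetical, and the package may already raise its own error when the extra is missing.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["pydantic", "requests"])
if missing:
    print("Install the extra: pip install 'georeader-spaceml[carbonmapper]'"
          f" (missing: {', '.join(missing)})")
```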

API Reference¶

High-level typed queries over the Carbon Mapper REST + STAC APIs.

This module is the typed, cross-resolution layer that sits between the raw HTTP wrappers in georeader.readers.carbonmapper.download and its consumers (the Phase 2 DailyMonitoringCM ETL, analyst notebooks, future Partner-feed backfills).

Why this exists¶

The download module exposes about 16 low-level endpoint wrappers that return raw JSON or pandas DataFrames. Without this layer, every consumer has to:

Pick the right endpoint (/catalog/plume-csv vs /catalog/plumes/annotated vs STAC search — all three have different schemas).
Parse the response into something usable.
Stitch resources together by hand: plume → scene_id via rsplit("-", 1)[0], scene_id → STAC item, plume → source via /catalog/source/plume/name/{plume_id}.
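The two string conventions involved in this stitching can be captured in two tiny helpers. This is a hedged sketch: the module's private helpers (_scene_id_from_plume, _strip_query_suffix) are not shown in this section, so the names and exact behaviour below are assumptions based on the documented rsplit("-", 1)[0] rule and the ?plume_gas=... suffix documented for get_source.

```python
def scene_id_from_plume(plume_id: str) -> str:
    """Drop the trailing '-{part}' suffix of a colloquial plume id."""
    return plume_id.rsplit("-", 1)[0]


def strip_query_suffix(source_name: str) -> str:
    """Drop an appended query string such as '?plume_gas=CH4'."""
    return source_name.split("?", 1)[0]


print(scene_id_from_plume("tan20251212t185057c20s4001-E"))
# tan20251212t185057c20s4001
print(strip_query_suffix("CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4"))
# CH4_1B2_100m_-104.17525_32.49125
```

The scene rule only makes sense for colloquial ids; a UUID-form plume id also contains hyphens, and rsplit would return a meaningless prefix.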

This module lifts those patterns into:

One function per logical question (not per HTTP endpoint).
Typed return values (CMRawPlume, CMTileItem, CMSource), never raw dicts.
A single home for the bbox-encoding (data_model §2.1) and source_name query-suffix (data_model §2.2) quirks.
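The bbox quirk is that the STAC search endpoint and the REST catalogue expect different query encodings: a single comma-joined value for STAC, repeated keys for REST. A standard-library sketch of both follows. It assumes the query key is literally bbox in both cases; the real key names used by download._rest_bbox_params are not shown in this section.

```python
from typing import Tuple
from urllib.parse import urlencode

BBox = Tuple[float, float, float, float]  # (W, S, E, N), WGS-84


def stac_bbox_query(bbox: BBox) -> str:
    # STAC API style: one parameter holding comma-separated values.
    return urlencode({"bbox": ",".join(str(v) for v in bbox)})


def rest_bbox_query(bbox: BBox) -> str:
    # Repeated-key style: bbox=W&bbox=S&bbox=E&bbox=N.
    return urlencode({"bbox": list(bbox)}, doseq=True)


permian = (-104.5, 31.0, -101.5, 33.5)
print(stac_bbox_query(permian))  # bbox=-104.5%2C31.0%2C-101.5%2C33.5
print(rest_bbox_query(permian))  # bbox=-104.5&bbox=31.0&bbox=-101.5&bbox=33.5
```

urlencode percent-encodes the commas; servers decode them back, so the STAC value still arrives as "-104.5,31.0,-101.5,33.5".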

Failure modes¶

The exception hierarchy is part of the contract:

CMPlumeNotFound: get_plume 404.
CMSourceNotFound: get_source 404.
CMSceneNotPublished: get_tile / get_tile_for_plume 404 (CM publishes L2B selectively; data_model §5.2). The cross-resolution helper get_tile_for_plume catches this and returns None; the single-resource get_tile raises it so callers can choose to defer.
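The asymmetry above (raise in the single-resource call, catch in the cross-resolution helper) lets callers sort a batch of scene ids into ready and deferred buckets. The sketch below is self-contained, with stand-in exception classes and a fake fetcher; in real code the exceptions would be imported from georeader.readers.carbonmapper.api_queries and get_tile would replace the hypothetical fetch_tile argument.

```python
from typing import Callable, Iterable


class CMAPIError(Exception):
    """Stand-in for the real base class."""


class CMSceneNotPublished(CMAPIError):
    """Stand-in mirroring the documented constructor."""

    def __init__(self, scene_id: str):
        super().__init__(f"L2B scene not published: {scene_id}")
        self.scene_id = scene_id


def partition_scenes(scene_ids: Iterable[str], fetch_tile: Callable[[str], object]):
    """Split scene ids into (ready, deferred) using the documented contract."""
    ready, deferred = {}, []
    for scene_id in scene_ids:
        try:
            ready[scene_id] = fetch_tile(scene_id)
        except CMSceneNotPublished as exc:
            deferred.append(exc.scene_id)  # retry on a later run
        # Any other CMAPIError, or a non-404 HTTP error, propagates.
    return ready, deferred


def fake_fetch(scene_id: str) -> dict:
    if scene_id.endswith("404"):
        raise CMSceneNotPublished(scene_id)
    return {"scene_id": scene_id}


ready, deferred = partition_scenes(["a", "b404", "c"], fake_fetch)
print(sorted(ready), deferred)  # ['a', 'c'] ['b404']
```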

Examples¶

"What does CM know about this plume?":

>>> from georeader.readers.carbonmapper.api_queries import get_plume_context
>>> plume, tile, source = get_plume_context(token, "tan20251212t185057c20s4001-E")
>>> plume.plume_id
'tan20251212t185057c20s4001-E'
>>> tile.scene_id if tile else None
'tan20251212t185057c20s4001'
>>> source.sector if source else None  # may be None if unattributed
'1B2'

"All tiles ever observing this chronic emitter":

>>> from georeader.readers.carbonmapper.api_queries import list_tiles_for_source
>>> tiles = list_tiles_for_source(token, "CH4_1B2_100m_-104.17525_32.49125")
>>> {t.platform for t in tiles}

`CMTileItem` `dataclass` ¶

Lightweight Carbon Mapper L2B STAC item — API-only, no DB binding.

The DB-bound counterpart is CarbonMapperTile (Phase 1). The promotion direction (API → DB) lives on the DB side via CarbonMapperTile.from_cm_tile_item(item, cm_provider=...); this keeps api_queries free of any database imports.

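Frozen so instances are immutable. The generated __hash__ covers every field, and asset_urls, properties, and raw hold plain dicts (from_stac_item builds them as dicts), so hash(tile) raises TypeError; deduplicate cross-resolution results on scene_id rather than on the item itself.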

Attributes¶

scene_id: STAC item id, equivalent to plume_id.rsplit("-", 1)[0] for plumes that came from this scene.
collection: STAC collection id, e.g. "l2b-ch4-mfa-v3a".
datetime: UTC-aware acquisition time parsed from properties["datetime"].
platform: properties["platform"], e.g. "Tanager1", "EMIT".
bbox: (W, S, E, N) in WGS-84 decimal degrees.
geometry: Shapely geometry (typically a Polygon) of the scene footprint.
asset_urls: Mapping of asset name → href URL, e.g. {"cmf": "https://.../cmf.tif", "rgb": ...}. The L2B CH4 collection consistently exposes cmf, rgb, uncertainty, and artifact-mask.
properties: Full properties mapping from the STAC item.
raw: Original STAC item dict, useful for fields not yet exposed on the dataclass.

Examples¶

>>> from georeader.readers.carbonmapper.api_queries import CMTileItem
>>> tile = CMTileItem.from_stac_item({
...     "id": "tan20251212t185057c20s4001",
...     "collection": "l2b-ch4-mfa-v3a",
...     "properties": {"datetime": "2025-12-12T18:50:57Z",
...                    "platform": "Tanager1"},
...     "bbox": [-103.6, 31.4, -103.4, 31.6],
...     "geometry": {"type": "Polygon", "coordinates": [
...         [[-103.6, 31.4], [-103.4, 31.4],
...          [-103.4, 31.6], [-103.6, 31.6], [-103.6, 31.4]]]},
...     "assets": {"cmf": {"href": "https://cm/.../cmf.tif"}},
... })
>>> tile.scene_id, tile.platform
('tan20251212t185057c20s4001', 'Tanager1')
>>> tile.asset_urls["cmf"]
'https://cm/.../cmf.tif'
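A frozen dataclass only gets a usable hash if every compared field is hashable. CMTileItem keeps plain dicts in asset_urls, properties, and raw, so hashing an instance raises TypeError, while keying on scene_id works. The stand-in class below is hypothetical and mirrors just enough of that layout to show both behaviours.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Mapping


@dataclass(frozen=True)
class TileStub:
    scene_id: str
    platform: str
    asset_urls: Mapping[str, Any]  # a dict at runtime, so not hashable


def dedupe_by_scene_id(tiles: Iterable[TileStub]) -> List[TileStub]:
    """Keep the first item seen for each scene_id, preserving order."""
    first: Dict[str, TileStub] = {}
    for tile in tiles:
        first.setdefault(tile.scene_id, tile)
    return list(first.values())


a = TileStub("tan001", "Tanager1", {"cmf": "https://.../cmf.tif"})
b = TileStub("tan001", "Tanager1", {"cmf": "https://.../cmf.tif"})
c = TileStub("emi002", "EMIT", {})

try:
    hash(a)
except TypeError as exc:
    # The generated __hash__ hashes a tuple of all fields, including the dict.
    print("unhashable:", exc)

print([t.scene_id for t in dedupe_by_scene_id([a, c, b])])  # ['tan001', 'emi002']
```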

Source code in georeader/readers/carbonmapper/api_queries.py

@dataclass(frozen=True)
class CMTileItem:
    """Lightweight Carbon Mapper L2B STAC item — API-only, no DB binding.

    The DB-bound counterpart is ``CarbonMapperTile`` (Phase 1). The
    promotion direction (API → DB) lives on the *DB* side via
    ``CarbonMapperTile.from_cm_tile_item(item, cm_provider=...)``; this
    keeps :mod:`api_queries` free of any database imports.

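    Frozen so instances are immutable. The generated ``__hash__`` covers
    every field, and ``asset_urls``, ``properties`` and ``raw`` hold plain
    dicts, so ``hash(tile)`` raises ``TypeError``; deduplicate on
    ``scene_id`` rather than on the item itself.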

    Attributes
    ----------
    scene_id:
        STAC item id — equivalent to ``plume_id.rsplit("-", 1)[0]`` for
        plumes that came from this scene.
    collection:
        STAC collection id, e.g. ``"l2b-ch4-mfa-v3a"``.
    datetime:
        UTC-aware acquisition time parsed from
        ``properties["datetime"]``.
    platform:
        ``properties["platform"]`` — ``"Tanager1"``, ``"EMIT"``, etc.
    bbox:
        ``(W, S, E, N)`` in WGS-84 decimal degrees.
    geometry:
        Shapely geometry (typically a Polygon) of the scene footprint.
    asset_urls:
        Mapping of asset name → href URL, e.g.
        ``{"cmf": "https://.../cmf.tif", "rgb": ...}``. The L2B CH4
        collection consistently exposes ``cmf``, ``rgb``,
        ``uncertainty``, and ``artifact-mask``.
    properties:
        Full ``properties`` mapping from the STAC item.
    raw:
        Original STAC item dict — useful for fields not yet exposed
        on the dataclass.

    Examples
    --------
    >>> from georeader.readers.carbonmapper.api_queries import CMTileItem
    >>> tile = CMTileItem.from_stac_item({
    ...     "id": "tan20251212t185057c20s4001",
    ...     "collection": "l2b-ch4-mfa-v3a",
    ...     "properties": {"datetime": "2025-12-12T18:50:57Z",
    ...                    "platform": "Tanager1"},
    ...     "bbox": [-103.6, 31.4, -103.4, 31.6],
    ...     "geometry": {"type": "Polygon", "coordinates": [
    ...         [[-103.6, 31.4], [-103.4, 31.4],
    ...          [-103.4, 31.6], [-103.6, 31.6], [-103.6, 31.4]]]},
    ...     "assets": {"cmf": {"href": "https://cm/.../cmf.tif"}},
    ... })
    >>> tile.scene_id, tile.platform
    ('tan20251212t185057c20s4001', 'Tanager1')
    >>> tile.asset_urls["cmf"]
    'https://cm/.../cmf.tif'
    """

    scene_id: str
    collection: str
    datetime: datetime
    platform: str
    bbox: tuple[float, float, float, float]
    geometry: BaseGeometry
    asset_urls: Mapping[str, str]
    properties: Mapping[str, Any]
    raw: Mapping[str, Any]

    @classmethod
    def from_stac_item(cls, item: Mapping[str, Any]) -> "CMTileItem":
        """Build a :class:`CMTileItem` from a raw STAC item dict.

        Tolerates both string and pre-parsed datetime values for
        ``properties["datetime"]`` and falls back to ``utcnow`` if the
        property is missing entirely.

        Parameters
        ----------
        item:
            STAC item dict (Feature shape) as returned by
            :func:`georeader.readers.carbonmapper.download.stac_get_item` or
            :func:`georeader.readers.carbonmapper.download.stac_search`.

        Returns
        -------
        CMTileItem

        Raises
        ------
        ValueError
            If ``item["bbox"]`` is missing or not 4-length, or if
            ``item["geometry"]`` is missing.
        """
        props = dict(item.get("properties") or {})
        bbox = tuple(item.get("bbox") or ())
        if len(bbox) != 4:
            raise ValueError(f"STAC item missing 4-tuple bbox: {item.get('id')!r}")

        dt_raw = props.get("datetime")
        if isinstance(dt_raw, datetime):
            dt = dt_raw
        elif isinstance(dt_raw, str):
            dt = datetime.fromisoformat(dt_raw.replace("Z", "+00:00"))
        else:
            dt = datetime.now(timezone.utc)

        geom_dict = item.get("geometry") or {}
        if not geom_dict:
            raise ValueError(f"STAC item missing geometry: {item.get('id')!r}")
        geom = shape(geom_dict)  # type: ignore[arg-type]

        assets = item.get("assets") or {}
        asset_urls = {
            name: asset.get("href", "")
            for name, asset in assets.items()
            if isinstance(asset, Mapping)
        }

        return cls(
            scene_id=str(item.get("id", "")),
            collection=str(item.get("collection", "")),
            datetime=dt,
            platform=str(props.get("platform", "")),
            bbox=(float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])),
            geometry=geom,
            asset_urls=asset_urls,
            properties=props,
            raw=dict(item),
        )

`from_stac_item(item)` `classmethod` ¶

Build a CMTileItem from a raw STAC item dict.

Tolerates both string and pre-parsed datetime values for properties["datetime"] and falls back to utcnow if the property is missing entirely.

Parameters¶

item: STAC item dict (Feature shape) as returned by georeader.readers.carbonmapper.download.stac_get_item or georeader.readers.carbonmapper.download.stac_search.

Returns¶

CMTileItem

Raises¶

ValueError: if item["bbox"] is missing or not 4-length, or if item["geometry"] is missing.
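The "Z" replacement in from_stac_item is needed because datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onwards. A standalone sketch of the same fallback chain, with a hypothetical function name:

```python
from datetime import datetime, timezone


def parse_stac_datetime(value):
    """Mirror the fallback chain used for properties["datetime"]."""
    if isinstance(value, datetime):
        return value  # already parsed, returned unchanged
    if isinstance(value, str):
        # Older Pythons reject a trailing "Z", so spell the offset out.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return datetime.now(timezone.utc)  # missing or unexpected type


dt = parse_stac_datetime("2025-12-12T18:50:57Z")
print(dt.isoformat())  # 2025-12-12T18:50:57+00:00
```

Note that the last branch is silent: a missing datetime yields the time of parsing rather than an error, which may be worth checking downstream if acquisition time matters.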


`CMAPIError` ¶

Bases: Exception

Base for everything raised by api_queries.

Catch this to handle any expected Carbon Mapper API miss in one block. requests.HTTPError for non-404 statuses (e.g. 500, 429) propagates unchanged, since those are infrastructure issues, not data issues.
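The functions in this module rely on a private _is_404 helper whose body is not shown. One plausible, duck-typed way to implement the 404-to-domain-exception translation is sketched below. It only assumes that the HTTP error carries a response with a status_code attribute, as requests.HTTPError does; the names is_404 and translate_404 and the stand-in classes are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


def is_404(exc: BaseException) -> bool:
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) == 404


@contextmanager
def translate_404(make_error: Callable[[], Exception]) -> Iterator[None]:
    """Re-raise a 404 as a domain exception; let everything else propagate."""
    try:
        yield
    except Exception as exc:
        if is_404(exc):
            raise make_error() from exc
        raise


# Stand-ins so the sketch runs without requests installed.
class FakeResponse:
    def __init__(self, status_code: int):
        self.status_code = status_code


class FakeHTTPError(Exception):
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = FakeResponse(status_code)


class PlumeNotFound(Exception):
    pass


try:
    with translate_404(PlumeNotFound):
        raise FakeHTTPError(404)
except PlumeNotFound as err:
    print(type(err).__name__, "caused by", err.__cause__)  # PlumeNotFound caused by HTTP 404
```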

Source code in georeader/readers/carbonmapper/api_queries.py

class CMAPIError(Exception):
    """Base for everything raised by :mod:`api_queries`.

    Catch this to handle any expected Carbon Mapper API miss in one
    block. ``requests.HTTPError`` for non-404 statuses (e.g. 500, 429)
    propagates unchanged — those are infra issues, not data issues.
    """

`CMPlumeNotFound` ¶

Bases: CMAPIError

Raised by get_plume when the plume is unknown to CM.

The unmodified plume_id is preserved on the instance for logging.

Examples¶

>>> try:
...     get_plume(token, "tan-does-not-exist")  # doctest: +SKIP
... except CMPlumeNotFound as exc:
...     log.warning("missing plume", plume_id=exc.plume_id)

Source code in georeader/readers/carbonmapper/api_queries.py

class CMPlumeNotFound(CMAPIError):
    """Raised by :func:`get_plume` when the plume is unknown to CM.

    The unmodified ``plume_id`` is preserved on the instance for
    logging.

    Examples
    --------
    >>> try:
    ...     get_plume(token, "tan-does-not-exist")  # doctest: +SKIP
    ... except CMPlumeNotFound as exc:
    ...     log.warning("missing plume", plume_id=exc.plume_id)
    """

    def __init__(self, plume_id: str):
        super().__init__(f"Plume not found: {plume_id}")
        self.plume_id = plume_id

`CMSceneNotPublished` ¶

Bases: CMAPIError

Raised when STAC has no L2B item for a given scene_id.

Carbon Mapper publishes L2B selectively (data_model.md §5.2): plumes can exist for scenes whose L2B raster has not been (or never will be) released. The Phase 2 promotion path defers such plumes rather than failing hard.

The single-resource fetcher get_tile raises this so callers can pick a strategy; the cross-resolution get_tile_for_plume catches it and returns None.

Source code in georeader/readers/carbonmapper/api_queries.py

class CMSceneNotPublished(CMAPIError):
    """Raised when STAC has no L2B item for a given ``scene_id``.

    Carbon Mapper publishes L2B selectively (``data_model.md §5.2``):
    plumes can exist for scenes whose L2B raster has not been (or never
    will be) released. The Phase 2 promotion path defers such plumes
    rather than failing hard.

    The :func:`get_tile` single-resource fetcher *raises* this so
    callers can pick a strategy; the cross-resolution
    :func:`get_tile_for_plume` *catches* it and returns ``None``.
    """

    def __init__(self, scene_id: str):
        super().__init__(f"L2B scene not published: {scene_id}")
        self.scene_id = scene_id

`CMSourceNotFound` ¶

Bases: CMAPIError

Raised by get_source when the source name is unknown.

The (cleaned, query-suffix-stripped) source_name is preserved on the instance.

Source code in georeader/readers/carbonmapper/api_queries.py

class CMSourceNotFound(CMAPIError):
    """Raised by :func:`get_source` when the source name is unknown.

    The (cleaned, query-suffix-stripped) ``source_name`` is preserved
    on the instance.
    """

    def __init__(self, source_name: str):
        super().__init__(f"Source not found: {source_name}")
        self.source_name = source_name

`get_tile(token, scene_id, *, collection=DEFAULT_L2B_COLLECTION)` ¶

Fetch a single L2B STAC item by scene_id.

Wraps GET /stac/collections/{collection}/items/{scene_id}.

Parameters¶

token: Bearer token (STAC item endpoints accept anonymous reads for published items, but authenticated requests surface additional fields).
scene_id: The L2B scene_id, equal to plume_id.rsplit("-", 1)[0] for any plume that came from this scene.
collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION (CH4 matched-filter v3a). Override for CO2 or earlier versions.

Returns¶

CMTileItem

Raises¶

CMSceneNotPublished: when the L2B item has not been published yet (HTTP 404). It is raised rather than caught, so callers can choose to defer.

Examples¶

>>> tile = get_tile(token, "tan20251212t185057c20s4001")  # doctest: +SKIP
>>> tile.platform, list(tile.asset_urls)  # doctest: +SKIP
('Tanager1', ['cmf', 'rgb', 'uncertainty', 'artifact-mask'])
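For reference, the STAC item endpoint that get_tile wraps can be built with the standard library. This sketch does not send a request. The base URL is a placeholder (the real host is configured in the download module and is not shown here), and the Bearer header mirrors the token parameter described above.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.invalid"  # placeholder, not the real host


def stac_item_request(collection: str, scene_id: str, token: Optional[str] = None) -> Request:
    """Build GET /stac/collections/{collection}/items/{scene_id} without sending it."""
    path = (f"/stac/collections/{quote(collection, safe='')}"
            f"/items/{quote(scene_id, safe='')}")
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return Request(BASE_URL + path, headers=headers, method="GET")


req = stac_item_request("l2b-ch4-mfa-v3a", "tan20251212t185057c20s4001", token="abc")
print(req.full_url)
# https://api.example.invalid/stac/collections/l2b-ch4-mfa-v3a/items/tan20251212t185057c20s4001
```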

Source code in georeader/readers/carbonmapper/api_queries.py

def get_tile(
    token: str,
    scene_id: str,
    *,
    collection: str = DEFAULT_L2B_COLLECTION,
) -> CMTileItem:
    """Fetch a single L2B STAC item by ``scene_id``.

    Wraps ``GET /stac/collections/{collection}/items/{scene_id}``.

    Parameters
    ----------
    token:
        Bearer token (STAC item endpoints accept anonymous reads for
        published items, but auth surfaces additional fields).
    scene_id:
        The L2B scene_id, equal to ``plume_id.rsplit("-", 1)[0]`` for
        any plume that came from this scene.
    collection:
        STAC collection — defaults to :data:`DEFAULT_L2B_COLLECTION`
        (CH4 matched-filter v3a). Override for CO2 or earlier versions.

    Returns
    -------
    CMTileItem

    Raises
    ------
    CMSceneNotPublished
        When the L2B item has not been published yet (HTTP 404).
        Re-raised — not caught — so callers can choose to defer.

    Examples
    --------
    >>> tile = get_tile(token, "tan20251212t185057c20s4001")  # doctest: +SKIP
    >>> tile.platform, list(tile.asset_urls)  # doctest: +SKIP
    ('Tanager1', ['cmf', 'rgb', 'uncertainty', 'artifact-mask'])
    """
    try:
        raw = _dl.stac_get_item(collection, scene_id, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            raise CMSceneNotPublished(scene_id) from exc
        raise
    return CMTileItem.from_stac_item(raw)

`get_plume(token, plume_id)` ¶

Fetch a single plume by its CM plume_id.

Wraps GET /catalog/plume/{id} and parses the result through CMRawPlume.

Parameters¶

token: Carbon Mapper Bearer token. Required for non-public fields.
plume_id: Either the colloquial name (e.g. "tan20251212t185057c20s4001-E") or the UUID form.

Returns¶

CMRawPlume

Raises¶

CMPlumeNotFound: when the API returns 404.
requests.HTTPError: for non-404 errors (5xx, 429, etc.).

Examples¶

>>> plume = get_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
>>> plume.plume_id, plume.gas  # doctest: +SKIP
('tan20251212t185057c20s4001-E', 'CH4')
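get_plume accepts either id form. If a caller needs to tell them apart (for example before applying the rsplit("-", 1)[0] scene rule, which would mangle a UUID), the standard uuid module is enough. A hedged sketch with a hypothetical helper name:

```python
import uuid


def is_uuid(plume_id: str) -> bool:
    """True only for the canonical hyphenated UUID string form."""
    try:
        # uuid.UUID also accepts braces or no hyphens; the comparison
        # restricts the match to the canonical 8-4-4-4-12 spelling.
        return str(uuid.UUID(plume_id)) == plume_id.lower()
    except ValueError:
        return False


print(is_uuid("tan20251212t185057c20s4001-E"))          # False
print(is_uuid("12345678-1234-5678-1234-567812345678"))  # True
```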

Source code in georeader/readers/carbonmapper/api_queries.py

def get_plume(token: str, plume_id: str) -> CMRawPlume:
    """Fetch a single plume by its CM ``plume_id``.

    Wraps ``GET /catalog/plume/{id}`` and parses the result through
    :class:`CMRawPlume`.

    Parameters
    ----------
    token:
        Carbon Mapper Bearer token. Required for non-public fields.
    plume_id:
        Either the colloquial name (e.g.
        ``"tan20251212t185057c20s4001-E"``) or the UUID form.

    Returns
    -------
    CMRawPlume

    Raises
    ------
    CMPlumeNotFound
        When the API returns 404.
    requests.HTTPError
        For non-404 errors (5xx, 429, etc.).

    Examples
    --------
    >>> plume = get_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
    >>> plume.plume_id, plume.gas  # doctest: +SKIP
    ('tan20251212t185057c20s4001-E', 'CH4')
    """
    try:
        raw = _dl.get_plume_by_id(plume_id, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            raise CMPlumeNotFound(plume_id) from exc
        raise
    return CMRawPlume(**raw)

`get_source(token, source_name)` ¶

Fetch a single Carbon Mapper source by its canonical name.

Strips the source-name query-string suffix (?plume_gas=...) automatically (data_model §2.2) — pass either the dirty or clean form.

Parameters¶

token: Bearer token.
source_name: Canonical or query-suffixed source name, e.g. "CH4_1B2_100m_-104.17525_32.49125" or "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4".

Returns¶

CMSource

Raises¶

CMSourceNotFound: when the API returns 404.

Examples¶

>>> src = get_source(token, "CH4_1B2_100m_-104.17525_32.49125")  # doctest: +SKIP
>>> src.sector, src.plume_count  # doctest: +SKIP
('1B2', 12)
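The source names in these examples look like underscore-separated fields: gas, sector, a distance token, and the longitude and latitude of the cluster. That reading is an inference from the examples, not a documented schema, so treat the parser below (hypothetical name and field labels) as a sketch.

```python
from typing import NamedTuple


class SourceNameParts(NamedTuple):
    gas: str
    sector: str
    scale: str  # e.g. "100m"; meaning assumed, not documented here
    lon: float
    lat: float


def parse_source_name(source_name: str) -> SourceNameParts:
    """Split e.g. 'CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4'."""
    clean = source_name.split("?", 1)[0]  # drop any query suffix first
    gas, sector, scale, lon, lat = clean.split("_")  # ValueError if not 5 parts
    return SourceNameParts(gas, sector, scale, float(lon), float(lat))


print(parse_source_name("CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4"))
# SourceNameParts(gas='CH4', sector='1B2', scale='100m', lon=-104.17525, lat=32.49125)
```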

Source code in georeader/readers/carbonmapper/api_queries.py

def get_source(token: str, source_name: str) -> CMSource:
    """Fetch a single Carbon Mapper source by its canonical name.

    Strips the source-name query-string suffix (``?plume_gas=...``)
    automatically (``data_model §2.2``) — pass either the dirty or
    clean form.

    Parameters
    ----------
    token:
        Bearer token.
    source_name:
        Canonical or query-suffixed source name, e.g.
        ``"CH4_1B2_100m_-104.17525_32.49125"`` or
        ``"CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4"``.

    Returns
    -------
    CMSource

    Raises
    ------
    CMSourceNotFound
        When the API returns 404.

    Examples
    --------
    >>> src = get_source(token, "CH4_1B2_100m_-104.17525_32.49125")  # doctest: +SKIP
    >>> src.sector, src.plume_count  # doctest: +SKIP
    ('1B2', 12)
    """
    cleaned = _strip_query_suffix(source_name)
    try:
        raw = _dl.get_source_by_name(cleaned, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            raise CMSourceNotFound(cleaned) from exc
        raise
    # The single-source endpoint can return either a Feature or properties
    # directly; coerce to a Feature shape so CMSource.from_geojson_feature
    # handles both.
    if "properties" not in raw and "source_name" in raw:
        feature = {"properties": dict(raw),
                   "geometry": {"type": "Point",
                                "coordinates": [raw.get("lon"), raw.get("lat")]}}
    else:
        feature = dict(raw)

    # The /catalog/source/{name} endpoint sometimes returns top-level
    # geometry with null coords and stashes the real centroid under
    # properties.point — fall back to that when the outer geometry is
    # unusable.
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates") or [None, None]
    if not coords or coords[0] is None or coords[1] is None:
        props = feature.get("properties") or {}
        point = props.get("point") or {}
        pcoords = point.get("coordinates") if isinstance(point, dict) else None
        if pcoords and pcoords[0] is not None and pcoords[1] is not None:
            feature = dict(feature)
            feature["geometry"] = {"type": "Point", "coordinates": list(pcoords)}

    return CMSource.from_geojson_feature(feature)

`get_tile_for_plume(token, plume_id, *, collection=DEFAULT_L2B_COLLECTION)` ¶

Resolve a plume to its parent L2B STAC item.

Derives the parent scene_id via plume_id.rsplit("-", 1)[0] and looks up the corresponding STAC item.

Unlike get_tile, this helper catches CMSceneNotPublished and returns None, which suits consumers (such as the Phase 2 ETL) that want to defer rather than error.

Parameters¶

token: Bearer token.
plume_id: Colloquial plume id (with the -{part} suffix).
collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION.

Returns¶

CMTileItem | None: None when the L2B scene has not been published yet.

Examples¶

>>> tile = get_tile_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
>>> tile.scene_id if tile else "deferred"  # doctest: +SKIP
'tan20251212t185057c20s4001'

Source code in georeader/readers/carbonmapper/api_queries.py

def get_tile_for_plume(
    token: str,
    plume_id: str,
    *,
    collection: str = DEFAULT_L2B_COLLECTION,
) -> CMTileItem | None:
    """Resolve a plume to its parent L2B STAC item.

    Derives the parent ``scene_id`` via
    ``plume_id.rsplit("-", 1)[0]`` and looks up the corresponding
    STAC item.

    Unlike :func:`get_tile`, this helper **catches**
    :class:`CMSceneNotPublished` and returns ``None`` — appropriate for
    consumers (Phase 2 ETL) that want to defer rather than error.

    Parameters
    ----------
    token:
        Bearer token.
    plume_id:
        Colloquial plume id (with the ``-{part}`` suffix).
    collection:
        STAC collection — defaults to :data:`DEFAULT_L2B_COLLECTION`.

    Returns
    -------
    CMTileItem | None
        ``None`` when the L2B scene has not been published yet.

    Examples
    --------
    >>> tile = get_tile_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
    >>> tile.scene_id if tile else "deferred"  # doctest: +SKIP
    'tan20251212t185057c20s4001'
    """
    scene_id = _scene_id_from_plume(plume_id)
    try:
        return get_tile(token, scene_id, collection=collection)
    except CMSceneNotPublished:
        return None

`get_source_for_plume(token, plume_id)` ¶

Resolve a plume to its attributed Carbon Mapper source.

Wraps /catalog/source/plume/name/{plume_id} — the by-name endpoint, which returns the cleaned source_name (preferred over the UUID-keyed sibling for colloquial plume_id strings).

Returns None when CM has not attributed the plume to a source (HTTP 404). Other HTTP errors propagate.

Parameters¶

token: Bearer token.
plume_id: Colloquial plume id.

Returns¶

CMSource | None

Examples¶

>>> src = get_source_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
>>> src.source_name if src else "unattributed"  # doctest: +SKIP
'CH4_1B2_100m_-104.0_32.0'

Source code in georeader/readers/carbonmapper/api_queries.py

def get_source_for_plume(
    token: str,
    plume_id: str,
) -> CMSource | None:
    """Resolve a plume to its attributed Carbon Mapper source.

    Wraps ``/catalog/source/plume/name/{plume_id}`` — the *by-name*
    endpoint, which returns the cleaned ``source_name`` (preferred over
    the UUID-keyed sibling for colloquial ``plume_id`` strings).

    Returns ``None`` when CM has not attributed the plume to a source
    (HTTP 404). Other HTTP errors propagate.

    Parameters
    ----------
    token:
        Bearer token.
    plume_id:
        Colloquial plume id.

    Returns
    -------
    CMSource | None

    Examples
    --------
    >>> src = get_source_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
    >>> src.source_name if src else "unattributed"  # doctest: +SKIP
    'CH4_1B2_100m_-104.0_32.0'
    """
    try:
        raw = _dl.get_source_for_plume_name(plume_id, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            return None
        raise
    if not raw:
        return None
    if "geometry" not in raw and "properties" not in raw:
        feature = {
            "properties": dict(raw),
            "geometry": {"type": "Point",
                         "coordinates": [raw.get("lon"), raw.get("lat")]},
        }
    else:
        feature = dict(raw)

    # The endpoint occasionally returns a plume-shaped payload (no
    # source_name, null top-level geometry) when CM has not yet
    # attributed the plume — treat as unattributed.
    props = feature.get("properties") or {}
    if not props.get("source_name") and not feature.get("source_name"):
        return None

    # Fall back to properties.point when the outer geometry is null
    # (same quirk as get_source).
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates") or [None, None]
    if not coords or coords[0] is None or coords[1] is None:
        point = props.get("point") or {}
        pcoords = point.get("coordinates") if isinstance(point, dict) else None
        if pcoords and pcoords[0] is not None and pcoords[1] is not None:
            feature["geometry"] = {"type": "Point", "coordinates": list(pcoords)}

    return CMSource.from_geojson_feature(feature)

`get_plume_context(token, plume_id)` ¶

Single-call fetch of a plume plus its parent tile and source.

The most common notebook / ETL question is "give me everything CM knows about this plume". This helper makes the three independent REST/STAC calls one after another behind a single name and returns the results as a typed tuple.

Failure modes are asymmetric:

The plume itself must exist — CMPlumeNotFound propagates.
Tile resolution returns None when the scene has not been published to L2B (CMSceneNotPublished caught internally).
Source resolution returns None when CM has not attributed the plume (404 caught internally).

Parameters¶

token: Bearer token.
plume_id: Colloquial plume id.

Returns¶

(CMRawPlume, CMTileItem | None, CMSource | None)

Raises¶

CMPlumeNotFound: when the plume itself is unknown.

Examples¶

Notebook exploration:

>>> plume, tile, source = get_plume_context(  # doctest: +SKIP
...     token, "tan20251212t185057c20s4001-E",
... )
>>> print(f"emission: {plume.emission_auto:.0f} kg/h")  # doctest: +SKIP
emission: 1240 kg/h
>>> if source:  # doctest: +SKIP
...     print(f"source {source.source_name} sector {source.sector}")
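get_plume_context issues its three calls one after another. Because the tile and source lookups depend only on plume_id, a caller that cares about latency could overlap them with a thread pool. The sketch below is an alternative design, not what the library does; the three fetch callables are hypothetical stand-ins with the same failure contract (the plume lookup may raise, the other two return None on a miss).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional, Tuple


def plume_context_concurrent(
    plume_id: str,
    get_plume: Callable[[str], Any],
    get_tile_for_plume: Callable[[str], Optional[Any]],
    get_source_for_plume: Callable[[str], Optional[Any]],
) -> Tuple[Any, Optional[Any], Optional[Any]]:
    with ThreadPoolExecutor(max_workers=3) as pool:
        plume_f = pool.submit(get_plume, plume_id)
        tile_f = pool.submit(get_tile_for_plume, plume_id)
        source_f = pool.submit(get_source_for_plume, plume_id)
        # .result() re-raises any exception from the worker, so a
        # "plume not found" error still propagates to the caller.
        return plume_f.result(), tile_f.result(), source_f.result()


plume, tile, source = plume_context_concurrent(
    "scene-E",
    get_plume=lambda pid: {"plume_id": pid},
    get_tile_for_plume=lambda pid: {"scene_id": pid.rsplit("-", 1)[0]},
    get_source_for_plume=lambda pid: None,  # unattributed
)
print(plume, tile, source)
```

One trade-off: when the plume lookup fails, the tile and source requests have still been sent and are awaited before the error surfaces, which the sequential version avoids.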

Source code in georeader/readers/carbonmapper/api_queries.py

def get_plume_context(
    token: str,
    plume_id: str,
) -> tuple[CMRawPlume, CMTileItem | None, CMSource | None]:
    """Single-call fetch of a plume plus its parent tile and source.

    The most common notebook / ETL question is *"give me everything CM
    knows about this plume"*. This helper makes the three independent
    REST/STAC calls one after another behind a single name and returns
    the results as a typed tuple.

    Failure modes are asymmetric:

    - The plume itself **must** exist — ``CMPlumeNotFound`` propagates.
    - Tile resolution returns ``None`` when the scene has not been
      published to L2B (``CMSceneNotPublished`` caught internally).
    - Source resolution returns ``None`` when CM has not attributed
      the plume (404 caught internally).

    Parameters
    ----------
    token:
        Bearer token.
    plume_id:
        Colloquial plume id.

    Returns
    -------
    (CMRawPlume, CMTileItem | None, CMSource | None)

    Raises
    ------
    CMPlumeNotFound
        When the plume itself is unknown.

    Examples
    --------
    Notebook exploration:

    >>> plume, tile, source = get_plume_context(  # doctest: +SKIP
    ...     token, "tan20251212t185057c20s4001-E",
    ... )
    >>> print(f"emission: {plume.emission_auto:.0f} kg/h")  # doctest: +SKIP
    emission: 1240 kg/h
    >>> if source:                                          # doctest: +SKIP
    ...     print(f"source {source.source_name} sector {source.sector}")
    """
    plume = get_plume(token, plume_id)
    tile = get_tile_for_plume(token, plume_id)
    source = get_source_for_plume(token, plume_id)
    return plume, tile, source

`list_tiles(token, *, bbox=None, datetime_min=None, datetime_max=None, collection=DEFAULT_L2B_COLLECTION, limit=1000)` ¶

Materialised list of L2B STAC items matching filters.

Wraps /stac/search (comma-joined STAC bbox encoding).

Parameters¶

token: Bearer token.
bbox: (W, S, E, N) WGS-84 spatial filter.
datetime_min, datetime_max: Optional UTC bounds.
collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION.
limit: Max items in this call.

Returns¶

list[CMTileItem]

Examples¶

>>> tiles = list_tiles(  # doctest: +SKIP
...     token, bbox=(-104.5, 31.0, -101.5, 33.5), limit=10,
... )
>>> {t.platform for t in tiles}  # doctest: +SKIP
{'Tanager1', 'EMIT'}
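list_tiles and list_plumes combine datetime_min and datetime_max into a single interval via the private _build_datetime_range, whose output format is not shown here. A sketch following the STAC API convention ("start/end" in RFC 3339 UTC, with ".." for an open end) could look like this; the function name and the exact formatting (seconds precision) are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def _rfc3339(dt: datetime) -> str:
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_datetime_range(start: Optional[datetime], end: Optional[datetime]) -> Optional[str]:
    """Return 'start/end', using '..' for an open bound, or None if both are open."""
    if start is None and end is None:
        return None
    lo = _rfc3339(start) if start is not None else ".."
    hi = _rfc3339(end) if end is not None else ".."
    return f"{lo}/{hi}"


print(build_datetime_range(datetime(2025, 1, 1, tzinfo=timezone.utc), None))
# 2025-01-01T00:00:00Z/..
```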

Source code in georeader/readers/carbonmapper/api_queries.py

def list_tiles(
    token: str,
    *,
    bbox: BBox | None = None,
    datetime_min: datetime | None = None,
    datetime_max: datetime | None = None,
    collection: str = DEFAULT_L2B_COLLECTION,
    limit: int = 1_000,
) -> list[CMTileItem]:
    """Materialised list of L2B STAC items matching filters.

    Wraps ``/stac/search`` (comma-joined STAC bbox encoding).

    Parameters
    ----------
    token:
        Bearer token.
    bbox:
        ``(W, S, E, N)`` WGS-84 spatial filter.
    datetime_min, datetime_max:
        Optional UTC bounds.
    collection:
        STAC collection — defaults to :data:`DEFAULT_L2B_COLLECTION`.
    limit:
        Max items in this call.

    Returns
    -------
    list[CMTileItem]

    Examples
    --------
    >>> tiles = list_tiles(  # doctest: +SKIP
    ...     token, bbox=(-104.5, 31.0, -101.5, 33.5), limit=10,
    ... )
    >>> {t.platform for t in tiles}  # doctest: +SKIP
    {'Tanager1', 'EMIT'}
    """
    dt_range = _build_datetime_range(datetime_min, datetime_max)
    result = _dl.stac_search(
        collections=[collection],
        bbox=bbox,
        datetime_range=dt_range,
        limit=limit,
        token=token,
    )
    features = result.get("features", []) if isinstance(result, Mapping) else []
    return [CMTileItem.from_stac_item(f) for f in features]

`list_plumes(token, *, bbox=None, sectors=None, instruments=None, datetime_min=None, datetime_max=None, gas=Gas.CH4, limit=1000)` ¶

Materialised list of plumes matching filters.

Wraps /catalog/plumes/annotated and converts each row into a CMRawPlume. The bbox is encoded as repeated keys (REST style; see georeader.readers.carbonmapper.download._rest_bbox_params).
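The REST repeated-keys encoding sends one `bbox` key per coordinate instead of a single joined value. The sketch below mirrors the loop that list_sources builds in its source code further down this page; the name `rest_bbox_params` is hypothetical, and the real `_rest_bbox_params` may differ in detail.

```python
from urllib.parse import urlencode


def rest_bbox_params(bbox: tuple[float, float, float, float]) -> list[tuple[str, str]]:
    """Sketch: REST-style bbox, one repeated "bbox" key per coordinate (W, S, E, N)."""
    return [("bbox", str(v)) for v in bbox]


# A list of (key, value) pairs keeps the repeated keys in order.
query = urlencode(rest_bbox_params((-104.5, 31.0, -101.5, 33.5)))
print(query)  # bbox=-104.5&bbox=31.0&bbox=-101.5&bbox=33.5
```

Passing the same list of tuples as `params=` to `requests.get` produces the same repeated-key query string.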

Parameters¶

token: Bearer token.
bbox: (W, S, E, N) WGS-84 spatial filter.
sectors: IPCC sector codes, e.g. ["1B2", "6A"].
instruments: Instrument short codes, e.g. ["emi", "tan"], or Instrument members such as [Instrument.EMIT, Instrument.TANAGER].
datetime_min, datetime_max: Optional UTC bounds, combined into an RFC 3339 interval.
gas: Gas.CH4 (default). Only CH4 is currently supported; Gas.CO2 support is planned for a follow-up. Typed as Gas | Literal["CH4"] so plain-string call sites (gas="CH4") continue to type-check.
limit: Max rows returned in this call. The API caps results at 1 000 per page.

Returns¶

list[CMRawPlume]

Examples¶

Permian methane plumes for Q1 2025 from EMIT and Tanager:

>>> from datetime import datetime, timezone
>>> plumes = list_plumes(  # doctest: +SKIP
...     token,
...     bbox=(-104.5, 31.0, -101.5, 33.5),
...     instruments=["emi", "tan"],
...     datetime_min=datetime(2025, 1, 1, tzinfo=timezone.utc),
...     datetime_max=datetime(2025, 4, 1, tzinfo=timezone.utc),
...     limit=500,
... )
>>> sum(p.emission_auto or 0 for p in plumes)  # doctest: +SKIP
412350.0
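The optional datetime bounds are combined into a single RFC 3339 interval. The body of `_build_datetime_range` is not shown on this page, so the function below is a hypothetical sketch. It assumes the STAC API convention of `start/end` with `..` marking an open end; the real helper may format the bounds differently.

```python
from __future__ import annotations

from datetime import datetime, timezone


def build_datetime_range(start: datetime | None, end: datetime | None) -> str | None:
    """Hypothetical sketch: combine optional UTC bounds into "start/end"."""
    if start is None and end is None:
        return None  # no temporal filter at all

    def fmt(dt: datetime | None) -> str:
        if dt is None:
            return ".."  # assumed open-ended marker (STAC API style)
        if dt.tzinfo is None:
            raise ValueError("datetime bounds must be timezone-aware (UTC)")
        # Normalise to UTC with the "Z" suffix; sub-second precision is dropped.
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return f"{fmt(start)}/{fmt(end)}"


q1 = build_datetime_range(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 4, 1, tzinfo=timezone.utc),
)
print(q1)  # 2025-01-01T00:00:00Z/2025-04-01T00:00:00Z
```

Rejecting naive datetimes avoids silently treating local time as UTC.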

Source code in georeader/readers/carbonmapper/api_queries.py

def list_plumes(
    token: str,
    *,
    bbox: BBox | None = None,
    sectors: list[str] | None = None,
    instruments: list[str] | None = None,
    datetime_min: datetime | None = None,
    datetime_max: datetime | None = None,
    gas: Gas | Literal["CH4"] = Gas.CH4,
    limit: int = 1_000,
) -> list[CMRawPlume]:
    """Materialised list of plumes matching filters.

    Wraps ``/catalog/plumes/annotated`` and converts each row into a
    :class:`CMRawPlume`. The bbox is encoded as repeated keys (REST
    style — see :func:`georeader.readers.carbonmapper.download._rest_bbox_params`).

    Parameters
    ----------
    token:
        Bearer token.
    bbox:
        ``(W, S, E, N)`` WGS-84 spatial filter.
    sectors:
        IPCC sector codes — e.g. ``["1B2", "6A"]``.
    instruments:
        Instrument short codes — e.g. ``["emi", "tan"]`` or
        :class:`Instrument` members like ``[Instrument.EMIT, Instrument.TANAGER]``.
    datetime_min, datetime_max:
        Optional UTC bounds — combined into an RFC 3339 interval.
    gas:
        :data:`Gas.CH4` (default). **CH4-only for this PR**;
        ``Gas.CO2`` lands in a follow-up. Typed as
        ``Gas | Literal["CH4"]`` so plain string call-sites
        (``gas="CH4"``) continue to type-check.
    limit:
        Max rows returned in this call. The API caps at 1 000 per page.

    Returns
    -------
    list[CMRawPlume]

    Examples
    --------
    Permian methane plumes for Q1 2025 from EMIT and Tanager:

    >>> from datetime import datetime, timezone
    >>> plumes = list_plumes(  # doctest: +SKIP
    ...     token,
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     instruments=["emi", "tan"],
    ...     datetime_min=datetime(2025, 1, 1, tzinfo=timezone.utc),
    ...     datetime_max=datetime(2025, 4, 1, tzinfo=timezone.utc),
    ...     limit=500,
    ... )
    >>> sum(p.emission_auto or 0 for p in plumes)  # doctest: +SKIP
    412350.0
    """
    dt_range = _build_datetime_range(datetime_min, datetime_max)
    result = _dl.get_plumes_annotated(
        plume_gas=str(gas),
        bbox=bbox,
        datetime_range=dt_range,
        sectors=sectors,
        instruments=instruments,
        limit=limit,
        token=token,
    )
    items = result.get("items", []) if isinstance(result, Mapping) else []
    return [CMRawPlume(**row) for row in items]

`list_sources(token, *, bbox=None, sectors=None, gas=Gas.CH4)` ¶

List Carbon Mapper sources matching filters.

Wraps the source listing endpoint (REST Catalog). Each item is parsed via CMSource.from_geojson_feature, which strips the source-name query suffix.
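Source names returned by the catalog can carry a trailing `?...` query suffix that has to be removed before the name is used as an identifier. A minimal sketch of that stripping step follows; `strip_query_suffix` is a hypothetical stand-in for the module's private helper, and the `?foo=bar` suffix in the example is made up.

```python
def strip_query_suffix(source_name: str) -> str:
    """Hypothetical sketch: drop everything from the first "?" onward."""
    # str.partition returns the whole string as the head when no "?" is present.
    head, _sep, _suffix = source_name.partition("?")
    return head


print(strip_query_suffix("CH4_1B2_100m_-104.17525_32.49125?foo=bar"))
# CH4_1B2_100m_-104.17525_32.49125
```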

Parameters¶

token: Bearer token.
bbox: (W, S, E, N) WGS-84 spatial filter (REST repeated-keys encoding).
sectors: IPCC sector codes.
gas: Gas.CH4 (default). Only CH4 is currently supported; Gas.CO2 support is planned for a follow-up.

Returns¶

list[CMSource]

Examples¶

Top oil & gas sources in the Permian:

>>> sources = list_sources(  # doctest: +SKIP
...     token,
...     bbox=(-104.5, 31.0, -101.5, 33.5),
...     sectors=["1B2"],
... )
>>> sorted(sources, key=lambda s: -(s.emission_auto or 0))[:3]  # doctest: +SKIP
[<CMSource ...>, <CMSource ...>, <CMSource ...>]

Source code in georeader/readers/carbonmapper/api_queries.py

def list_sources(
    token: str,
    *,
    bbox: BBox | None = None,
    sectors: list[str] | None = None,
    gas: Gas | Literal["CH4"] = Gas.CH4,
) -> list[CMSource]:
    """List Carbon Mapper sources matching filters.

    Wraps the source listing endpoint (REST Catalog). Each item is
    parsed via :meth:`CMSource.from_geojson_feature`, which strips the
    source-name query-suffix.

    Parameters
    ----------
    token:
        Bearer token.
    bbox:
        ``(W, S, E, N)`` WGS-84 spatial filter (REST repeated-keys
        encoding).
    sectors:
        IPCC sector codes.
    gas:
        :data:`Gas.CH4` (default). **CH4-only for this PR**;
        ``Gas.CO2`` lands in a follow-up.

    Returns
    -------
    list[CMSource]

    Examples
    --------
    Top oil & gas sources in the Permian:

    >>> sources = list_sources(  # doctest: +SKIP
    ...     token,
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     sectors=["1B2"],
    ... )
    >>> sorted(sources, key=lambda s: -(s.emission_auto or 0))[:3]  # doctest: +SKIP
    [<CMSource ...>, <CMSource ...>, <CMSource ...>]
    """
    # The `download.get_sources` wrapper actually targets
    # `/plumes/annotated` (see its docstring) — the true source listing
    # lives at `/catalog/sources.geojson` and returns a GeoJSON
    # FeatureCollection. Hit it directly with REST repeated-keys bbox.
    params: list[tuple[str, str]] = [("plume_gas", str(gas))]
    if bbox is not None:
        for v in bbox:
            params.append(("bbox", str(v)))
    if sectors:
        for s in sectors:
            params.append(("sectors", s))
    resp = requests.get(
        f"{_dl.CATALOG_URL}/sources.geojson",
        params=params,
        headers=_dl._headers(token),
        timeout=60,
    )
    resp.raise_for_status()
    fc = resp.json()
    features = fc.get("features", []) if isinstance(fc, Mapping) else []
    return [CMSource.from_geojson_feature(f) for f in features]

`list_plumes_for_tile(token, scene_id, *, gas=Gas.CH4)` ¶

All plumes attributed to a given L2B scene.

Carbon Mapper plume_ids embed the scene_id — plume_id = "{scene_id}-{part}" — so we filter the annotated plumes listing client-side by prefix.
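The prefix rule can be shown on plain dicts. The sketch below uses the same `startswith(f"{scene_id}-")` test as the source code further down, plus the `rsplit("-", 1)[0]` inversion used elsewhere in this module. The function names are illustrative, and the `emi...` plume id in the sample rows is hypothetical.

```python
def scene_id_from_plume(plume_id: str) -> str:
    """Drop the trailing "-{part}" from a plume_id to recover its scene_id."""
    return plume_id.rsplit("-", 1)[0]


def filter_rows_for_scene(rows: list[dict], scene_id: str) -> list[dict]:
    """Keep rows whose plume_id starts with "{scene_id}-"."""
    prefix = f"{scene_id}-"
    return [row for row in rows if str(row.get("plume_id", "")).startswith(prefix)]


rows = [
    {"plume_id": "tan20251212t185057c20s4001-A"},
    {"plume_id": "tan20251212t185057c20s4001-B"},
    {"plume_id": "emi20250301t101500-A"},  # hypothetical plume from another scene
    {},                                     # a row without plume_id is skipped
]
kept = filter_rows_for_scene(rows, "tan20251212t185057c20s4001")
print([r["plume_id"][-1] for r in kept])  # ['A', 'B']
```

Including the trailing `-` in the prefix prevents a scene id from matching a longer scene id that merely starts with it.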

Parameters¶

token: Bearer token.
scene_id: L2B scene id, e.g. "tan20251212t185057c20s4001".
gas: Gas.CH4 (default). Only CH4 is currently supported; Gas.CO2 support is planned for a follow-up.

Returns¶

list[CMRawPlume]

Note¶

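The current implementation fetches a single page of up to 1 000 plumes from the annotated listing, with no spatial or temporal filter, and filters that page by scene-id prefix in Python. Plumes of the requested scene that fall outside the page are silently missed. When completeness matters, call list_plumes directly with a bbox (and datetime range) around the scene and filter the result by the same prefix.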

Examples¶

>>> plumes = list_plumes_for_tile(  # doctest: +SKIP
...     token, "tan20251212t185057c20s4001",
... )
>>> [p.plume_id[-1] for p in plumes]  # doctest: +SKIP
['A', 'B', 'C', 'E']

Source code in georeader/readers/carbonmapper/api_queries.py

def list_plumes_for_tile(
    token: str,
    scene_id: str,
    *,
    gas: Gas | Literal["CH4"] = Gas.CH4,
) -> list[CMRawPlume]:
    """All plumes attributed to a given L2B scene.

    Carbon Mapper plume_ids embed the scene_id —
    ``plume_id = "{scene_id}-{part}"`` — so we filter the annotated
    plumes listing client-side by prefix.

    Parameters
    ----------
    token:
        Bearer token.
    scene_id:
        L2B scene id, e.g. ``"tan20251212t185057c20s4001"``.
    gas:
        :data:`Gas.CH4` (default). **CH4-only for this PR**;
        ``Gas.CO2`` lands in a follow-up.

    Returns
    -------
    list[CMRawPlume]

    Note
    ----
    The current implementation pulls a 1 000-plume page and filters
    in Python. For high-volume scenes that may miss tail rows; pass a
    bbox filter or use :func:`list_plumes` directly when completeness
    matters.

    Examples
    --------
    >>> plumes = list_plumes_for_tile(  # doctest: +SKIP
    ...     token, "tan20251212t185057c20s4001",
    ... )
    >>> [p.plume_id[-1] for p in plumes]  # doctest: +SKIP
    ['A', 'B', 'C', 'E']
    """
    result = _dl.get_plumes_annotated(
        plume_gas=str(gas),
        limit=1_000,
        token=token,
    )
    items = result.get("items", []) if isinstance(result, Mapping) else []
    prefix = f"{scene_id}-"
    return [
        CMRawPlume(**row)
        for row in items
        if str(row.get("plume_id", "")).startswith(prefix)
    ]

`list_plumes_for_source(token, source_name, *, limit=10000)` ¶

All plumes attributed to a Carbon Mapper source.

Wraps /catalog/source-plumes-csv/{source_name}. The CSV endpoint is single-shot (no pagination) — the result is fully materialised.

Strips the ?... query suffix from source_name automatically (data_model §2.2).
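Because the CSV endpoint returns everything at once, parsing comes down to turning rows into dicts, capping them at `limit`, and mapping empty cells to `None` so that optional model fields validate. The source below does this with pandas and a NaN check. The sketch swaps in the standard-library `csv` module, which yields empty strings rather than NaN for blank cells. `parse_plume_csv` is a hypothetical name.

```python
from __future__ import annotations

import csv
import io
from itertools import islice


def parse_plume_csv(csv_text: str, limit: int = 10_000) -> list[dict[str, str | None]]:
    """Stdlib sketch of the CSV step: rows as dicts, empty cells as None."""
    if not csv_text:
        return []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Mirror the `if limit` guard in the source: a falsy limit means no cap.
    limited = islice(reader, limit) if limit else reader
    # Values stay strings; numeric coercion is left to the model's validators.
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in limited]


text = (
    "plume_id,emission_auto,sensitivity_mode\n"
    "tan20251212t185057c20s4001-A,1240.5,\n"
    "tan20251212t185057c20s4001-B,,high\n"
)
print(parse_plume_csv(text)[0])
# {'plume_id': 'tan20251212t185057c20s4001-A', 'emission_auto': '1240.5', 'sensitivity_mode': None}
```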

Parameters¶

token: Bearer token.
source_name: Canonical or query-suffixed source name.
limit: Cap on the returned list. Defaults to 10 000; CM sources rarely exceed a few hundred plumes, so this is just a safety cap.

Returns¶

list[CMRawPlume]

Examples¶

>>> plumes = list_plumes_for_source(  # doctest: +SKIP
...     token, "CH4_1B2_100m_-104.17525_32.49125",
... )
>>> len(plumes), plumes[0].plume_id[:3]  # doctest: +SKIP
(47, 'tan')

Source code in georeader/readers/carbonmapper/api_queries.py

def list_plumes_for_source(
    token: str,
    source_name: str,
    *,
    limit: int = 10_000,
) -> list[CMRawPlume]:
    """All plumes attributed to a Carbon Mapper source.

    Wraps ``/catalog/source-plumes-csv/{source_name}``. The CSV
    endpoint is single-shot (no pagination) — the result is fully
    materialised.

    Strips the ``?...`` query suffix from ``source_name`` automatically
    (``data_model §2.2``).

    Parameters
    ----------
    token:
        Bearer token.
    source_name:
        Canonical or query-suffixed source name.
    limit:
        Cap the returned list. Defaults to 10 000 — CM sources rarely
        exceed a few hundred plumes, so this is just a safety cap.

    Returns
    -------
    list[CMRawPlume]

    Examples
    --------
    >>> plumes = list_plumes_for_source(  # doctest: +SKIP
    ...     token, "CH4_1B2_100m_-104.17525_32.49125",
    ... )
    >>> len(plumes), plumes[0].plume_id[:3]  # doctest: +SKIP
    (47, 'tan')
    """
    import io
    import pandas as pd

    cleaned = _strip_query_suffix(source_name)
    csv_text = _dl.get_source_plumes_csv(cleaned, token=token)
    if not csv_text:
        return []
    df = pd.read_csv(io.StringIO(csv_text))
    if limit and len(df) > limit:
        df = df.head(limit)
    # CSV -> dict gives `float('nan')` for empty cells. Pydantic
    # str-typed fields like `sensitivity_mode` reject NaN; coerce
    # NaNs to None, which every optional field accepts.
    rows = df.to_dict(orient="records")
    # Use a distinct name: `cleaned` already holds the stripped source name.
    plumes: list[CMRawPlume] = []
    for row in rows:
        # NaN is the only value for which `v != v` is True.
        sane = {k: (None if isinstance(v, float) and v != v else v)
                for k, v in row.items()}
        plumes.append(CMRawPlume(**sane))
    return plumes

`list_tiles_for_source(token, source_name, *, collection=DEFAULT_L2B_COLLECTION)` ¶

All distinct parent L2B tiles touched by a source's plumes.

Implementation:

1. list_plumes_for_source: every plume attributed to the source.
2. {plume_id.rsplit("-", 1)[0] for ...}: distinct scene_ids.
3. stac_search(ids=[...]): resolve to STAC items.

Useful for tile-level backfill: given a chronic emitter, fetch every L2B scene in which at least one of its plumes was detected. Because scene ids are derived from plume ids, passes that observed the site without a detection are not returned.
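The three-step pipeline can be sketched offline by injecting the two network calls as plain callables. `tiles_for_source`, `fake_plumes`, and the `emi...` plume id are hypothetical stand-ins for list_plumes_for_source and stac_search, and the returned dicts stand in for CMTileItem objects.

```python
from typing import Callable, Iterable


def tiles_for_source(
    source_name: str,
    list_plume_ids: Callable[[str], Iterable[str]],
    search_by_ids: Callable[[list[str]], list[dict]],
) -> list[dict]:
    """Sketch: plumes -> distinct, sorted scene ids -> STAC items."""
    # Step 1 is the list_plume_ids call; step 2 drops "-{part}" and de-duplicates.
    scene_ids = sorted({pid.rsplit("-", 1)[0] for pid in list_plume_ids(source_name)})
    if not scene_ids:
        return []  # no plumes means no scene ids to resolve
    # Step 3: resolve every scene id in a single search call.
    return search_by_ids(scene_ids)


# Hypothetical offline stand-ins for list_plumes_for_source and stac_search.
fake_plumes = {
    "CH4_demo": [
        "tan20251212t185057c20s4001-A",
        "tan20251212t185057c20s4001-B",
        "emi20250301t101500-A",
    ],
}
tiles = tiles_for_source(
    "CH4_demo",
    list_plume_ids=lambda name: fake_plumes.get(name, []),
    search_by_ids=lambda ids: [{"id": i} for i in ids],
)
print([t["id"] for t in tiles])  # ['emi20250301t101500', 'tan20251212t185057c20s4001']
```

Two plumes from the same scene collapse to one scene id, so each tile is requested once.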

Parameters¶

token: Bearer token.
source_name: Canonical or query-suffixed source name.
collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION.

Returns¶

list[CMTileItem]: empty list if the source has no plumes.

Examples¶

>>> tiles = list_tiles_for_source(  # doctest: +SKIP
...     token, "CH4_1B2_100m_-104.17525_32.49125",
... )
>>> sorted({t.platform for t in tiles})  # doctest: +SKIP
['EMIT', 'Tanager1']

Source code in georeader/readers/carbonmapper/api_queries.py

def list_tiles_for_source(
    token: str,
    source_name: str,
    *,
    collection: str = DEFAULT_L2B_COLLECTION,
) -> list[CMTileItem]:
    """All distinct parent L2B tiles touched by a source's plumes.

    Implementation:

    1. :func:`list_plumes_for_source` — every plume attributed to the
       source.
    2. ``{plume_id.rsplit("-", 1)[0] for ...}`` — distinct scene_ids.
    3. ``stac_search(ids=[...])`` — resolve to STAC items.

    Useful for tile-level backfill: given a chronic emitter, fetch
    every L2B scene that ever observed it, regardless of whether
    plumes were detected on a given pass.

    Parameters
    ----------
    token:
        Bearer token.
    source_name:
        Canonical or query-suffixed source name.
    collection:
        STAC collection — defaults to :data:`DEFAULT_L2B_COLLECTION`.

    Returns
    -------
    list[CMTileItem]
        Empty list if the source has no plumes.

    Examples
    --------
    >>> tiles = list_tiles_for_source(  # doctest: +SKIP
    ...     token, "CH4_1B2_100m_-104.17525_32.49125",
    ... )
    >>> sorted({t.platform for t in tiles})  # doctest: +SKIP
    ['EMIT', 'Tanager1']
    """
    plumes = list_plumes_for_source(token, source_name)
    scene_ids = sorted({_scene_id_from_plume(p.plume_id) for p in plumes})
    if not scene_ids:
        return []
    result = _dl.stac_search(
        collections=[collection], ids=scene_ids, limit=len(scene_ids), token=token,
    )
    features = result.get("features", []) if isinstance(result, Mapping) else []
    return [CMTileItem.from_stac_item(f) for f in features]

plume.py¶

Unified Pydantic model for Carbon Mapper plume records.

Handles payloads from both Carbon Mapper API formats:

CSV bulk export (/api/v1/catalog/plume-csv) — provides plume_latitude, plume_longitude, datetime, plume_bounds.
Annotated plume JSON (/api/v1/catalog/plumes/annotated) — provides geometry_json, scene_timestamp, validated, has_phme.

All fields except plume_id are optional so that the model can be constructed from either format without validation errors.

CH4 only, for now. The catalog model is gas-agnostic (CMRawPlume.gas returns whatever the API reports), but the query helpers in api_queries type their gas argument as Gas | Literal["CH4"] and default to Gas.CH4, which keeps the supported-product surface explicit. CO2 support is planned for a follow-up.
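The query helpers pass `plume_gas=str(gas)` to the API, so accepting both `Gas.CH4` and the plain string `"CH4"` only works if both stringify to `"CH4"`. The `Gas` definition is not shown on this page. The class below is a hypothetical stand-in showing one way to get that behaviour; on a plain `(str, Enum)` mixin without the `__str__` override, `str(Gas.CH4)` would give `"Gas.CH4"` instead.

```python
from enum import Enum


class Gas(str, Enum):
    """Hypothetical stand-in for the module's Gas enum."""

    CH4 = "CH4"
    CO2 = "CO2"

    def __str__(self) -> str:
        # Return the bare value so str(Gas.CH4) matches the plain string "CH4".
        return self.value


for gas in (Gas.CH4, "CH4"):
    print(str(gas))  # CH4 for both call styles
```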

Version timeline. Carbon Mapper bumps emission_version with each processing-software release. v3a is the canonical version family exposed in STAC (listed under /stac/collections); v3c is the live processing version for newer plumes, reachable via direct asset URLs from /catalog/plume/{id} but not registered in STAC. The CMRawPlume.version property exposes this so callers can branch between STAC-item lookup (v3a) and URL-pattern derivation (v3c); see georeader.readers.carbonmapper.image.CMPlumeImage, which handles both transparently.
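A caller that does not use CMPlumeImage might branch on the version family roughly as follows. This is a hypothetical sketch: the strategy labels are invented, and it assumes the version property yields the bare family strings `"v3a"` and `"v3c"`.

```python
from __future__ import annotations


def asset_strategy(version: str | None) -> str:
    """Hypothetical dispatch on a plume's version family."""
    if version == "v3a":
        return "stac-item"    # registered in STAC: look up the item by scene id
    if version == "v3c":
        return "url-pattern"  # not in STAC: derive asset URLs from /catalog/plume/{id}
    raise ValueError(f"unsupported version family: {version!r}")


print(asset_strategy("v3a"))  # stac-item
```

Raising on unknown families surfaces a new processing version instead of silently guessing a lookup path.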

This module is the API-side typed view of a Carbon Mapper plume record. Downstream consumers (e.g. UNEP IMEO MARS) may persist the record into their own tables or views; the field-level docstrings below mirror the column comments on the src_carbon_mapper_plumes SQL view in pysat (UNEP-IMEO-MARS/pysat, https://github.com/UNEP-IMEO-MARS/pysat), so the upstream API and one downstream staging view share a single source of truth.

`CARBONMAPPER_INSTRUMENTS = {'emi': 'EMIT', 'tan': 'Tanager-1', 'ang': 'AVIRIS-NG', 'gao': 'Global Airborne Observatory', 'av3': 'AVIRIS-3'}` `module-attribute` ¶

`CM_INSTRUMENT_TO_SATELLITE = {'tan': 'Tanager1', 'ang': 'AVIRISNG', 'av3': 'AVIRIS3', 'emi': 'EMIT', 'gao': 'GAO'}` `module-attribute` ¶
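Because the first three characters of a plume_id are the instrument code, the two mappings above can be chained to describe any plume. The dict contents are copied from the module attributes above; `instrument_info` is a hypothetical helper, not part of the module.

```python
CARBONMAPPER_INSTRUMENTS = {
    "emi": "EMIT", "tan": "Tanager-1", "ang": "AVIRIS-NG",
    "gao": "Global Airborne Observatory", "av3": "AVIRIS-3",
}
CM_INSTRUMENT_TO_SATELLITE = {
    "tan": "Tanager1", "ang": "AVIRISNG", "av3": "AVIRIS3", "emi": "EMIT", "gao": "GAO",
}


def instrument_info(plume_id: str) -> tuple[str, str, str]:
    """Return (code, long name, satellite key) from a plume_id's 3-char prefix."""
    code = plume_id[:3].lower()
    if code not in CARBONMAPPER_INSTRUMENTS:
        raise KeyError(f"unknown instrument code {code!r} in {plume_id!r}")
    return code, CARBONMAPPER_INSTRUMENTS[code], CM_INSTRUMENT_TO_SATELLITE[code]


print(instrument_info("tan20251212t185057c20s4001-A"))  # ('tan', 'Tanager-1', 'Tanager1')
```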

`CMRawPlume` ¶

Bases: BaseModel

Unified Carbon Mapper plume model.

Accepts payloads from both the CSV bulk-export endpoint and the annotated plume JSON endpoint. Only plume_id is required — all other fields default to None so either format can be parsed without errors.

Geometry is built automatically from whichever source is available:

1. geometry_json (GeoJSON dict): Point geometries are buffered by 0.001° to produce a small polygon.
2. plume_bounds (bounding box): converted to a shapely.box.

Note that geometry here is not the retrieved plume mask polygon; it is just the API's reported point or bounds. For the authoritative plume polygon, use georeader.readers.carbonmapper.rasters.CMPlumeRaster.polygon, which extracts it from the L3A plume_tif band-4 alpha mask.
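To make the precedence concrete (geometry_json first, then plume_bounds), the sketch below computes the (W, S, E, N) extent of the footprint that would be built, using only the standard library. `footprint_bounds` is hypothetical, and the accepted string formats for plume_bounds are an assumption. For a Point it returns the bounding box of the 0.001° buffer, and `shapely.geometry.box(*bounds)` would turn any returned tuple into a polygon.

```python
from __future__ import annotations

from typing import Any, Sequence

Bounds = tuple[float, float, float, float]  # (W, S, E, N)
BUFFER_DEG = 0.001  # Point buffer radius documented above, in degrees


def footprint_bounds(
    geometry_json: dict[str, Any] | None,
    plume_bounds: str | Sequence[float] | None,
) -> Bounds | None:
    """Sketch: (W, S, E, N) of the footprint CMRawPlume would build."""
    if geometry_json:
        gtype = geometry_json.get("type")
        coords = geometry_json.get("coordinates")
        if gtype == "Point":
            x, y = coords[0], coords[1]
            # Bounding box of a 0.001 degree buffer around the point.
            return (x - BUFFER_DEG, y - BUFFER_DEG, x + BUFFER_DEG, y + BUFFER_DEG)
        if gtype == "Polygon":
            ring = coords[0]  # the exterior ring encloses any holes
            xs = [p[0] for p in ring]
            ys = [p[1] for p in ring]
            return (min(xs), min(ys), max(xs), max(ys))
        # Other geometry types fall through to plume_bounds in this sketch.
    if plume_bounds is None:
        return None
    if isinstance(plume_bounds, str):
        # Assumption: strings look like "[W, S, E, N]" or "W,S,E,N".
        plume_bounds = plume_bounds.strip().strip("[]()").split(",")
    w, s, e, n = (float(v) for v in plume_bounds)
    return (w, s, e, n)


print(footprint_bounds(None, "[-104.2, 32.4, -104.1, 32.5]"))  # (-104.2, 32.4, -104.1, 32.5)
```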

Downstream MARS staging-view counterpart¶

UNEP IMEO MARS persists this record into src_plume_staging_hist and exposes it via the src_carbon_mapper_plumes view (defined in pysat sql/view01_raw_carbon_mapper_plumes_view.sql, https://github.com/UNEP-IMEO-MARS/pysat/blob/main/sql/view01_raw_carbon_mapper_plumes_view.sql). Field-level docstrings below mirror that view's COMMENT ON COLUMN statements.

Mapping reference (CMRawPlume field → SQL view column):

=============================== ======================================
CMRawPlume field                src_carbon_mapper_plumes column
=============================== ======================================
plume_id                        source_id
datetime_str / scene_timestamp  tile_date
published_at_str                published_at
modified_str                    modified
plume_latitude                  lat
plume_longitude                 lon
plume_bounds_raw                plume_bounds
wind_source_auto                wind_source
wind_speed_avg_auto             wind_speed_m_s
wind_speed_std_auto             wind_speed_std_m_s
wind_direction_avg_auto         wind_direction_deg
wind_direction_std_auto         wind_direction_std_deg
emission_auto                   emission_rate_kg_h
emission_uncertainty_auto       emission_rate_uncertainty_kg_h
ipcc_sector                     sector
con_tif                         concentration_tif
rgb_tif, rgb_png                same names
plume_tif, plume_png            same names
=============================== ======================================
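Both datetime_str (CSV payloads) and scene_timestamp (annotated JSON) map to the single tile_date column, and only one of them is expected to be set. A hypothetical coalescing helper might look like this. It assumes ISO 8601 strings and treats naive timestamps as UTC, as the field docs describe.

```python
from __future__ import annotations

from datetime import datetime, timezone


def acquisition_time(datetime_str: str | None, scene_timestamp: str | None) -> datetime | None:
    """Sketch: coalesce the CSV and annotated-JSON timestamps into one UTC datetime."""
    raw = datetime_str or scene_timestamp
    if not raw:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    dt = datetime.fromisoformat(raw)
    # Assumption: naive timestamps are already UTC, per the field docs.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


print(acquisition_time(None, "2025-12-12T18:50:57Z"))  # 2025-12-12 18:50:57+00:00
```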

Source code in georeader/readers/carbonmapper/plume.py

class CMRawPlume(BaseModel):
    """Unified Carbon Mapper plume model.

    Accepts payloads from both the CSV bulk-export endpoint and the
    annotated plume JSON endpoint. Only ``plume_id`` is required — all
    other fields default to ``None`` so either format can be parsed
    without errors.

    Geometry is built automatically from whichever source is available:

    1. ``geometry_json`` (GeoJSON dict) — Point geometries are buffered
       by 0.001° to produce a small polygon.
    2. ``plume_bounds`` (bounding box) — converted to a ``shapely.box``.

    Note that ``geometry`` here is **not** the retrieved plume mask
    polygon — it's just the API's reported point/bounds. For the
    authoritative plume polygon, use
    :meth:`~georeader.readers.carbonmapper.rasters.CMPlumeRaster.polygon`,
    which extracts it from the L3A ``plume_tif`` band-4 alpha mask.

    Downstream MARS staging-view counterpart
    ----------------------------------------
    UNEP IMEO MARS persists this record into ``src_plume_staging_hist``
    and exposes it via the **``src_carbon_mapper_plumes`` view** (defined
    in `pysat sql/view01_raw_carbon_mapper_plumes_view.sql
    <https://github.com/UNEP-IMEO-MARS/pysat/blob/main/sql/view01_raw_carbon_mapper_plumes_view.sql>`_).
    Field-level docstrings below mirror that view's
    ``COMMENT ON COLUMN`` statements.

    Mapping reference (CMRawPlume field → SQL view column):

    ============================  =====================================
    ``CMRawPlume`` field          ``src_carbon_mapper_plumes`` column
    ============================  =====================================
    ``plume_id``                  ``source_id``
    ``datetime_str`` /            ``tile_date``
      ``scene_timestamp``
    ``published_at_str``          ``published_at``
    ``modified_str``              ``modified``
    ``plume_latitude``            ``lat``
    ``plume_longitude``           ``lon``
    ``plume_bounds_raw``          ``plume_bounds``
    ``wind_source_auto``          ``wind_source``
    ``wind_speed_avg_auto``       ``wind_speed_m_s``
    ``wind_speed_std_auto``       ``wind_speed_std_m_s``
    ``wind_direction_avg_auto``   ``wind_direction_deg``
    ``wind_direction_std_auto``   ``wind_direction_std_deg``
    ``emission_auto``             ``emission_rate_kg_h``
    ``emission_uncertainty_auto`` ``emission_rate_uncertainty_kg_h``
    ``ipcc_sector``               ``sector``
    ``con_tif``                   ``concentration_tif``
    ``rgb_tif``, ``rgb_png``      same names
    ``plume_tif``, ``plume_png``  same names
    ============================  =====================================
    """

    model_config = ConfigDict(
        arbitrary_types_allowed=True,
        populate_by_name=True,
        str_strip_whitespace=True,
        validate_assignment=True,
    )

    # Field descriptions below mirror the column comments on the
    # ``src_carbon_mapper_plumes`` view in pysat
    # (sql/view01_raw_carbon_mapper_plumes_view.sql) so the upstream API
    # docs, the staging-table view, and this in-memory model all share
    # one source of truth. Keep them in sync if the SQL view's COMMENT
    # ON COLUMN statements change.

    # --- Core identifiers ---
    plume_id: str = Field(
        description=(
            "Unique plume identifier in the format "
            "``{platform}{YYYYMMDD}{HHMMSS}-{part}``. The first three "
            "characters represent the platform (e.g. ``gao`` for Global "
            "Airborne Observatory) followed by the acquisition date and "
            "time in ISO 8601 UTC format. The ``-{part}`` suffix (e.g. "
            "``-A``) retains key information from the original radiance "
            "filename and indicates the order of multiple plumes "
            "detected in the same image."
        ),
    )
    gas: str | None = Field(
        default="CH4",
        description="The gas molecule detected during imaging operations.",
    )

    # --- Coordinates (CSV: required; JSON: derived from geometry_json) ---
    plume_latitude: float | None = Field(
        default=None, alias="plume_latitude",
        description="Latitude estimate of plume origin (decimal degrees, EPSG:4326).",
    )
    plume_longitude: float | None = Field(
        default=None, alias="plume_longitude",
        description="Longitude estimate of plume origin (decimal degrees, EPSG:4326).",
    )

    # --- Timestamps ---
    # CSV format uses "datetime"; annotated format uses "scene_timestamp"
    datetime_str: str | None = Field(
        default=None, alias="datetime",
        description=(
            "Acquisition time (UTC ISO 8601). Maps to the SQL view's "
            "``tile_date`` column. Set on CSV-format payloads; the "
            "annotated-JSON endpoint uses ``scene_timestamp`` instead."
        ),
    )
    scene_timestamp: str | None = Field(
        default=None,
        description=(
            "Acquisition time (UTC ISO 8601) — annotated-JSON variant of "
            "``datetime``. Either field may be populated, never both."
        ),
    )
    scene_uuid: str | None = Field(
        default=None,
        alias="scene_id",
        description=(
            "Internal Carbon Mapper scene UUID — what the API returns "
            "in the ``scene_id`` JSON field. **Not** the parseable scene "
            "name (e.g. ``tan20251212t185057c20s4001``); for that, use "
            "the :attr:`scene_id` property which derives from "
            "``plume_id.rsplit('-', 1)[0]`` and matches the STAC item id "
            "in the ``l2b-ch4-mfa-v3a`` collection."
        ),
    )
    published_at_str: str | None = Field(
        default=None, alias="published_at",
        description="Date and time the observation was published (UTC).",
    )
    modified_str: str | None = Field(
        default=None, alias="modified",
        description="Date and time the observation was last modified (UTC).",
    )

    # --- Emissions ---
    emission_auto: float | None = Field(
        default=None,
        description=(
            "Quantified emission rate of the plume [kg/hr], estimated "
            "using the Integrated Methane Enhancement (IME) method "
            "(Duren et al. 2019, *California's Methane Super-Emitters*, "
            "Nature)."
        ),
    )
    emission_uncertainty_auto: float | None = Field(
        default=None,
        description=(
            "Uncertainty in the emission rate [± kg/hr range], derived "
            "from uncertainty in IME and wind speed."
        ),
    )

    # --- Wind ---
    wind_speed_avg_auto: float | None = Field(
        default=None,
        description="Mean wind speed at the plume site [m/s].",
    )
    wind_speed_std_auto: float | None = Field(
        default=None,
        description="Standard deviation of wind speed [m/s].",
    )
    wind_direction_avg_auto: float | None = Field(
        default=None,
        description="Wind direction at the plume site [degrees].",
    )
    wind_direction_std_auto: float | None = Field(
        default=None,
        description="Standard deviation of wind direction [degrees].",
    )
    wind_source_auto: str | None = Field(
        default=None,
        description=(
            "Wind reanalysis source (e.g. ``HRRR``, ``ECMWF_IFS``, "
            "``ERA5``). Indicates which forecast/reanalysis product fed "
            "the IME quantification."
        ),
    )

    # --- Instrument / platform ---
    instrument: str | None = Field(
        default=None,
        description=(
            "Three-character sensor abbreviation: ``ang`` (AVIRIS-NG), "
            "``av3`` (AVIRIS-3), ``emi`` (EMIT), ``tan`` (Tanager-1), "
            "``gao`` (GAO)."
        ),
    )
    platform: str | None = Field(
        default=None,
        description="Unique name of the platform the instrument is attached to.",
    )
    provider: str | None = Field(
        default=None,
        description="Short description of the data provider's name.",
    )

    # --- Classification / metadata ---
    ipcc_sector: str | None = Field(
        default=None, alias="ipcc_sector",
        description=(
            "IPCC emissions sector (e.g. ``1B2`` for Oil & Gas) when "
            "Carbon Mapper attributes one. Reference: "
            "https://www.ipcc-nggip.iges.or.jp/public/gl/guidelin/ch1ri.pdf"
        ),
    )
    sector: str | None = Field(
        default=None,
        description=(
            "Carbon Mapper free-text sector category. Often a "
            "human-readable wrapper around ``ipcc_sector`` (e.g. "
            '``"Oil & Gas (1B2)"``).'
        ),
    )
    emission_cmf_type: str | None = Field(
        default=None, alias="emission_cmf_type",
        description=(
            "Statistical column-wise atmospheric retrieval algorithm "
            "used to threshold methane / carbon dioxide plumes from "
            "background concentrations (e.g. ``mfa``)."
        ),
    )
    mission_phase: str | None = Field(
        default=None,
        description=(
            "Operational mission phase, such as ``first_light`` or "
            "``production``."
        ),
    )
    emission_version: str | None = Field(
        default=None,
        description=(
            "Version label for the algorithm + calibration applied to "
            "produce this emission record. Pairs with reprocessing "
            "campaigns."
        ),
    )
    processing_software: str | None = Field(
        default=None,
        description=(
            "Software version used by the provider to process the raw "
            "satellite data (e.g. ``cmpro: 3.41.4``)."
        ),
    )
    gsd: float | None = Field(
        default=None,
        description=(
            "Native ground sample distance — the distance on the ground "
            "represented by the center-to-center spacing of pixels in "
            "the sensor's raw radiance data [meters]."
        ),
    )
    sensitivity_mode: str | None = Field(
        default=None,
        description=(
            "The sensor's configured detection threshold and "
            "radiometric settings, which affect signal-to-noise ratio "
            "(SNR), exposure time, and spectral fidelity."
        ),
    )
    off_nadir: float | None = Field(
        default=None,
        description=(
            "Angle between the satellite's sensor line of sight and the "
            "point directly below the satellite (nadir) [degrees]. "
            "Carbon Mapper publishes this on the plume; the equivalent "
            "STAC property at the L2B scene level is ``view:off_nadir``."
        ),
    )

    # --- Quality & validation (annotated JSON) ---
    plume_quality: str | None = Field(
        default=None,
        description=(
            "CM-side quality flag for the plume retrieval. Presence "
            "implies the record was reviewed by Carbon Mapper's "
            "pipeline."
        ),
    )
    validated: bool | None = Field(
        default=None,
        description="CM-side validation flag (annotated JSON only).",
    )
    validator_user: str | None = Field(
        default=None,
        description="Validator user id from the CM annotated payload.",
    )
    has_phme: bool | None = Field(
        default=None,
        description=(
            "Whether the plume has been Plume Height + Mass Estimated. "
            "Annotated JSON only."
        ),
    )
    detection_institution: str | None = Field(
        default=None,
        description="Detection institution string from the CM annotated payload.",
    )

    # --- Source linkage (annotated JSON) ---
    source_id: str | None = Field(
        default=None,
        description=(
            "Carbon Mapper-assigned emission-source id. Joins to the CM "
            "API's source endpoint."
        ),
    )
    source_name: str | None = Field(
        default=None,
        description=(
            "Carbon Mapper source-name string (e.g. "
            "``CH4_1B2_100m_-104.17525_32.49125``)."
        ),
    )

    # --- Assets ---
    plume_tif: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a GeoTIFF of the delineated plume (L3A "
            "alpha-banded mask). "
            ":meth:`~georeader.readers.carbonmapper.rasters.CMPlumeRaster.polygon`"
            " extracts the polygon from band 4 of this file — the "
            "authoritative source for the retrieved plume shape."
        ),
    )
    plume_png: str | None = Field(
        default=None,
        description="HTTPS link to a PNG visualisation of the delineated plume.",
    )
    con_tif: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a GeoTIFF pixel map of unsmoothed "
            "concentration values [ppm·m]. The L2B-tile-level "
            "equivalent is the ``cmf`` asset on the parent STAC item."
        ),
    )
    rgb_tif: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a 3-band, natural-colour, full-strip "
            "surface-reflectance GeoTIFF. The L2B-tile-level sibling "
            "lives in the ``l2b-rgb-v3a`` STAC collection."
        ),
    )
    rgb_png: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a natural-colour, full-strip "
            "surface-reflectance PNG."
        ),
    )
    plume_rgb_png: str | None = Field(
        default=None,
        description="HTTPS link to a PNG of the plume overlaid on RGB.",
    )

    # --- Geometry sources ---
    geometry_json: dict | None = Field(
        default=None,
        description=(
            "Raw GeoJSON geometry dict from the CM payload — typically "
            "a Point or coarse Polygon. **Not** the retrieved plume "
            "polygon; for that, use ``CMPlumeRaster.polygon()`` against "
            "``plume_tif``."
        ),
    )
    plume_bounds_raw: Optional[Union[str, List[float], Tuple[float, float, float, float]]] = Field(
        default=None, alias="plume_bounds",
        description="Geographic bounds encompassing the plume image (W, S, E, N).",
    )

    # --- Derived ---
    geometry: BaseGeometry | None = Field(
        default=None,
        description=(
            "Shapely geometry built from ``geometry_json`` (preferred) "
            "or ``plume_bounds`` at validation time. **Not** the "
            "retrieved plume mask — same caveat as ``geometry_json``."
        ),
    )

    # ------------------------------------------------------------------ #
    # Field validators                                                     #
    # ------------------------------------------------------------------ #

    @field_validator(
        "plume_latitude",
        "plume_longitude",
        "gsd",
        "off_nadir",
        "emission_auto",
        "emission_uncertainty_auto",
        "wind_speed_avg_auto",
        "wind_speed_std_auto",
        "wind_direction_avg_auto",
        "wind_direction_std_auto",
        mode="before",
    )
    @classmethod
    def _coerce_float(cls, v: Any) -> float | None:
        return _to_float(v)

    @field_validator("validated", "has_phme", mode="before")
    @classmethod
    def _coerce_bool(cls, v: Any) -> bool | None:
        if v is None:
            return None
        if isinstance(v, bool):
            return v
        if isinstance(v, str):
            return v.lower() in ("true", "1", "yes")
        return bool(v)

    # ------------------------------------------------------------------ #
    # Model validator                                                      #
    # ------------------------------------------------------------------ #

    @model_validator(mode="after")
    def _build_geometry(self) -> "CMRawPlume":
        """Build shapely geometry from ``geometry_json`` or ``plume_bounds``."""
        geom: BaseGeometry | None = None

        # Priority 1: GeoJSON
        if self.geometry_json:
            try:
                geom = shape(self.geometry_json)
            except Exception:
                geom = None
            geom_type = self.geometry_json.get("type", "")
            if geom_type == "Point" and geom is not None:
                # Buffer by ~111 m to get a small polygon
                geom = geom.buffer(0.001)
            # Fill lat/lon from Point coordinates if not set
            if (self.plume_latitude is None or self.plume_longitude is None) and geom_type == "Point":
                coords = self.geometry_json.get("coordinates")
                if coords and len(coords) >= 2:
                    object.__setattr__(self, "plume_longitude", float(coords[0]))
                    object.__setattr__(self, "plume_latitude", float(coords[1]))

        # Priority 2: Bounding box
        if geom is None:
            b = _parse_bounds(self.plume_bounds_raw)
            if b is not None:
                try:
                    geom = box(*b)
                except Exception:
                    geom = None

        object.__setattr__(self, "geometry", geom)
        return self

    # ------------------------------------------------------------------ #
    # Properties                                                           #
    # ------------------------------------------------------------------ #

    @property
    def observation_datetime(self) -> datetime | None:
        """Parse observation time from ``datetime_str`` or ``scene_timestamp``."""
        return _parse_iso_datetime(self.datetime_str) or _parse_iso_datetime(self.scene_timestamp)

    @property
    def published_at(self) -> datetime | None:
        return _parse_iso_datetime(self.published_at_str)

    @property
    def modified_at(self) -> datetime | None:
        return _parse_iso_datetime(self.modified_str)

    @property
    def lat(self) -> float | None:
        return self.plume_latitude

    @property
    def lon(self) -> float | None:
        return self.plume_longitude

    @property
    def geometry_wkt(self) -> str | None:
        return self.geometry.wkt if self.geometry is not None else None

    @property
    def wind_u(self) -> float | None:
        """Eastward wind component (m/s), meteorological convention."""
        u, _ = decompose_wind(self.wind_speed_avg_auto, self.wind_direction_avg_auto)
        return u

    @property
    def wind_v(self) -> float | None:
        """Northward wind component (m/s), meteorological convention."""
        _, v = decompose_wind(self.wind_speed_avg_auto, self.wind_direction_avg_auto)
        return v

    @property
    def instrument_name(self) -> str | None:
        """Human-readable instrument name from :data:`CARBONMAPPER_INSTRUMENTS`.

        The lookup is case-insensitive — upstream payloads occasionally
        report ``"GAO"`` while ``plume_id`` prefixes are lowercase, so
        the table key is normalised at lookup time rather than relying
        on every caller to lowercase first.
        """
        if self.instrument is None:
            return None
        return CARBONMAPPER_INSTRUMENTS.get(
            self.instrument.lower(), self.instrument,
        )

    @property
    def scene_id(self) -> str:
        """Parent L2B scene id, derived from ``plume_id``.

        Equivalent to ``plume_id.rsplit('-', 1)[0]`` — same string used
        as the STAC item id in the ``l2b-ch4-mfa-v3a`` collection. Use
        this to bridge from a plume to its parent scene without an HTTP
        round-trip:

        >>> raw.scene_id                       # doctest: +SKIP
        'tan20251212t185057c20s4001'
        >>> tile = api_queries.get_tile(token, raw.scene_id)  # doctest: +SKIP

        Distinct from :attr:`scene_uuid`, which is the API's internal
        UUID for the scene.
        """
        return self.plume_id.rsplit("-", 1)[0]

    @property
    def version(self) -> str | None:
        """Processing version (``"v3a"`` / ``"v3b"`` / ``"v3c"`` / ...).

        Re-exposes :attr:`emission_version` as a more obvious branch
        point for STAC-vs-CDN access: ``v3a`` plumes are STAC-resident,
        ``v3c`` plumes are reachable only via the URL-pattern derivation
        in :class:`~georeader.readers.carbonmapper.image.CMPlumeImage`.
        Returns ``None`` if the upstream payload didn't include
        ``emission_version`` (older CSV exports).
        """
        return self.emission_version

    # ------------------------------------------------------------------ #
    # Serialisation                                                        #
    # ------------------------------------------------------------------ #

    def to_source_dict(self) -> Dict[str, Any]:
        """Serialise to a dict suitable for round-tripping through :meth:`from_raw`."""
        d: Dict[str, Any] = {"plume_id": self.plume_id, "gas": self.gas}

        # Coordinates
        if self.plume_latitude is not None:
            d["plume_latitude"] = self.plume_latitude
        if self.plume_longitude is not None:
            d["plume_longitude"] = self.plume_longitude

        # Timestamps
        if self.datetime_str is not None:
            d["datetime"] = self.datetime_str
        if self.scene_timestamp is not None:
            d["scene_timestamp"] = self.scene_timestamp
        # Round-trip the API's `scene_id` (UUID) under its on-the-wire
        # name; the parseable form is derived via the property.
        if self.scene_uuid is not None:
            d["scene_id"] = self.scene_uuid
        if self.published_at_str is not None:
            d["published_at"] = self.published_at_str
        if self.modified_str is not None:
            d["modified"] = self.modified_str

        # Emissions
        d["emission_auto"] = self.emission_auto
        d["emission_uncertainty_auto"] = self.emission_uncertainty_auto

        # Wind
        d["wind_speed_avg_auto"] = self.wind_speed_avg_auto
        d["wind_speed_std_auto"] = self.wind_speed_std_auto
        d["wind_direction_avg_auto"] = self.wind_direction_avg_auto
        d["wind_direction_std_auto"] = self.wind_direction_std_auto
        d["wind_source_auto"] = self.wind_source_auto

        # Instrument / platform
        d["instrument"] = self.instrument
        d["platform"] = self.platform
        d["provider"] = self.provider

        # Classification
        if self.ipcc_sector is not None:
            d["ipcc_sector"] = self.ipcc_sector
        if self.sector is not None:
            d["sector"] = self.sector
        d["emission_cmf_type"] = self.emission_cmf_type
        d["mission_phase"] = self.mission_phase
        d["emission_version"] = self.emission_version
        d["processing_software"] = self.processing_software
        d["gsd"] = self.gsd
        d["sensitivity_mode"] = self.sensitivity_mode
        d["off_nadir"] = self.off_nadir

        # Quality / validation
        if self.plume_quality is not None:
            d["plume_quality"] = self.plume_quality
        if self.validated is not None:
            d["validated"] = self.validated
        if self.validator_user is not None:
            d["validator_user"] = self.validator_user
        if self.has_phme is not None:
            d["has_phme"] = self.has_phme
        if self.detection_institution is not None:
            d["detection_institution"] = self.detection_institution

        # Source linkage
        if self.source_id is not None:
            d["source_id"] = self.source_id
        if self.source_name is not None:
            d["source_name"] = self.source_name

        # Assets
        d["plume_tif"] = self.plume_tif
        d["plume_png"] = self.plume_png
        d["con_tif"] = self.con_tif
        d["rgb_tif"] = self.rgb_tif
        d["rgb_png"] = self.rgb_png
        if self.plume_rgb_png is not None:
            d["plume_rgb_png"] = self.plume_rgb_png

        # Geometry sources
        if self.geometry_json is not None:
            d["geometry_json"] = self.geometry_json
        if self.plume_bounds_raw is not None:
            d["plume_bounds"] = self.plume_bounds_raw

        return d

    # ------------------------------------------------------------------ #
    # Factory classmethods                                                 #
    # ------------------------------------------------------------------ #

    @classmethod
    def from_raw(cls, raw: Union[str, Dict[str, Any]]) -> "CMRawPlume":
        """Create from a JSON string or dict (CSV row or annotated-plume payload)."""
        if isinstance(raw, str):
            raw = json.loads(raw)
        return cls(**raw)

    # ------------------------------------------------------------------ #
    # Representation                                                       #
    # ------------------------------------------------------------------ #

    def _short_wkt_preview(self, max_len: int = 160) -> str | None:
        if not self.geometry:
            return None
        txt = self.geometry.wkt.replace("\n", " ").strip()
        return txt if len(txt) <= max_len else txt[: max_len - 3] + "..."

    def __str__(self) -> str:
        geom = self.geometry
        geom_type = getattr(geom, "geom_type", None)
        area = round(geom.area, 6) if geom is not None else None
        dt = self.observation_datetime.isoformat() if self.observation_datetime else None
        return (
            f"{self.__class__.__name__}\n"
            f"  plume_id: {self.plume_id}\n"
            f"  observation_datetime (UTC): {dt}\n"
            f"  lat: {self.lat}\n"
            f"  lon: {self.lon}\n"
            f"  instrument: {self.instrument}\n"
            f"  platform: {self.platform}\n"
            f"  geometry_type: {geom_type}\n"
            f"  geometry_area_deg2: {area}\n"
            f"  emission_auto: {self.emission_auto}\n"
            f"  emission_uncertainty_auto: {self.emission_uncertainty_auto}\n"
            f"  wind_speed_avg_auto: {self.wind_speed_avg_auto}\n"
            f"  wind_direction_avg_auto: {self.wind_direction_avg_auto}\n"
            f"  gas: {self.gas}\n"
            f"  validated: {self.validated}\n"
        )

    def __repr__(self) -> str:
        geom = self.geometry
        geom_type = getattr(geom, "geom_type", None)
        area = geom.area if geom is not None else None
        return (
            f"{self.__class__.__name__}(\n"
            f"  plume_id={self.plume_id!r},\n"
            f"  lat={self.lat},\n"
            f"  lon={self.lon},\n"
            f"  gas={self.gas!r},\n"
            f"  instrument={self.instrument!r},\n"
            f"  platform={self.platform!r},\n"
            f"  emission_auto={self.emission_auto},\n"
            f"  emission_uncertainty_auto={self.emission_uncertainty_auto},\n"
            f"  wind_speed_avg_auto={self.wind_speed_avg_auto},\n"
            f"  wind_direction_avg_auto={self.wind_direction_avg_auto},\n"
            f"  validated={self.validated},\n"
            f"  geometry_type={geom_type},\n"
            f"  geometry_area_deg2={area},\n"
            f"  geometry_wkt_preview={self._short_wkt_preview()!r}\n"
            f")"
        )

`instrument_name` `property` ¶

Human-readable instrument name from CARBONMAPPER_INSTRUMENTS.

The lookup is case-insensitive. Upstream payloads occasionally report "GAO" while plume_id prefixes are lowercase, so the instrument code is lowercased at lookup time rather than relying on every caller to do it first. Codes missing from the table fall back to the raw instrument string; the property returns None only when instrument is unset.
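The lookup pattern above (normalise the query, keep a fallback) can be sketched as follows. INSTRUMENTS_SKETCH is a hypothetical stand-in for CARBONMAPPER_INSTRUMENTS with a single illustrative entry, and the sketch assumes the real table uses lowercase keys.

```python
from typing import Optional

# Hypothetical stand-in for CARBONMAPPER_INSTRUMENTS (keys assumed lowercase).
INSTRUMENTS_SKETCH = {"gao": "Global Airborne Observatory"}


def instrument_name(code: Optional[str]) -> Optional[str]:
    """Case-insensitive lookup that falls back to the raw code."""
    if code is None:
        return None
    # Lowercase the query, not the table, so "GAO" and "gao" hit the same key;
    # unknown codes are returned unchanged.
    return INSTRUMENTS_SKETCH.get(code.lower(), code)
```

Returning the raw code for unknown instruments keeps the property useful for new sensors before the table is updated.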

`observation_datetime` `property` ¶

Observation time parsed from datetime_str, falling back to scene_timestamp when datetime_str does not yield a datetime.
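The fallback relies on Python's `or` returning its first truthy operand. A minimal sketch, assuming a parser that returns None for missing or malformed input; parse_iso_datetime here is a hypothetical stand-in for the module's _parse_iso_datetime, and treating naive timestamps as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso_datetime(s: Optional[str]) -> Optional[datetime]:
    """Hypothetical stand-in for _parse_iso_datetime."""
    if not s:
        return None
    try:
        # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
        # so rewrite it as an explicit UTC offset first.
        dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: naive timestamps are UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def observation_datetime(datetime_str: Optional[str],
                         scene_timestamp: Optional[str]) -> Optional[datetime]:
    # datetime objects are always truthy, so `or` picks the first field that parses.
    return parse_iso_datetime(datetime_str) or parse_iso_datetime(scene_timestamp)
```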

`scene_id` `property` ¶

Parent L2B scene id, derived from plume_id.

Equivalent to plume_id.rsplit('-', 1)[0] — same string used as the STAC item id in the l2b-ch4-mfa-v3a collection. Use this to bridge from a plume to its parent scene without an HTTP round-trip:

>>> raw.scene_id                       # doctest: +SKIP
'tan20251212t185057c20s4001'
>>> tile = api_queries.get_tile(token, raw.scene_id)  # doctest: +SKIP

Distinct from scene_uuid, which is the API's internal UUID for the scene.
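Because rsplit("-", 1) splits only at the last hyphen, only the per-plume suffix is removed, even if the scene part were to contain hyphens. A minimal sketch; the plume id "tan20251212t185057c20s4001-A" below is hypothetical (the real suffix format is not documented here), and the sketch assumes the suffix itself contains no "-".

```python
def scene_id_from_plume_id(plume_id: str) -> str:
    # Split once, from the right: everything before the last "-" is the scene id.
    # A string without "-" is returned unchanged.
    return plume_id.rsplit("-", 1)[0]
```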

`version` `property` ¶

Processing version ("v3a" / "v3b" / "v3c" / ...).

Re-exposes emission_version as a more obvious branch point for STAC-vs-CDN access: v3a plumes are STAC-resident, while v3c plumes are reachable only via the URL-pattern derivation in CMPlumeImage (georeader.readers.carbonmapper.image). Returns None if the upstream payload didn't include emission_version (older CSV exports).
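A caller might branch on the version as in this sketch. access_route and its return labels are hypothetical; only the two cases stated above (v3a via STAC, v3c via URL pattern) are encoded, and every other version is reported as "unknown" rather than guessed.

```python
from typing import Optional


def access_route(version: Optional[str]) -> str:
    """Hypothetical helper choosing how to fetch a plume's assets."""
    if version is None:
        return "unknown"  # older CSV exports carry no emission_version
    v = version.lower()
    if v == "v3a":
        return "stac"  # STAC-resident
    if v == "v3c":
        return "url-pattern"  # CMPlumeImage-style URL derivation
    return "unknown"  # behaviour for other versions is not documented here
```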

`wind_u` `property` ¶

Eastward wind component (m/s), meteorological convention.

`wind_v` `property` ¶

Northward wind component (m/s), meteorological convention.

`from_raw(raw)` `classmethod` ¶

Create from a JSON string or dict (CSV row or annotated-plume payload).

`to_source_dict()` ¶

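Serialise to a dict suitable for round-tripping through from_raw. Keys use the upstream (on-the-wire) names such as datetime, scene_id and plume_bounds, rather than the attribute names datetime_str, scene_uuid and plume_bounds_raw.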
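The round trip works because serialisation emits wire names and parsing maps them back (CMRawPlume relies on pydantic field aliases for this, e.g. plume_bounds_raw has alias plume_bounds). A stdlib sketch of the same idea; MiniPlume and WIRE_NAMES are hypothetical, covering only a few fields, and the id values are made up.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union

# Hypothetical subset: attribute name -> on-the-wire key.
WIRE_NAMES = {
    "plume_id": "plume_id",
    "datetime_str": "datetime",
    "scene_uuid": "scene_id",
    "plume_bounds_raw": "plume_bounds",
}


@dataclass
class MiniPlume:
    plume_id: str
    datetime_str: Optional[str] = None
    scene_uuid: Optional[str] = None
    plume_bounds_raw: Optional[list] = None

    def to_source_dict(self) -> Dict[str, Any]:
        # Emit wire names and skip unset optionals.
        return {
            wire: getattr(self, attr)
            for attr, wire in WIRE_NAMES.items()
            if getattr(self, attr) is not None
        }

    @classmethod
    def from_raw(cls, raw: Union[str, Dict[str, Any]]) -> "MiniPlume":
        if isinstance(raw, str):
            raw = json.loads(raw)
        # Map wire names back to attribute names (the job aliases do in pydantic).
        to_attr = {wire: attr for attr, wire in WIRE_NAMES.items()}
        return cls(**{to_attr[k]: v for k, v in raw.items()})
```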

`decompose_wind(speed, direction_deg)` ¶

Convert wind speed + meteorological direction to (u, v) components.

Meteorological convention: 0° = wind from North, 90° = wind from East. Returns the eastward (u) and northward (v) wind vector components, or (None, None) if either input is None.

Source code in georeader/readers/carbonmapper/plume.py

def decompose_wind(
    speed: float | None,
    direction_deg: float | None,
) -> tuple[float | None, float | None]:
    """Convert wind speed + meteorological direction to (u, v) components.

    Meteorological convention: 0° = wind *from* North, 90° = wind *from* East.
    Returns the eastward (u) and northward (v) wind vector components.
    """
    if speed is None or direction_deg is None:
        return None, None
    direction_rad = math.radians(direction_deg)
    wind_u = -speed * math.sin(direction_rad)
    wind_v = -speed * math.cos(direction_rad)
    return wind_u, wind_v
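A worked check of the sign convention: wind from the North (0°) blows southward, so v is negative and u is zero. The sketch copies decompose_wind so it runs on its own and adds a hypothetical inverse, recompose_wind, which is not part of the module.

```python
import math
from typing import Optional, Tuple


def decompose_wind(speed: Optional[float], direction_deg: Optional[float]
                   ) -> Tuple[Optional[float], Optional[float]]:
    # Same formula as above: direction is where the wind blows *from*.
    if speed is None or direction_deg is None:
        return None, None
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def recompose_wind(u: float, v: float) -> Tuple[float, float]:
    """Hypothetical inverse: (u, v) back to (speed, meteorological direction)."""
    speed = math.hypot(u, v)
    # Negating both components points back toward where the wind comes from.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction
```

For example, a 7 m/s wind from 225° (south-west) gives positive u and v of about 4.95 m/s each, i.e. air moving north-east.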

Typed model for a Carbon Mapper source (DBSCAN cluster of plumes).

A Carbon Mapper source groups all plumes detected at the same geographic location into a persistent point-source record. Sources are addressed by a deterministic name of the form {gas}_{sector}_{footprint_m}m_{lon}_{lat} — e.g. "CH4_1B2_100m_-104.17525_32.49125".
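The naming scheme can be unpacked with a plain split. This is a hypothetical helper, not part of the module, and it assumes that no field contains an underscore (true of the documented example and of sector codes such as "1B1a") and that the name is already free of any query-string suffix.

```python
from typing import NamedTuple


class SourceNameParts(NamedTuple):
    gas: str
    sector: str
    footprint_m: int
    lon: float
    lat: float


def parse_source_name(name: str) -> SourceNameParts:
    """Hypothetical parser for "{gas}_{sector}_{footprint_m}m_{lon}_{lat}"."""
    gas, sector, footprint, lon, lat = name.split("_")  # exactly five fields
    if not footprint.endswith("m"):
        raise ValueError(f"unexpected footprint field: {footprint!r}")
    return SourceNameParts(gas, sector, int(footprint[:-1]), float(lon), float(lat))
```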

This module is the API-side typed view of a Carbon Mapper source. Downstream consumers may persist it into their own tables, but this package deliberately does not assume any particular DB schema.

Notable quirks handled here¶

/catalog/sources.geojson features sometimes return source_name with a stray query-string fragment appended ("...?plume_gas=CH4&bbox=..."). _strip_query_suffix removes it, and CMSource.from_geojson_feature always calls it, so callers never see the dirty form.
The endpoints return either a GeoJSON Feature (with properties / geometry) or a flat dict; the higher-level georeader.readers.carbonmapper.api_queries module normalises these before invoking from_geojson_feature.
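Both quirks can be sketched in a few lines. strip_query_suffix mirrors what the text says _strip_query_suffix does; as_feature is a hypothetical normaliser that assumes the flat form keeps its geometry under "geometry" with every other key a property. The real api_queries logic may differ.

```python
from typing import Any, Dict


def strip_query_suffix(name: str) -> str:
    # Keep only the text before the first "?".
    return name.split("?", 1)[0]


def as_feature(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical normaliser: wrap a flat dict into a GeoJSON Feature."""
    if obj.get("type") == "Feature" or "properties" in obj:
        return obj  # already a Feature
    props = {k: v for k, v in obj.items() if k != "geometry"}
    return {"type": "Feature", "properties": props, "geometry": obj.get("geometry")}
```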

`CMSource` `dataclass` ¶

Typed view of a Carbon Mapper source (cluster of plumes).

Frozen: instances are immutable and hashable. The raw dict captures the full upstream properties payload, so consumers can reach for fields not yet exposed on the dataclass without round-tripping through the API.
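One subtlety behind "immutable and hashable": the __hash__ that @dataclass(frozen=True) generates covers every compared field, so a dict-valued field such as raw makes hash() raise TypeError unless it is excluded, e.g. with field(hash=False). A stdlib sketch with hypothetical cut-down classes:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HashableSketch:
    source_name: str
    plume_count: int
    # hash=False keeps the unhashable dict out of __hash__ while __eq__
    # still compares it, so equal objects still hash equally.
    raw: dict = field(default_factory=dict, hash=False)


@dataclass(frozen=True)
class UnhashableSketch:
    source_name: str
    raw: dict = field(default_factory=dict)  # included in __hash__


def is_hashable(obj) -> bool:
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```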

Attributes¶

source_name: Canonical name (no ?... suffix). Stable across CM API revisions for the same physical site.
gas: Gas species, typically "CH4" or "CO2".
sector: IPCC sector code, e.g. "1B2" (Oil & Gas), "6A" (Solid Waste), "1B1a" (Coal Mining).
point: Centroid as a Shapely Point (shapely.geometry.Point) in WGS-84.
plume_count: Number of plumes Carbon Mapper has attributed to this source.
persistence: Carbon Mapper's persistence metric (overpasses-with-detection / total-overpasses), in [0, 1].
emission_auto: Persistence-weighted average emission rate in kg/h; None when CM has not produced an aggregate estimate.
emission_uncertainty_auto: Companion uncertainty for emission_auto, in kg/h.
first_observation, last_observation: Earliest and latest detection datetimes (UTC-aware).
raw: Original properties mapping from the API response.

Examples¶

Parse from a /catalog/sources.geojson feature:

>>> feature = {
...     "properties": {
...         "source_name": "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4",
...         "sector": "1B2", "gas": "CH4",
...         "plume_count": 12, "persistence": 0.42,
...         "emission_auto": 250.0,
...     },
...     "geometry": {"type": "Point",
...                  "coordinates": [-104.17525, 32.49125]},
... }
>>> src = CMSource.from_geojson_feature(feature)
>>> src.source_name              # query suffix stripped
'CH4_1B2_100m_-104.17525_32.49125'
>>> src.point.x, src.point.y
(-104.17525, 32.49125)
>>> src.plume_count, src.sector
(12, '1B2')

Source code in georeader/readers/carbonmapper/source.py

@dataclass(frozen=True)
class CMSource:
    """Typed view of a Carbon Mapper source (cluster of plumes).

    Frozen: instances are immutable and hashable. The ``raw`` dict
    captures the full upstream properties payload so consumers can
    reach for fields not yet exposed on the dataclass without
    round-tripping through the API.

    Attributes
    ----------
    source_name:
        Canonical name (no ``?...`` suffix). Stable across CM API
        revisions for the same physical site.
    gas:
        Gas species — typically ``"CH4"`` or ``"CO2"``.
    sector:
        IPCC sector code, e.g. ``"1B2"`` (Oil & Gas), ``"6A"`` (Solid
        Waste), ``"1B1a"`` (Coal Mining).
    point:
        Centroid as a Shapely :class:`shapely.geometry.Point` in WGS-84.
    plume_count:
        Number of plumes Carbon Mapper has attributed to this source.
    persistence:
        Carbon Mapper's persistence metric (overpasses-with-detection /
        total-overpasses), in ``[0, 1]``.
    emission_auto:
        Persistence-weighted average emission rate in ``kg/h``. ``None``
        when CM has not produced an aggregate estimate.
    emission_uncertainty_auto:
        Companion uncertainty for ``emission_auto``, in ``kg/h``.
    first_observation, last_observation:
        Earliest and latest detection datetimes (UTC-aware).
    raw:
        Original ``properties`` mapping from the API response.

    Examples
    --------
    Parse from a ``/catalog/sources.geojson`` feature:

    >>> feature = {
    ...     "properties": {
    ...         "source_name": "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4",
    ...         "sector": "1B2", "gas": "CH4",
    ...         "plume_count": 12, "persistence": 0.42,
    ...         "emission_auto": 250.0,
    ...     },
    ...     "geometry": {"type": "Point",
    ...                  "coordinates": [-104.17525, 32.49125]},
    ... }
    >>> src = CMSource.from_geojson_feature(feature)
    >>> src.source_name              # query suffix stripped
    'CH4_1B2_100m_-104.17525_32.49125'
    >>> src.point.x, src.point.y
    (-104.17525, 32.49125)
    >>> src.plume_count, src.sector
    (12, '1B2')
    """

    source_name: str
    gas: str
    sector: str
    point: Point
    plume_count: int
    persistence: float
    emission_auto: float | None = None
    emission_uncertainty_auto: float | None = None
    first_observation: datetime | None = None
    last_observation: datetime | None = None
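    # hash=False: a dict is unhashable, so including it in the generated
    # __hash__ would make hash(instance) raise TypeError.
    raw: dict = field(default_factory=dict, hash=False)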

    @classmethod
    def from_geojson_feature(cls, feature: dict) -> "CMSource":
        """Parse a ``/catalog/sources.geojson`` feature into a CMSource.

        Always strips the ``source_name`` query-string suffix
        (``?plume_gas=...``) — this is the canonical strip site, so
        downstream code can treat ``CMSource.source_name`` as clean.

        Parameters
        ----------
        feature:
            GeoJSON Feature dict with at least ``"geometry"`` (Point)
            and ``"properties"`` (with ``source_name`` and friends).

        Returns
        -------
        CMSource
            Typed source record with the suffix stripped.

        Raises
        ------
        ValueError
            If ``feature["geometry"]`` does not carry a Point coordinate
            pair.

        Examples
        --------
        >>> feature = {
        ...     "properties": {"source_name": "x?bbox=1", "sector": "1B2",
        ...                    "gas": "CH4", "plume_count": 1,
        ...                    "persistence": 0.5},
        ...     "geometry": {"type": "Point", "coordinates": [-100.0, 30.0]},
        ... }
        >>> CMSource.from_geojson_feature(feature).source_name
        'x'
        """
        props = dict(feature.get("properties") or {})
        geom = feature.get("geometry") or {}
        coords = geom.get("coordinates") or (None, None)
        lon, lat = (coords + [None, None])[:2] if isinstance(coords, list) else (None, None)

        if lon is None or lat is None:
            raise ValueError(
                f"feature is missing Point coordinates: {feature!r}"
            )

        return cls(
            source_name=_strip_query_suffix(str(props.get("source_name", ""))),
            gas=str(props.get("gas", "") or ""),
            sector=str(props.get("sector", "") or ""),
            point=Point(float(lon), float(lat)),
            plume_count=int(props.get("plume_count") or 0),
            persistence=float(props.get("persistence") or 0.0),
            emission_auto=_to_float(props.get("emission_auto")),
            emission_uncertainty_auto=_to_float(
                props.get("emission_uncertainty_auto")
            ),
            first_observation=_parse_iso_datetime(props.get("first_observation")),
            last_observation=_parse_iso_datetime(props.get("last_observation")),
            raw=props,
        )

`from_geojson_feature(feature)` `classmethod` ¶

Parse a /catalog/sources.geojson feature into a CMSource.

Always strips the source_name query-string suffix (?plume_gas=...) — this is the canonical strip site, so downstream code can treat CMSource.source_name as clean.

Parameters¶

feature: GeoJSON Feature dict with at least "geometry" (Point) and "properties" (with source_name and friends).

Returns¶

CMSource: Typed source record with the suffix stripped.

Raises¶

ValueError: If feature["geometry"] does not carry a Point coordinate pair.

Examples¶

>>> feature = {
...     "properties": {"source_name": "x?bbox=1", "sector": "1B2",
...                    "gas": "CH4", "plume_count": 1,
...                    "persistence": 0.5},
...     "geometry": {"type": "Point", "coordinates": [-100.0, 30.0]},
... }
>>> CMSource.from_geojson_feature(feature).source_name
'x'
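The coordinate check in from_geojson_feature treats only list-valued coordinates as valid, so a tuple, such as the output of shapely.geometry.mapping(), ends up as (None, None) and raises ValueError. A hypothetical stricter variant that accepts lists or tuples and checks the geometry type explicitly:

```python
from typing import Any, Dict, Tuple


def point_lon_lat(feature: Dict[str, Any]) -> Tuple[float, float]:
    """Hypothetical Point extraction accepting list or tuple coordinates."""
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates")
    if (geom.get("type") != "Point"
            or not isinstance(coords, (list, tuple)) or len(coords) < 2):
        raise ValueError(f"feature is missing Point coordinates: {feature!r}")
    lon, lat = coords[0], coords[1]
    if lon is None or lat is None:
        raise ValueError(f"feature is missing Point coordinates: {feature!r}")
    return float(lon), float(lat)
```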

Carbon Mapper L2B scene raster wrapper.

CMImageRaster exposes every loadable L2B scene asset (cmf / cmf-unortho / uncertainty / uncertainty-unortho / artifact-mask / rgb / uas) as lazy properties backed by georeader.rasterio_reader.RasterioReader, or as plain text for the uas.txt sidecar.

Per-plume L3A products (mask, concentrations, IME-clipped concentrations, RGB, outline) live in georeader.readers.carbonmapper.image, where CMPlumeImage is the plume-level counterpart to this class.

Intentionally NOT wrapped:

PNG assets (rgb_png etc.) — un-georeferenced, not COGs.
Per-plume con_tif from the catalog REST surface — duplicates the column-density crop already provided by CMPlumeImage.

These are pure raster wrappers, with no DB binding and no blob upload. The DB-bound classes (CarbonMapperTile, CarbonMapperLocationImage) and the analyst notebooks consume them.

`CM_L2B_BANDS = ('cmf', 'cmf-unortho', 'uncertainty', 'uncertainty-unortho', 'artifact-mask', 'rgb')` `module-attribute` ¶

`DEFAULT_L2B_RGB_COLLECTION = 'l2b-rgb-v3a'` `module-attribute` ¶
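from_cm_tile_item strips file extensions from STAC asset keys ("cmf.tif", "uas.txt") and keeps only the keys in CM_L2B_BANDS plus uas. A minimal sketch of that filtering; band_paths is hypothetical and assumes only .tif and .txt extensions occur, so the real method may handle more cases.

```python
from typing import Dict, Mapping

# Copied from CM_L2B_BANDS above, plus the "uas" text sidecar.
CM_L2B_BANDS = ("cmf", "cmf-unortho", "uncertainty", "uncertainty-unortho",
                "artifact-mask", "rgb")
KEEP = set(CM_L2B_BANDS) | {"uas"}


def band_paths(asset_urls: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical mapping from STAC asset keys to band-name -> URL."""
    paths: Dict[str, str] = {}
    for key, url in asset_urls.items():
        if not url:
            continue  # skip assets without a URL
        name = key
        for ext in (".tif", ".txt"):
            if name.endswith(ext):
                name = name[: -len(ext)]
                break
        if name in KEEP:
            paths[name] = url
    return paths
```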

`CMImageRaster` `dataclass` ¶

L2B scene exposed as lazily loaded, georeader-backed rasters (one per band in CM_L2B_BANDS).

Lazy: instantiating the dataclass does NOT issue HTTP / blob reads; accessing .cmf / .rgb / etc. or calling read_window / read_polygon triggers I/O.

Attributes:

Name	Type	Description
`scene_id`	`str`	CM L2B item id (e.g. `"tan20251212t185057c20s4001"`).
`asset_paths`	`Mapping[str, PathLike]`	Mapping of band name → URL (`https://`) or local / blob path. `artifact-mask` may be missing — the accessor returns `None`.
`overview_level`	`Optional[int]`	Forwarded to `RasterioReader`. `None` for full resolution; integer for COG overviews (faster previews).
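A minimal sketch of that laziness, assuming `functools.cached_property` as the caching mechanism (the class source below uses it) and a hypothetical `open_reader` callable standing in for `RasterioReader`, so no file or network access happens. `LazyScene` is a hypothetical name, not part of the module:

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import Callable, Mapping, Optional


@dataclass
class LazyScene:
    """Hypothetical stand-in for CMImageRaster's lazy band access."""

    asset_paths: Mapping[str, str]
    # Hypothetical opener standing in for RasterioReader(path, ...).
    open_reader: Callable[[str], object]
    # Records which bands were actually opened, to make the laziness visible.
    opened: list = field(default_factory=list)

    def _open_optional(self, band: str) -> Optional[object]:
        path = self.asset_paths.get(band)
        if path is None:
            return None  # missing asset: no open is attempted
        self.opened.append(band)
        return self.open_reader(path)

    @cached_property
    def cmf(self) -> Optional[object]:
        # Computed on first access, then stored on the instance.
        return self._open_optional("cmf")

    @cached_property
    def artifact_mask(self) -> Optional[object]:
        return self._open_optional("artifact-mask")
```

Constructing `LazyScene(...)` leaves `opened` empty; reading `.cmf` twice opens the band once, and a missing asset such as `artifact-mask` simply returns `None`.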

Source code in georeader/readers/carbonmapper/rasters.py

@dataclass(repr=False)
class CMImageRaster:
    """L2B scene exposed as georeader-backed rasters, one per available L2B band.

    Lazy: instantiating the dataclass does NOT issue HTTP / blob reads;
    access ``.cmf`` / ``.rgb`` / etc. or call :meth:`read_window` /
    :meth:`read_polygon` to trigger I/O.

    Attributes:
        scene_id: CM L2B item id (e.g. ``"tan20251212t185057c20s4001"``).
        asset_paths: Mapping of band name → URL (``https://``) or
            local / blob path. ``artifact-mask`` may be missing — the
            accessor returns ``None``.
        overview_level: Forwarded to ``RasterioReader``. ``None`` for
            full resolution; integer for COG overviews (faster previews).
    """

    scene_id: str
    asset_paths: Mapping[str, PathLike]
    overview_level: Optional[int] = None

    # ---- Constructors --------------------------------------------------

    @classmethod
    def from_cm_tile_item(cls, item: CMTileItem) -> "CMImageRaster":
        """Build from the lightweight STAC item (Phase 0.2).

        STAC asset keys carry file extensions (``cmf.tif``,
        ``uncertainty.tif``, ``artifact-mask.tif``, ``uas.txt``,
        ``*-unortho.tif`` variants). This method strips the
        appropriate extension and retains every key listed in
        :data:`CM_L2B_BANDS` plus ``uas`` (the text sidecar).
        """
        paths: dict[str, PathLike] = {}
        for key, url in item.asset_urls.items():
            if not url:
                continue
            # Both extensions are four characters long, so one slice
            # strips either of them.
            if key.endswith((".tif", ".txt")):
                stripped = key[:-4]
            else:
                stripped = key
            if stripped in _CM_L2B_KEYS_ALL:
                paths[stripped] = url
        return cls(scene_id=item.scene_id, asset_paths=paths)

    def with_rgb(self, rgb_item: CMTileItem) -> "CMImageRaster":
        """Return a copy with ``rgb`` merged in from a sibling STAC item.

        The CH4 (``l2b-ch4-mfa-v3a``) and RGB (``l2b-rgb-v3a``) L2B
        collections share ``scene_id`` and pixel grid, but each STAC
        item only exposes its own assets. Fetch both with
        :func:`api_queries.get_tile` (passing ``collection=...``) and
        compose them via this method:

        >>> ir = CMImageRaster.from_cm_tile_item(ch4_item)
        >>> ir = ir.with_rgb(rgb_item)
        >>> ir.rgb is not None
        True

        Raises:
            ValueError: If ``rgb_item.scene_id`` doesn't match
                ``self.scene_id`` (mismatched scenes don't share a grid
                — usually a programming error).
        """
        if rgb_item.scene_id != self.scene_id:
            raise ValueError(
                f"scene_id mismatch: {self.scene_id!r} vs {rgb_item.scene_id!r}"
            )
        # Pick the rgb GeoTIFF (with or without `.tif` extension);
        # ignore everything else on the rgb item.
        new_paths = dict(self.asset_paths)
        for key, url in rgb_item.asset_urls.items():
            if not url:
                continue
            stripped = key[:-4] if key.endswith(".tif") else key
            if stripped == "rgb":
                new_paths["rgb"] = url
                break
        return CMImageRaster(
            scene_id=self.scene_id,
            asset_paths=new_paths,
            overview_level=self.overview_level,
        )

    @classmethod
    def from_scene_id(
        cls,
        scene_id: str,
        *,
        token: str,
        l2b_collection_candidates: Sequence[str] = DEFAULT_L2B_CH4_COLLECTION_CANDIDATES,
        rgb_collection_candidates: Sequence[str] = DEFAULT_L2B_RGB_COLLECTION_CANDIDATES,
        with_rgb: bool = True,
        overview_level: int | None = None,
        http_timeout: float = 30.0,
    ) -> "CMImageRaster":
        """Build by deriving L2B asset URLs from the scene_id (URL-pattern).

        Bypasses STAC entirely — derives every asset URL by templating
        against the verified asset-proxy pattern (see
        :func:`_l2b_asset_url`) and probing the candidate collections
        in order. Required for 2026 plumes (v3c/v3d L3A) whose L2B
        parent scenes are **not** in ``/stac/collections``.

        Parameters
        ----------
        scene_id:
            L2B scene id, equal to ``plume_id.rsplit("-", 1)[0]`` for
            any plume that came from this scene. Must follow the
            ``<inst><YYYYMMDD>t<HHMMSS>...`` convention so the date
            can be parsed.
        token:
            Bearer token. Required — the asset-proxy URLs return 401
            without it.
        l2b_collection_candidates:
            L2B CH4 collection IDs to probe, in order. First one to
            serve a 200/206 on ``cmf.tif`` wins. Defaults to
            :data:`DEFAULT_L2B_CH4_COLLECTION_CANDIDATES` —
            ``("l2b-ch4-mfa-v3c", "l2b-ch4-mfa-v3a")``.
        rgb_collection_candidates:
            L2B RGB sibling collection IDs probed identically (on
            ``rgb.tif``). Defaults to
            :data:`DEFAULT_L2B_RGB_COLLECTION_CANDIDATES`.
        with_rgb:
            When ``True`` (default), probe the RGB sibling collections
            and attach the ``rgb`` URL on success. When ``False``,
            ``self.rgb`` will be ``None``.
        overview_level:
            Forwarded to :class:`RasterioReader`.
        http_timeout:
            Per-probe range-GET timeout (seconds).

        Returns
        -------
        CMImageRaster
            With ``asset_paths`` populated for the 6 L2B CH4 assets
            (``cmf``, ``cmf-unortho``, ``uncertainty``,
            ``uncertainty-unortho``, ``artifact-mask``, ``uas``) and,
            when ``with_rgb=True``, the ``rgb`` sibling URL.

        Raises
        ------
        CMSceneNotPublished
            When every candidate L2B collection 404s for ``scene_id``
            — the scene either hasn't been processed yet or only
            exists in a collection variant not listed in
            ``l2b_collection_candidates``. Catch in ETL paths that
            want to defer rather than error.
        ValueError
            When ``scene_id`` doesn't carry an 8-digit date at
            positions ``[3:11]``.

        Examples
        --------
        >>> tile = CMImageRaster.from_scene_id(  # doctest: +SKIP
        ...     "tan20260331t181625c77s4001", token=tok,
        ... )
        >>> tile.cmf  # doctest: +SKIP
        <RasterioReader …/l2b-ch4-mfa-v3c/2026/03/31/…>
        """
        l2b_coll = _probe_l2b_collection(
            scene_id,
            l2b_collection_candidates,
            probe_asset="cmf.tif",
            token=token,
            http_timeout=http_timeout,
        )
        if l2b_coll is None:
            raise CMSceneNotPublished(scene_id)

        # Build the 6 CH4-collection asset URLs from the winning prefix.
        # The URLs carry the file extensions; the dict keys must not,
        # because `_open` looks keys up verbatim using the band names
        # the lazy properties pass in (e.g. "cmf-unortho", "uas").
        asset_paths: dict[str, PathLike] = {
            "cmf":                 _l2b_asset_url(l2b_coll, scene_id, "cmf.tif"),
            "cmf-unortho":         _l2b_asset_url(l2b_coll, scene_id, "cmf-unortho.tif"),
            "uncertainty":         _l2b_asset_url(l2b_coll, scene_id, "uncertainty.tif"),
            "uncertainty-unortho": _l2b_asset_url(l2b_coll, scene_id, "uncertainty-unortho.tif"),
            "artifact-mask":       _l2b_asset_url(l2b_coll, scene_id, "artifact-mask.tif"),
            "uas":                 _l2b_asset_url(l2b_coll, scene_id, "uas.txt"),
        }

        if with_rgb:
            rgb_coll = _probe_l2b_collection(
                scene_id,
                rgb_collection_candidates,
                probe_asset="rgb.tif",
                token=token,
                http_timeout=http_timeout,
            )
            if rgb_coll is not None:
                asset_paths["rgb"] = _l2b_asset_url(rgb_coll, scene_id, "rgb.tif")

        return cls(
            scene_id=scene_id,
            asset_paths=asset_paths,
            overview_level=overview_level,
        )

    @classmethod
    def from_local(cls, scene_dir: PathLike) -> "CMImageRaster":
        """Build from a downloaded scene directory.

        Picks up every L2B asset present (``cmf.tif`` / ``rgb.tif`` /
        ``uncertainty.tif`` / ``artifact-mask.tif`` and the
        un-orthorectified variants), plus the ``uas.txt`` sidecar.
        Missing files become absent keys in ``asset_paths``.
        """
        d = Path(scene_dir)
        paths: dict[str, PathLike] = {}
        for band in CM_L2B_BANDS:
            p = d / f"{band}.tif"
            if p.exists():
                paths[band] = str(p)
        uas_path = d / "uas.txt"
        if uas_path.exists():
            paths["uas"] = str(uas_path)
        return cls(scene_id=d.name, asset_paths=paths)

    # ---- Lazy band readers --------------------------------------------

    @cached_property
    def cmf(self) -> RasterioReader:
        """CH4 matched-filter retrieval, orthorectified (ppm·m).
        Always present on L2B-CH4 items."""
        return self._open("cmf")

    @cached_property
    def cmf_unortho(self) -> Optional[RasterioReader]:
        """CH4 retrieval in raw sensor frame (pre-orthorectification).
        ``None`` for older collection variants (e.g. ``mfm-v1``) that
        don't ship the unortho sibling."""
        return self._open_optional("cmf-unortho")

    @cached_property
    def rgb(self) -> Optional[RasterioReader]:
        """3-band uint8 RGB. ``None`` for L2B-CH4 collections (RGB lives
        in a separate STAC collection — fetch and pass via
        ``asset_paths`` or compose via :meth:`with_rgb`)."""
        return self._open_optional("rgb")

    @cached_property
    def uncertainty(self) -> RasterioReader:
        """Companion uncertainty raster aligned with ``cmf``."""
        return self._open("uncertainty")

    @cached_property
    def uncertainty_unortho(self) -> Optional[RasterioReader]:
        """Per-pixel uncertainty in raw sensor frame. ``None`` for
        older collection variants without the unortho sibling."""
        return self._open_optional("uncertainty-unortho")

    @cached_property
    def artifact_mask(self) -> Optional[RasterioReader]:
        """Artefact mask (covers ~25% of scene). Flags un-orthorectified
        strip pixels and geometric anomalies — **not** a cloud mask.
        ``None`` if absent."""
        return self._open_optional("artifact-mask")

    @cached_property
    def uas(self) -> Optional[str]:
        """UAS sensor-metadata sidecar — raw text from ``uas.txt``.

        Lazy-fetched on first access (one HTTP GET if the path is a
        URL, or a file read for local paths) and cached as a string.
        Callers parse the structure as needed; we don't impose a
        schema. Returns ``None`` if no ``uas`` URL/path was supplied.

        Auth: rasterio's curl session is configured via the
        ``GDAL_HTTP_HEADERS`` env var (set by the standard reader
        bootstrap). We re-use that header here so a single
        ``Authorization: Bearer <token>`` setup applies to every
        L2B asset, raster or text alike.
        """
        path = self.asset_paths.get("uas")
        if path is None:
            return None
        sp = str(path)
        if sp.startswith(("http://", "https://")):
            headers: dict[str, str] = {}
            gdal_hdr = os.environ.get("GDAL_HTTP_HEADERS", "")
            if gdal_hdr.lower().startswith("authorization:"):
                headers["Authorization"] = gdal_hdr.split(":", 1)[1].strip()
            r = requests.get(sp, headers=headers, timeout=30)
            r.raise_for_status()
            return r.text
        with open(sp, "r") as fh:
            return fh.read()

    # ---- Geometric metadata (pulled from cmf as the canonical band) ---

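    @property
    def crs(self) -> str:
        """CRS of the canonical ``cmf`` band, as a string.

        Like the other geometric properties, this opens ``cmf`` on
        first access.
        """
        return str(self.cmf.crs)

    @property
    def transform(self):
        """Affine geotransform of ``cmf``."""
        return self.cmf.transform

    @property
    def bounds(self) -> BBox:
        """``cmf`` bounds as plain floats, in ``crs`` units."""
        b = self.cmf.bounds
        return (float(b[0]), float(b[1]), float(b[2]), float(b[3]))

    @property
    def shape(self) -> tuple[int, int]:
        """``(height, width)`` of ``cmf`` in pixels."""
        return (self.cmf.height, self.cmf.width)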

    # ---- Read helpers (delegate to georeader.read) --------------------

    def read_polygon(
        self,
        polygon: BaseGeometry,
        *,
        crs_polygon: str = "EPSG:4326",
        bands: Iterable[str] = CM_L2B_BANDS,
    ) -> dict[str, Optional[GeoData]]:
        """Read a polygon clip from the requested bands.

        Args:
            polygon: Clip geometry.
            crs_polygon: CRS of ``polygon``. Defaults to ``"EPSG:4326"``.
            bands: Subset of band names. Bands whose asset is missing
                or whose window has zero overlap return ``None``.

        Returns:
            Windowed ``RasterioReader`` instances keyed by band name,
                e.g. ``{"cmf": <GeoData>, "rgb": <GeoData>, ...}``
                (lazy, satisfying the :class:`GeoData` protocol). Call
                ``.load()`` to materialise as :class:`GeoTensor`.
        """
        out: dict[str, Optional[GeoData]] = {}
        for band in bands:
            # `uas` is a text sidecar, not a raster: skip it whether or
            # not a path is present, so it never appears in the result.
            # Callers reading the sidecar use the `.uas` property.
            if band == "uas":
                continue
            if self.asset_paths.get(band) is None:
                out[band] = None
                continue
            reader = self._open(band)
            # `boundless=False` makes `read_from_polygon` return `None`
            # for windows that don't intersect the raster (e.g. an
            # artifact-mask whose un-orthorectified strip falls outside
            # the requested AOI), instead of allocating a fill-valued
            # tensor the size of the requested window. Real CRS / I/O
            # errors are left to propagate — the prior bare
            # `except Exception` swallowed those silently.
            #
            # `read_from_polygon` returns ``GeoData | NDArray``; with
            # ``return_only_data=False`` the GeoData arm is the one we
            # always hit. ``RasterioReader`` satisfies the ``GeoData``
            # protocol structurally, but ty doesn't currently infer
            # that — cast for clarity.
            result = read.read_from_polygon(
                cast(GeoData, reader),
                polygon=polygon,
                crs_polygon=crs_polygon,
                boundless=False,
            )
            out[band] = cast(GeoData, result) if result is not None else None
        return out

    def read_window(
        self,
        bounds_4326: BBox,
        *,
        bands: Iterable[str] = CM_L2B_BANDS,
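    ) -> dict[str, Optional[GeoData]]:
        """Read a WGS-84 bbox window from the requested bands.

        ``bounds_4326`` is ``(min_lon, min_lat, max_lon, max_lat)``, the
        argument order of :func:`shapely.geometry.box`.
        """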
        return self.read_polygon(box(*bounds_4326), bands=bands)

    def read_window_to_crs(
        self,
        bounds_4326: BBox,
        crs_dst: str,
        *,
        bands: Iterable[str] = CM_L2B_BANDS,
    ) -> dict[str, Optional[GeoTensor]]:
        """Read a window then reproject each band to ``crs_dst``.

        Reprojection materialises the data — values are
        :class:`GeoTensor`, not lazy readers.
        """
        crops = self.read_window(bounds_4326, bands=bands)
        # `read_to_crs` returns ``GeoTensor | NDArray``; same narrowing
        # rationale as ``read_from_polygon`` above.
        return {
            band: (
                cast(GeoTensor, read.read_to_crs(geo, crs_dst))
                if geo is not None
                else None
            )
            for band, geo in crops.items()
        }

    # ---- Internals -----------------------------------------------------

    def _open(self, band: str) -> RasterioReader:
        path = self.asset_paths.get(band)
        if path is None:
            raise KeyError(f"Asset {band!r} not present on {self.scene_id}")
        return RasterioReader(str(path), overview_level=self.overview_level)

    def _open_optional(self, band: str) -> Optional[RasterioReader]:
        if self.asset_paths.get(band) is None:
            return None
        return self._open(band)

    # ---- Repr ---------------------------------------------------------

    def __repr__(self) -> str:
        present = [b for b in CM_L2B_BANDS if b in self.asset_paths]
        missing = [b for b in CM_L2B_BANDS if b not in self.asset_paths]
        extra = sorted(set(self.asset_paths) - set(CM_L2B_BANDS))
        ov = self.overview_level if self.overview_level is not None else "full"
        lines = [
            "CMImageRaster",
            f"  scene_id:       {self.scene_id}",
            f"  bands present:  {present or '<none>'}",
        ]
        if missing:
            lines.append(f"  bands missing:  {missing}")
        if extra:
            lines.append(f"  extra keys:     {extra}")
        lines.append(f"  overview_level: {ov}")
        return "\n".join(lines)

    __str__ = __repr__
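The geometric properties above all read from `cmf`. For a north-up raster, `bounds` is determined by `transform` and `shape`; the sketch below shows that relationship using a plain 6-tuple `(a, b, c, d, e, f)` in the affine package's coefficient order instead of an `affine.Affine` object, and assumes no rotation (`b == d == 0`). `bounds_from_transform` is a hypothetical helper, not part of the module:

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def bounds_from_transform(transform: Tuple[float, ...], height: int, width: int) -> BBox:
    """(minx, miny, maxx, maxy) of a north-up raster from affine coefficients."""
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated transforms are out of scope for this sketch")
    x0, x1 = c, c + a * width   # left and right edges
    y0, y1 = f, f + e * height  # top and bottom edges (e is usually negative)
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```

For example, 5 m pixels with origin `(500000, 4200000)` and a `(100, 200)` shape give `(500000.0, 4199500.0, 501000.0, 4200000.0)`.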

`artifact_mask` `cached` `property` ¶

Artefact mask (covers ~25% of scene). Flags un-orthorectified strip pixels and geometric anomalies — not a cloud mask. None if absent.

`cmf` `cached` `property` ¶

CH4 matched-filter retrieval, orthorectified (ppm·m). Always present on L2B-CH4 items.

`cmf_unortho` `cached` `property` ¶

CH4 retrieval in raw sensor frame (pre-orthorectification). None for older collection variants (e.g. mfm-v1) that don't ship the unortho sibling.

`rgb` `cached` `property` ¶

3-band uint8 RGB. None for L2B-CH4 collections (RGB lives in a separate STAC collection; fetch it and pass it via asset_paths, or compose it via `with_rgb`).

`uas` `cached` `property` ¶

UAS sensor-metadata sidecar — raw text from uas.txt.

Lazy-fetched on first access (one HTTP GET if the path is a URL, or a file read for local paths) and cached as a string. Callers parse the structure as needed; we don't impose a schema. Returns None if no uas URL/path was supplied.

Auth: rasterio's curl session is configured via the GDAL_HTTP_HEADERS env var (set by the standard reader bootstrap). We re-use that header here so a single Authorization: Bearer <token> setup applies to every L2B asset, raster or text alike.
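The header hand-off can be isolated as a small pure function that mirrors the parsing in the `uas` property: only a `GDAL_HTTP_HEADERS` value that starts with `Authorization:` is recognised. `GDAL_HTTP_HEADERS` is a GDAL configuration option available in recent GDAL versions; `auth_headers_from_gdal` is a hypothetical name, and the tokens used with it are placeholders:

```python
import os
from typing import Mapping, Optional


def auth_headers_from_gdal(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Hypothetical helper mirroring the `uas` property's header parsing."""
    env = os.environ if env is None else env
    gdal_hdr = env.get("GDAL_HTTP_HEADERS", "")
    headers: dict[str, str] = {}
    # Only a value that *starts* with "Authorization:" is picked up; any
    # other layout yields an empty dict, i.e. an unauthenticated GET.
    if gdal_hdr.lower().startswith("authorization:"):
        headers["Authorization"] = gdal_hdr.split(":", 1)[1].strip()
    return headers
```

In practice the variable would be set once, e.g. `os.environ["GDAL_HTTP_HEADERS"] = f"Authorization: Bearer {token}"`, before any asset is opened, so rasterio and the `uas` fetch share the same credentials.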

`uncertainty` `cached` `property` ¶

Companion uncertainty raster aligned with cmf.

`uncertainty_unortho` `cached` `property` ¶

Per-pixel uncertainty in raw sensor frame. None for older collection variants without the unortho sibling.

`from_cm_tile_item(item)` `classmethod` ¶

Build from the lightweight STAC item (Phase 0.2).

STAC asset keys carry file extensions (cmf.tif, uncertainty.tif, artifact-mask.tif, uas.txt, *-unortho.tif variants). This method strips the appropriate extension and retains every key listed in `CM_L2B_BANDS` plus uas (the text sidecar).
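The key normalisation can be sketched as a standalone function. The retained key set is an assumption taken from the docstring (`CM_L2B_BANDS` plus `uas`); the module's own `_CM_L2B_KEYS_ALL` is not shown in this section. `normalise_asset_keys` is hypothetical, and the example.com URLs are placeholders:

```python
from typing import Mapping, Optional

CM_L2B_BANDS = (
    "cmf", "cmf-unortho", "uncertainty", "uncertainty-unortho",
    "artifact-mask", "rgb",
)
# Assumption: "CM_L2B_BANDS plus uas", as the docstring describes.
KEYS_ALL = frozenset(CM_L2B_BANDS) | {"uas"}


def normalise_asset_keys(asset_urls: Mapping[str, Optional[str]]) -> dict[str, str]:
    """Hypothetical helper: strip .tif/.txt and keep only known L2B keys."""
    paths: dict[str, str] = {}
    for key, url in asset_urls.items():
        if not url:
            continue  # assets without a URL are dropped
        # Both extensions are four characters long, so one slice covers them.
        stripped = key[:-4] if key.endswith((".tif", ".txt")) else key
        if stripped in KEYS_ALL:
            paths[stripped] = url
    return paths
```

Unknown keys such as a PNG preview, and assets with an empty URL, simply do not appear in the result.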

`from_local(scene_dir)` `classmethod` ¶

Build from a downloaded scene directory.

Picks up every L2B asset present (cmf.tif / rgb.tif / uncertainty.tif / artifact-mask.tif and the un-orthorectified variants), plus the uas.txt sidecar. Missing files become absent keys in asset_paths.
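A sketch of the directory scan, assuming the local layout uses the bare band names as file stems (`cmf.tif`, `artifact-mask.tif`, ..., `uas.txt`) as `from_local` expects, and that the directory name is the scene id. `scan_scene_dir` is a hypothetical helper returning the pieces the constructor receives:

```python
from pathlib import Path

# Copied from the module attribute documented above.
CM_L2B_BANDS = (
    "cmf", "cmf-unortho", "uncertainty", "uncertainty-unortho",
    "artifact-mask", "rgb",
)


def scan_scene_dir(scene_dir: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: return (scene_id, asset_paths) for a local scene."""
    d = Path(scene_dir)
    paths: dict[str, str] = {}
    for band in CM_L2B_BANDS:
        p = d / f"{band}.tif"
        if p.exists():
            paths[band] = str(p)  # absent files just leave the key out
    uas_path = d / "uas.txt"
    if uas_path.exists():
        paths["uas"] = str(uas_path)
    # The directory name doubles as the scene id.
    return d.name, paths
```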

`from_scene_id(scene_id, *, token, l2b_collection_candidates=DEFAULT_L2B_CH4_COLLECTION_CANDIDATES, rgb_collection_candidates=DEFAULT_L2B_RGB_COLLECTION_CANDIDATES, with_rgb=True, overview_level=None, http_timeout=30.0)` `classmethod` ¶

Build by deriving L2B asset URLs from the scene_id (URL-pattern).

Bypasses STAC entirely: derives every asset URL by templating against the verified asset-proxy pattern (see `_l2b_asset_url`) and probes the candidate collections in order. Required for 2026 plumes (v3c/v3d L3A) whose L2B parent scenes are not in `/stac/collections`.

Parameters:

Name	Type	Description	Default
`scene_id`	`str`	L2B scene id, equal to `plume_id.rsplit("-", 1)[0]` for any plume that came from this scene. Must follow the `<inst><YYYYMMDD>t<HHMMSS>...` convention so the date can be parsed.	required
`token`	`str`	Bearer token. Required: the asset-proxy URLs return 401 without it.	required
`l2b_collection_candidates`	`Sequence[str]`	L2B CH4 collection IDs to probe, in order. The first one to serve a 200/206 on `cmf.tif` wins. Defaults to `DEFAULT_L2B_CH4_COLLECTION_CANDIDATES`, i.e. `("l2b-ch4-mfa-v3c", "l2b-ch4-mfa-v3a")`.	`DEFAULT_L2B_CH4_COLLECTION_CANDIDATES`
`rgb_collection_candidates`	`Sequence[str]`	L2B RGB sibling collection IDs, probed the same way (on `rgb.tif`).	`DEFAULT_L2B_RGB_COLLECTION_CANDIDATES`
`with_rgb`	`bool`	When `True`, probe the RGB sibling collections and attach the `rgb` URL on success. When `False`, `self.rgb` is `None`.	`True`
`overview_level`	`int | None`	Forwarded to `RasterioReader`.	`None`
`http_timeout`	`float`	Per-probe range-GET timeout (seconds).	`30.0`

Returns:

Type	Description
`CMImageRaster`	Instance with `asset_paths` populated for the 6 L2B CH4 assets (`cmf`, `cmf-unortho`, `uncertainty`, `uncertainty-unortho`, `artifact-mask`, `uas`) and, when `with_rgb=True`, the `rgb` sibling URL.

Raises:

Type	Description
`CMSceneNotPublished`	When every candidate L2B collection 404s for `scene_id`: the scene either hasn't been processed yet or only exists in a collection variant not listed in `l2b_collection_candidates`. Catch in ETL paths that want to defer rather than error.
`ValueError`	When `scene_id` doesn't carry an 8-digit date at positions `[3:11]`.

Examples:

>>> tile = CMImageRaster.from_scene_id(  # doctest: +SKIP
...     "tan20260331t181625c77s4001", token=tok,
... )
>>> tile.cmf  # doctest: +SKIP
<RasterioReader …/l2b-ch4-mfa-v3c/2026/03/31/…>
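Two pieces of logic that `from_scene_id` relies on, deriving the scene id from a plume id and probing candidate collections in order, can be sketched offline. The probe is injected as a callable that returns an HTTP status, standing in for the real range-GET on `cmf.tif`; `scene_id_from_plume`, `scene_date`, `probe_first`, and the `-A` plume suffix used with them are hypothetical, and the module's `_probe_l2b_collection` / `_l2b_asset_url` helpers are not reproduced:

```python
from datetime import date
from typing import Callable, Optional, Sequence


def scene_id_from_plume(plume_id: str) -> str:
    """Drop the trailing "-<suffix>" of a plume id, per the docstring."""
    return plume_id.rsplit("-", 1)[0]


def scene_date(scene_id: str) -> date:
    """Parse the YYYYMMDD block at positions [3:11] of a scene id."""
    digits = scene_id[3:11]
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"no 8-digit date at [3:11] in {scene_id!r}")
    return date(int(digits[:4]), int(digits[4:6]), int(digits[6:8]))


def probe_first(
    candidates: Sequence[str],
    status_of: Callable[[str], int],
) -> Optional[str]:
    """Return the first collection whose probe answers 200 or 206."""
    for coll in candidates:
        if status_of(coll) in (200, 206):
            return coll
    return None  # the real method raises CMSceneNotPublished here
```

Because candidates are tried in order, listing the newer collection first (as the default does) means it wins whenever both serve the scene.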

`read_polygon(polygon, *, crs_polygon='EPSG:4326', bands=CM_L2B_BANDS)` ¶

Read a polygon clip from the requested bands.

Parameters:

Name	Type	Description	Default
`polygon`	`BaseGeometry`	Clip geometry.	required
`crs_polygon`	`str`	CRS of `polygon`. Defaults to `"EPSG:4326"`.	`'EPSG:4326'`
`bands`	`Iterable[str]`	Subset of band names. Bands whose asset is missing or whose window has zero overlap return `None`.	`CM_L2B_BANDS`

Returns:

Type	Description
`dict[str, Optional[GeoData]]`	`{"cmf": <GeoData>, "rgb": <GeoData>, ...}`: windowed `RasterioReader` instances (lazy, satisfying the `GeoData` protocol). Call `.load()` to materialise as `GeoTensor`.
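With `boundless=False`, a window that misses the raster entirely yields `None` instead of a fill-valued array. The pixel-space test behind that can be sketched as a rectangle intersection. This is not georeader's implementation: `Window` below is a hypothetical `NamedTuple` rather than `rasterio.windows.Window`, `clip_to_raster` is a hypothetical name, and clipping partial overlaps to the raster extent is an assumption of the sketch:

```python
from typing import NamedTuple, Optional


class Window(NamedTuple):
    """Hypothetical pixel window: offsets and size in columns/rows."""
    col_off: int
    row_off: int
    width: int
    height: int


def clip_to_raster(win: Window, width: int, height: int) -> Optional[Window]:
    """Intersect `win` with a width x height raster; None if they don't overlap."""
    col0 = max(win.col_off, 0)
    row0 = max(win.row_off, 0)
    col1 = min(win.col_off + win.width, width)
    row1 = min(win.row_off + win.height, height)
    if col1 <= col0 or row1 <= row0:
        return None  # no overlap: the case read_polygon reports as None
    return Window(col0, row0, col1 - col0, row1 - row0)
```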

`read_window(bounds_4326, *, bands=CM_L2B_BANDS)` ¶

Read a WGS-84 bbox window from the requested bands.

bounds_4326 is (min_lon, min_lat, max_lon, max_lat), the argument order of `shapely.geometry.box`.

`read_window_to_crs(bounds_4326, crs_dst, *, bands=CM_L2B_BANDS)` ¶

Read a window then reproject each band to crs_dst.

Reprojection materialises the data: values are `GeoTensor`, not lazy readers.

`with_rgb(rgb_item)` ¶

Return a copy with `rgb` merged in from a sibling STAC item.

The CH4 (`l2b-ch4-mfa-v3a`) and RGB (`l2b-rgb-v3a`) L2B collections share `scene_id` and pixel grid, but each STAC item only exposes its own assets. Fetch both with `api_queries.get_tile` (passing `collection=...`) and compose them via this method:

>>> ir = CMImageRaster.from_cm_tile_item(ch4_item)
>>> ir = ir.with_rgb(rgb_item)
>>> ir.rgb is not None
True

Raises:

| Type | Description |
|------|-------------|
| `ValueError` | If `rgb_item.scene_id` doesn't match `self.scene_id` (mismatched scenes don't share a grid, so this is usually a programming error). |
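As a plain-dict illustration of what `with_rgb` does to the asset mapping, the sketch below mirrors the key-matching logic shown in the source listing: refuse mismatched scene IDs, then copy the first non-empty `rgb` or `rgb.tif` URL from the sibling item into a copy of the existing paths. The function name `merge_rgb_asset` and the example URLs are hypothetical; the real method operates on `CMTileItem` / `CMImageRaster` objects rather than bare dicts.

```python
def merge_rgb_asset(
    scene_id: str,
    asset_paths: dict[str, str],
    rgb_scene_id: str,
    rgb_asset_urls: dict[str, str],
) -> dict[str, str]:
    """Hypothetical dict-only sketch of CMImageRaster.with_rgb's asset merge."""
    if rgb_scene_id != scene_id:
        # Different scenes do not share a pixel grid, so refuse to combine them.
        raise ValueError(f"scene_id mismatch: {scene_id!r} vs {rgb_scene_id!r}")
    merged = dict(asset_paths)  # copy; the caller's mapping is left untouched
    for key, url in rgb_asset_urls.items():
        if not url:
            continue  # skip assets that have no URL
        # Accept both "rgb" and "rgb.tif" keys; ignore every other asset.
        stripped = key[:-4] if key.endswith(".tif") else key
        if stripped == "rgb":
            merged["rgb"] = url
            break
    return merged


ch4_paths = {"cmf": "https://example.com/scene-1/cmf.tif"}
rgb_urls = {
    "thumbnail": "https://example.com/scene-1/thumb.png",
    "rgb.tif": "https://example.com/scene-1/rgb.tif",
}
print(merge_rgb_asset("scene-1", ch4_paths, "scene-1", rgb_urls))
```

Returning a new mapping instead of mutating `asset_paths` matches the method's "return a copy" contract, so the original CH4-only raster stays usable.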

Source code in georeader/readers/carbonmapper/rasters.py

def with_rgb(self, rgb_item: CMTileItem) -> "CMImageRaster":
    """Return a copy with ``rgb`` merged in from a sibling STAC item.

    The CH4 (``l2b-ch4-mfa-v3a``) and RGB (``l2b-rgb-v3a``) L2B
    collections share ``scene_id`` and pixel grid, but each STAC
    item only exposes its own assets. Fetch both with
    :func:`api_queries.get_tile` (passing ``collection=...``) and
    compose them via this method:

    >>> ir = CMImageRaster.from_cm_tile_item(ch4_item)
    >>> ir = ir.with_rgb(rgb_item)
    >>> ir.rgb is not None
    True

    Raises:
        ValueError: If ``rgb_item.scene_id`` doesn't match
            ``self.scene_id`` (mismatched scenes don't share a grid
            — usually a programming error).
    """
    if rgb_item.scene_id != self.scene_id:
        raise ValueError(
            f"scene_id mismatch: {self.scene_id!r} vs {rgb_item.scene_id!r}"
        )
    # Pick the rgb GeoTIFF (with or without `.tif` extension);
    # ignore everything else on the rgb item.
    new_paths = dict(self.asset_paths)
    for key, url in rgb_item.asset_urls.items():
        if not url:
            continue
        stripped = key[:-4] if key.endswith(".tif") else key
        if stripped == "rgb":
            new_paths["rgb"] = url
            break
    return CMImageRaster(
        scene_id=self.scene_id,
        asset_paths=new_paths,
        overview_level=self.overview_level,
    )

Rasterize Carbon Mapper sources (point clusters) onto a target grid.

Carbon Mapper sources are point geometries (DBSCAN-clustered plume locations). For training labels, QA overlays, and source-prior features, it is useful to burn them into a binary mask on the same grid as an L2B scene. This module provides:

`rasterize_sources`: one-shot function, list of points → `GeoTensor` mask.
`CMSourceRaster`: lazy wrapper that mirrors the `CMImageRaster` read helpers (`read_polygon` / `read_window` / `read_window_to_crs`) so callers can compose the source mask with the L2B rasters.

Both delegate the actual burn-in to `georeader.rasterize.rasterize_geopandas_like` / `georeader.rasterize.rasterize_from_geopandas`; no custom `rasterio.features` call lives in this module.

The Carbon Mapper API does not publish a sources raster; these helpers build it client-side from `list_sources` (or any iterable of `CMSource`).

`CMSourceRaster` `dataclass` ¶

Lazy binary-mask raster of Carbon Mapper sources on a target grid.

Mirrors the read-helper surface of `CMImageRaster` so callers can compose source masks with L2B reads.

Attributes¶

sources: Source points to rasterize.
transform, shape, crs: Target grid spec. Use `from_cmtileitem` or `from_geodata` to inherit the spec from an existing raster.
buffer_m: Per-point disk radius in metres. `0` → single pixel.
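The `transform, shape, crs` triple fully determines where the mask lives. As a rough illustration, the stdlib-only sketch below computes the footprint bounds of a `(height, width)` grid from the six affine coefficients `(a, b, c, d, e, f)`, using the same convention as rasterio's `Affine` (`x = a*col + b*row + c`, `y = d*col + e*row + f`). The helper name `grid_bounds` and the sample UTM-like numbers are illustrative assumptions; rasterio itself offers `rasterio.transform.array_bounds(height, width, transform)` for the usual north-up case.

```python
def grid_bounds(transform, shape):
    """Return (minx, miny, maxx, maxy) of an (height, width) grid.

    `transform` is the 6-tuple (a, b, c, d, e, f) mapping pixel (col, row)
    to map coordinates: x = a*col + b*row + c, y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    height, width = shape
    # Map the four outer pixel corners; this also handles rotated grids.
    corners = [(col, row) for col in (0, width) for row in (0, height)]
    xs = [a * col + b * row + c for col, row in corners]
    ys = [d * col + e * row + f for col, row in corners]
    return min(xs), min(ys), max(xs), max(ys)


# 10 m pixels, north-up, 100 rows x 200 columns (illustrative values).
print(grid_bounds((10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0), (100, 200)))
# (500000.0, 3999000.0, 502000.0, 4000000.0)
```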

Source code in georeader/readers/carbonmapper/sources_raster.py

@dataclass(repr=False)
class CMSourceRaster:
    """Lazy binary-mask raster of Carbon Mapper sources on a target grid.

    Mirrors the read-helper surface of
    :class:`~georeader.readers.carbonmapper.rasters.CMImageRaster` so
    callers can compose source masks with L2B reads.

    Attributes
    ----------
    sources:
        Source points to rasterize.
    transform, shape, crs:
        Target grid spec. Use :meth:`from_cmtileitem` or
        :meth:`from_geodata` to inherit the spec from an existing
        raster.
    buffer_m:
        Per-point disk radius in metres. ``0`` → single pixel.
    """

    sources: Sequence[SourceLike]
    transform: rasterio.Affine
    shape: tuple[int, int]
    crs: rasterio.crs.CRS
    buffer_m: float = 0.0

    # ---- Constructors ----

    @classmethod
    def from_geodata(
        cls,
        sources: Sequence[SourceLike],
        template: GeoData,
        *,
        buffer_m: float = 0.0,
    ) -> "CMSourceRaster":
        """Build a source raster aligned to an existing :class:`GeoData`."""
        return cls(
            sources=sources,
            transform=template.transform,
            shape=(template.shape[-2], template.shape[-1]),
            crs=rasterio.crs.CRS.from_user_input(template.crs),
            buffer_m=buffer_m,
        )

    @classmethod
    def from_cmtileitem(
        cls,
        sources: Sequence[SourceLike],
        tile: CMTileItem,
        *,
        buffer_m: float = 0.0,
    ) -> "CMSourceRaster":
        """Build a source raster aligned to an L2B :class:`CMTileItem`.

        Resolves the tile's ``cmf`` GeoTIFF header to inherit
        ``(transform, shape, crs)``. Issues one HEAD/GET-range read.
        """
        cmf_url = tile.assets.get("cmf") or tile.assets.get("ch4-mfa")
        if cmf_url is None:
            raise ValueError(
                f"CMTileItem {tile.scene_id!r} has no 'cmf' asset to align to."
            )
        with rasterio.open(cmf_url) as ds:
            return cls(
                sources=sources,
                transform=ds.transform,
                shape=(ds.height, ds.width),
                crs=ds.crs,
                buffer_m=buffer_m,
            )

    # ---- Eager render ----

    def load(self) -> GeoTensor:
        """Rasterize all sources onto the full grid."""
        return rasterize_sources(
            self.sources,
            transform=self.transform,
            shape=self.shape,
            crs=self.crs,
            buffer_m=self.buffer_m,
        )

    # ---- Read helpers (mirror CMImageRaster) ----

    def read_polygon(
        self,
        polygon: BaseGeometry,
        *,
        crs_polygon: str = "EPSG:4326",
    ) -> GeoTensor:
        """Read a polygon clip of the source mask."""
        full = self.load()
        # `read_from_polygon` returns ``GeoData | NDArray``; with the
        # default ``return_only_data=False`` the GeoData arm is the one
        # we always hit.
        return cast(
            GeoTensor,
            read.read_from_polygon(
                cast(GeoData, full),
                polygon=polygon,
                crs_polygon=crs_polygon,
            ),
        )

    def read_window(self, bounds_4326: BBox) -> GeoTensor:
        """Read a WGS-84 bbox window of the source mask."""
        return self.read_polygon(box(*bounds_4326))

    def read_window_to_crs(
        self,
        bounds_4326: BBox,
        crs_dst: str,
    ) -> GeoTensor:
        """Read a window then reproject the mask to ``crs_dst``."""
        crop = self.read_window(bounds_4326)
        return cast(GeoTensor, read.read_to_crs(crop, crs_dst))

    # ---- Repr ----

    def __repr__(self) -> str:
        return (
            f"{type(self).__name__}(n_sources={len(self.sources)}, "
            f"shape={self.shape}, buffer_m={self.buffer_m}, crs={self.crs})"
        )

`from_cmtileitem(sources, tile, *, buffer_m=0.0)` `classmethod` ¶

Build a source raster aligned to an L2B `CMTileItem`.

Resolves the tile's `cmf` GeoTIFF header (falling back to the `ch4-mfa` asset) to inherit `(transform, shape, crs)`. Issues one HEAD/GET-range read and raises `ValueError` if neither asset is present.

Source code in georeader/readers/carbonmapper/sources_raster.py

@classmethod
def from_cmtileitem(
    cls,
    sources: Sequence[SourceLike],
    tile: CMTileItem,
    *,
    buffer_m: float = 0.0,
) -> "CMSourceRaster":
    """Build a source raster aligned to an L2B :class:`CMTileItem`.

    Resolves the tile's ``cmf`` GeoTIFF header to inherit
    ``(transform, shape, crs)``. Issues one HEAD/GET-range read.
    """
    cmf_url = tile.assets.get("cmf") or tile.assets.get("ch4-mfa")
    if cmf_url is None:
        raise ValueError(
            f"CMTileItem {tile.scene_id!r} has no 'cmf' asset to align to."
        )
    with rasterio.open(cmf_url) as ds:
        return cls(
            sources=sources,
            transform=ds.transform,
            shape=(ds.height, ds.width),
            crs=ds.crs,
            buffer_m=buffer_m,
        )

`from_geodata(sources, template, *, buffer_m=0.0)` `classmethod` ¶

Build a source raster aligned to an existing `GeoData`.

Source code in georeader/readers/carbonmapper/sources_raster.py

@classmethod
def from_geodata(
    cls,
    sources: Sequence[SourceLike],
    template: GeoData,
    *,
    buffer_m: float = 0.0,
) -> "CMSourceRaster":
    """Build a source raster aligned to an existing :class:`GeoData`."""
    return cls(
        sources=sources,
        transform=template.transform,
        shape=(template.shape[-2], template.shape[-1]),
        crs=rasterio.crs.CRS.from_user_input(template.crs),
        buffer_m=buffer_m,
    )

`load()` ¶

Rasterize all sources onto the full grid.

Source code in georeader/readers/carbonmapper/sources_raster.py

def load(self) -> GeoTensor:
    """Rasterize all sources onto the full grid."""
    return rasterize_sources(
        self.sources,
        transform=self.transform,
        shape=self.shape,
        crs=self.crs,
        buffer_m=self.buffer_m,
    )

`read_polygon(polygon, *, crs_polygon='EPSG:4326')` ¶

Read a polygon clip of the source mask.

Source code in georeader/readers/carbonmapper/sources_raster.py

def read_polygon(
    self,
    polygon: BaseGeometry,
    *,
    crs_polygon: str = "EPSG:4326",
) -> GeoTensor:
    """Read a polygon clip of the source mask."""
    full = self.load()
    # `read_from_polygon` returns ``GeoData | NDArray``; with the
    # default ``return_only_data=False`` the GeoData arm is the one
    # we always hit.
    return cast(
        GeoTensor,
        read.read_from_polygon(
            cast(GeoData, full),
            polygon=polygon,
            crs_polygon=crs_polygon,
        ),
    )

`read_window(bounds_4326)` ¶

Read a WGS-84 bbox window of the source mask.

Source code in georeader/readers/carbonmapper/sources_raster.py

def read_window(self, bounds_4326: BBox) -> GeoTensor:
    """Read a WGS-84 bbox window of the source mask."""
    return self.read_polygon(box(*bounds_4326))

`read_window_to_crs(bounds_4326, crs_dst)` ¶

Read a window, then reproject the mask to `crs_dst`.

Source code in georeader/readers/carbonmapper/sources_raster.py

def read_window_to_crs(
    self,
    bounds_4326: BBox,
    crs_dst: str,
) -> GeoTensor:
    """Read a window then reproject the mask to ``crs_dst``."""
    crop = self.read_window(bounds_4326)
    return cast(GeoTensor, read.read_to_crs(crop, crs_dst))

`rasterize_sources(sources, *, transform, shape, crs, buffer_m=0.0)` ¶

Rasterize source points onto a target grid as a binary mask.

Each source contributes a value of `1` at its pixel; if `buffer_m > 0`, a disk of that radius (in metres) is stamped instead. Sources falling outside the grid are silently dropped.

Delegates to `georeader.rasterize.rasterize_from_geopandas`.

Parameters¶

sources: Iterable of `CMSource`, Shapely `Point`, or `(lon, lat)` tuples, all interpreted as WGS-84 lon/lat.
transform: Affine transform of the target grid.
shape: `(height, width)` of the target grid.
crs: CRS of the target grid. Must be projected when `buffer_m > 0`.
buffer_m: Buffer radius in metres applied around each source point. `0` (default) → `all_touched` single-pixel stamp per source.

Returns¶

GeoTensor: 2D mask of `shape` with values in `{0, 1}`.

Raises¶

ValueError: If `buffer_m > 0` and `crs` is geographic, or if `shape` is not 2D.
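To make the burn-in rule concrete, here is a brute-force, stdlib-only sketch of the same semantics: with `buffer_m == 0` each point marks the pixel that contains it, with `buffer_m > 0` every pixel whose centre lies within the radius is marked, and anything off-grid is dropped. It is not the library implementation (which reprojects WGS-84 inputs and delegates to `rasterize_from_geopandas`); it assumes the points are already in the grid's projected CRS and that the transform is north-up. The function name `stamp_points` is hypothetical.

```python
import math


def stamp_points(points, transform, shape, buffer_m=0.0):
    """Burn projected (x, y) points into an H x W list-of-lists mask of 0/1.

    Hypothetical stand-in for rasterize_sources. Assumes the points are
    already in the grid's projected CRS and the transform is north-up,
    i.e. transform = (a, 0, c, 0, e, f) with x = a*col + c, y = e*row + f.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("this sketch only handles north-up transforms")
    height, width = shape
    mask = [[0] * width for _ in range(height)]
    for x, y in points:
        if buffer_m <= 0:
            # Single pixel: the one that contains the point.
            col = math.floor((x - c) / a)
            row = math.floor((y - f) / e)
            if 0 <= row < height and 0 <= col < width:  # off-grid: dropped
                mask[row][col] = 1
            continue
        # Disk: every pixel whose centre is within buffer_m of the point.
        # Brute force over the whole grid for clarity, not speed.
        for row in range(height):
            yc = f + e * (row + 0.5)
            for col in range(width):
                xc = c + a * (col + 0.5)
                if math.hypot(xc - x, yc - y) <= buffer_m:
                    mask[row][col] = 1
    return mask


# 10 m pixels on a 10 x 10 grid whose top-left corner is at (0, 100).
grid = (10.0, 0.0, 0.0, 0.0, -10.0, 100.0)
single = stamp_points([(15.0, 85.0), (-5.0, 50.0)], grid, (10, 10))
disk = stamp_points([(50.0, 50.0)], grid, (10, 10), buffer_m=10.0)
print(sum(map(sum, single)), sum(map(sum, disk)))  # 1 4
```

The second input point lies west of the grid and is dropped, which is the "silently dropped" behaviour described above; the 10 m disk centred on a pixel corner covers exactly the four adjacent pixel centres.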

Source code in georeader/readers/carbonmapper/sources_raster.py

def rasterize_sources(
    sources: Iterable[SourceLike],
    *,
    transform: rasterio.Affine,
    shape: tuple[int, int],
    crs: Union[str, rasterio.crs.CRS],
    buffer_m: float = 0.0,
) -> GeoTensor:
    """Rasterize source points onto a target grid as a binary mask.

    Each source contributes a value of ``1`` at its pixel; if
    ``buffer_m > 0`` a disk of that radius (in metres) is stamped
    instead. Sources falling outside the grid are silently dropped.

    Delegates to
    :func:`georeader.rasterize.rasterize_from_geopandas`.

    Parameters
    ----------
    sources:
        Iterable of :class:`CMSource`, Shapely :class:`Point`, or
        ``(lon, lat)`` tuples — all interpreted as WGS-84 lon/lat.
    transform:
        Affine transform of the target grid.
    shape:
        ``(height, width)`` of the target grid.
    crs:
        CRS of the target grid. Must be projected when
        ``buffer_m > 0``.
    buffer_m:
        Buffer radius in metres applied around each source point.
        ``0`` (default) → ``all_touched`` single-pixel stamp per source.

    Returns
    -------
    GeoTensor
        2D mask of ``shape`` with values in ``{0, 1}``.

    Raises
    ------
    ValueError
        If ``buffer_m > 0`` and ``crs`` is geographic, or if ``shape``
        is not 2D.
    """
    if len(shape) != 2:
        raise ValueError(f"Expected (H, W) shape, got {shape}")
    crs_obj = rasterio.crs.CRS.from_user_input(crs)

    gdf = _sources_gdf(sources)
    if len(gdf) == 0:
        return GeoTensor(
            np.zeros(shape, dtype=np.uint8),
            transform=transform, crs=crs_obj, fill_value_default=0,
        )

    if buffer_m > 0:
        gdf = _apply_buffer(gdf, crs_obj, buffer_m)
        all_touched = False
    else:
        gdf = gdf.to_crs(crs_obj)
        all_touched = True  # stamp the pixel containing each point

    height, width = shape
    window_out = rasterio.windows.Window(0, 0, width=width, height=height)
    return cast(
        GeoTensor,
        rasterize_from_geopandas(
            gdf,
            column="value",
            transform=transform,
            window_out=window_out,
            crs_out=crs_obj,
            fill=0,
            all_touched=all_touched,
        ),
    )

`rasterize_sources_like(sources, data_like, *, buffer_m=0.0)` ¶

Rasterize sources onto an existing `GeoData` grid.

Thin wrapper around `georeader.rasterize.rasterize_geopandas_like`.

Source code in georeader/readers/carbonmapper/sources_raster.py

def rasterize_sources_like(
    sources: Iterable[SourceLike],
    data_like: GeoData,
    *,
    buffer_m: float = 0.0,
) -> GeoTensor:
    """Rasterize sources onto an existing :class:`GeoData` grid.

    Thin wrapper around
    :func:`georeader.rasterize.rasterize_geopandas_like`.
    """
    crs_obj = rasterio.crs.CRS.from_user_input(data_like.crs)
    gdf = _sources_gdf(sources)
    if len(gdf) == 0:
        return GeoTensor(
            np.zeros(data_like.shape[-2:], dtype=np.uint8),
            transform=data_like.transform,
            crs=crs_obj,
            fill_value_default=0,
        )

    if buffer_m > 0:
        gdf = _apply_buffer(gdf, crs_obj, buffer_m)
        all_touched = False
    else:
        gdf = gdf.to_crs(crs_obj)
        all_touched = True

    return cast(
        GeoTensor,
        rasterize_geopandas_like(
            gdf, data_like=data_like, column="value",
            fill=0, all_touched=all_touched,
        ),
    )

config.py¶

Lightweight credentials and configuration handler for the Carbon Mapper Data Platform API.

Credentials can be supplied in three ways (checked in priority order):

Environment variables: set `CARBONMAPPER_TOKEN` (access token), and/or `CARBONMAPPER_EMAIL` and `CARBONMAPPER_PASSWORD` (login credentials).
Config file: a JSON file at one of the well-known paths listed in `CONFIG_SEARCH_PATHS`, or a custom path passed to `CarbonMapperConfig.load`. The canonical location, matching the sibling readers (`emit.py` / `S2_SAFE_reader.py`), is `~/.georeader/auth_carbonmapper.json`.
Explicit arguments: pass `token=` directly to the API functions in `download.py`.

If `CarbonMapperConfig.load` is called without an explicit path, no config file is found, and no credentials are set via environment variables, a placeholder `~/.georeader/auth_carbonmapper.json` is auto-created with stub values so users have a clear edit target (pass `create_placeholder=False` to skip this).
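The file-then-environment overlay performed by `CarbonMapperConfig.load` can be pictured with plain dictionaries. In this sketch (the function name `resolve_credentials` is hypothetical), values from a parsed config file are taken first, and any non-empty `CARBONMAPPER_*` variable then overwrites the matching field, which is the env-over-file priority `load` implements. Placeholder creation and the explicit `token=` path are deliberately left out.

```python
import os
from typing import Mapping, Optional

# Environment variable names documented for the Carbon Mapper config.
ENV_VARS = {
    "token": "CARBONMAPPER_TOKEN",
    "email": "CARBONMAPPER_EMAIL",
    "password": "CARBONMAPPER_PASSWORD",
}


def resolve_credentials(
    file_values: Mapping[str, Optional[str]],
    environ: Mapping[str, str] = os.environ,
) -> dict[str, Optional[str]]:
    """Overlay non-empty environment variables on values read from a file."""
    resolved = {field: file_values.get(field) for field in ENV_VARS}
    for field, var in ENV_VARS.items():
        value = environ.get(var)
        if value:  # unset or empty variables leave the file value in place
            resolved[field] = value
    return resolved


print(resolve_credentials(
    {"email": "file@example.com", "password": "from-file"},
    environ={"CARBONMAPPER_EMAIL": "env@example.com", "CARBONMAPPER_TOKEN": ""},
))
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.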

Quick start¶

>>> from georeader.readers.carbonmapper.config import CarbonMapperConfig
>>> cfg = CarbonMapperConfig.load()
>>> token = cfg.get_token()  # resolves from env var or file

Or store credentials in the default config file:

>>> cfg.email = "user@example.com"
>>> cfg.password = "s3cret"
>>> cfg.save()  # writes to ~/.georeader/auth_carbonmapper.json
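`save()` writes the JSON first and then tries to narrow the file mode to `0o600`, logging a warning rather than failing if that is not possible. A stdlib-only sketch of that pattern follows; `save_credentials` is a hypothetical name, and the demo writes to a temporary directory rather than `~/.georeader/`.

```python
import json
import logging
import os
import tempfile
from pathlib import Path
from typing import Union

logger = logging.getLogger(__name__)


def save_credentials(data: dict, dest: Union[str, Path]) -> Path:
    """Write `data` as JSON to `dest`, then restrict the file to its owner."""
    path = Path(dest).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ~/.georeader/
    path.write_text(json.dumps(data, indent=2))
    try:
        os.chmod(path, 0o600)  # owner read/write only
    except OSError as exc:  # PermissionError is a subclass of OSError
        logger.warning("Saved %s but could not set 0o600 permissions: %s", path, exc)
    return path


# Demo in a throwaway directory instead of the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_credentials({"email": "user@example.com"}, Path(tmp) / "auth.json")
    print(saved.name, oct(saved.stat().st_mode & 0o777))
```

Writing first and tightening permissions afterwards leaves a brief window in which a new file carries the default umask permissions. Creating it with `os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)` avoids that for new files, but an existing file keeps its old mode, so a follow-up `chmod` is still useful.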

References¶

API docs: https://api.carbonmapper.org/api/v1/docs
Registration: https://data.carbonmapper.org

`CarbonMapperConfig` ¶

Simple credentials and configuration container for the Carbon Mapper API.

Attributes¶

token: A pre-obtained JWT bearer token. If set, it takes precedence over `email` / `password` when `get_token` is called.
email: Registered Carbon Mapper account e-mail address.
password: Account password. Stored only in memory or in the config file on disk, and never sent anywhere except the token endpoint.
extra: Any additional key/value pairs loaded from or saved to the config file (for forward compatibility).

Examples¶

Load from environment or disk and retrieve a usable token:

>>> cfg = CarbonMapperConfig.load()
>>> token = cfg.get_token()  # may return None if no credentials found
>>> if token:
...     data = get_plumes_annotated(plume_gas="CH4", token=token)

Persist credentials to the default config file:

>>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
>>> cfg.save()

Reset (delete) the stored config file:

>>> CarbonMapperConfig.reset()
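The sketch below restates, with plain dicts, how `from_file` and `has_credentials` decide whether a config is usable: the `SET-EMAIL` / `SET-PASSWORD` stubs written into an auto-created placeholder are treated as unset, `username` is accepted when `email` is absent, and either a token or a complete email/password pair counts as credentials. The helper names `parse_config` and `has_credentials` (as a free function) are hypothetical; the real logic lives on `CarbonMapperConfig`.

```python
from typing import Any

# Stub values written into a freshly auto-created placeholder config.
PLACEHOLDER_EMAIL = "SET-EMAIL"
PLACEHOLDER_PASSWORD = "SET-PASSWORD"


def parse_config(data: dict[str, Any]) -> dict[str, Any]:
    """Split a parsed JSON config into credential fields and extra fields."""
    data = dict(data)  # do not mutate the caller's dict
    token = data.pop("token", None)
    # "username" is accepted as an alias when "email" is missing.
    email = data.pop("email", None) or data.pop("username", None)
    password = data.pop("password", None)
    # Untouched placeholder stubs count as "not set".
    if email == PLACEHOLDER_EMAIL:
        email = None
    if password == PLACEHOLDER_PASSWORD:
        password = None
    return {"token": token, "email": email, "password": password, "extra": data}


def has_credentials(cfg: dict[str, Any]) -> bool:
    """A token alone, or a complete email/password pair, is enough."""
    return bool(cfg["token"]) or bool(cfg["email"] and cfg["password"])


print(has_credentials(parse_config({"email": "SET-EMAIL", "password": "SET-PASSWORD"})))
print(has_credentials(parse_config({"username": "u@example.com", "password": "pw"})))
```

Filtering the stubs at parse time is what keeps an unedited placeholder file from looking like a real login to `has_credentials`.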

Source code in georeader/readers/carbonmapper/config.py

class CarbonMapperConfig:
    """Simple credentials and configuration container for the Carbon Mapper API.

    Attributes
    ----------
    token:
        A pre-obtained JWT bearer token.  If set, it takes precedence over
        *email* / *password* when :meth:`get_token` is called.
    email:
        Registered Carbon Mapper account e-mail address.
    password:
        Account password.  Stored only in memory or in the config file on
        disk — never sent anywhere except the token endpoint.
    extra:
        Any additional key/value pairs loaded from or saved to the config
        file (for forward compatibility).

    Examples
    --------
    Load from environment or disk and retrieve a usable token:

    >>> cfg = CarbonMapperConfig.load()
    >>> token = cfg.get_token()  # may return None if no credentials found
    >>> if token:
    ...     data = get_plumes_annotated(plume_gas="CH4", token=token)

    Persist credentials to the default config file:

    >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
    >>> cfg.save()

    Reset (delete) the stored config file:

    >>> CarbonMapperConfig.reset()
    """

    def __init__(
        self,
        *,
        token: str | None = None,
        email: str | None = None,
        password: str | None = None,
        **extra: Any,
    ) -> None:
        self.token = token
        self.email = email
        self.password = password
        self.extra: dict[str, Any] = extra

    # ------------------------------------------------------------------ #
    # Class-level factory / persistence methods                            #
    # ------------------------------------------------------------------ #

    @classmethod
    def from_env(cls) -> "CarbonMapperConfig":
        """Build a :class:`CarbonMapperConfig` purely from environment variables.

        Reads :envvar:`CARBONMAPPER_TOKEN`, :envvar:`CARBONMAPPER_EMAIL`,
        and :envvar:`CARBONMAPPER_PASSWORD`.  Fields that are absent from
        the environment are left as ``None``.

        Returns
        -------
        CarbonMapperConfig
            A new config object populated from the environment.

        Examples
        --------
        >>> import os
        >>> os.environ["CARBONMAPPER_TOKEN"] = "eyJ..."
        >>> cfg = CarbonMapperConfig.from_env()
        >>> cfg.token
        'eyJ...'
        """
        return cls(
            token=os.environ.get(_ENV_TOKEN),
            email=os.environ.get(_ENV_EMAIL),
            password=os.environ.get(_ENV_PASSWORD),
        )

    @classmethod
    def from_file(cls, path: Path | str) -> "CarbonMapperConfig":
        """Load a :class:`CarbonMapperConfig` from a specific JSON file.

        Parameters
        ----------
        path:
            Path to a JSON config file containing any combination of the
            keys ``"token"``, ``"email"``, ``"password"``, plus any extra
            fields.

        Returns
        -------
        CarbonMapperConfig
            Config populated from the file.

        Raises
        ------
        FileNotFoundError
            If *path* does not exist.
        json.JSONDecodeError
            If the file cannot be parsed as JSON.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.from_file("~/.georeader/auth_carbonmapper.json")
        """
        path = Path(path).expanduser().resolve()
        with path.open() as fh:
            data: dict[str, Any] = json.load(fh)
        token = data.pop("token", None)
        email = data.pop("email", None) or data.pop("username", None)
        password = data.pop("password", None)
        # Filter stub values — if the user hasn't yet edited a freshly
        # auto-created placeholder, treat the fields as un-set rather
        # than letting ``"SET-EMAIL"`` flow into has_credentials() as
        # if it were a real value.
        if email == _PLACEHOLDER_EMAIL:
            email = None
        if password == _PLACEHOLDER_PASSWORD:
            password = None
        return cls(token=token, email=email, password=password, **data)

    @classmethod
    def load(
        cls,
        path: Path | str | None = None,
        *,
        create_placeholder: bool = True,
    ) -> "CarbonMapperConfig":
        """Load config using the standard resolution order.

        Resolution order
        ~~~~~~~~~~~~~~~~
        1. If *path* is given, load that file.
        2. Otherwise search :data:`CONFIG_SEARCH_PATHS` for the first file
           that exists.
        3. Overlay environment variables — env values overwrite file values.
        4. If still nothing is configured (no file found, no env vars set)
           AND ``create_placeholder`` is True, write a stub config to
           :data:`DEFAULT_SAVE_PATH` with ``SET-EMAIL`` / ``SET-PASSWORD``
           placeholders so users have a clear edit target. Matches the
           ``emit.py`` / ``S2_SAFE_reader.py`` behaviour.

        Parameters
        ----------
        path:
            Optional explicit path to a config file.  Skips the search
            when provided.
        create_placeholder:
            When ``True`` (default), auto-create a stub config file at
            :data:`DEFAULT_SAVE_PATH` if no credentials could be
            resolved. Set to ``False`` in tests / non-interactive
            contexts to keep the filesystem untouched.

        Returns
        -------
        CarbonMapperConfig
            The resolved config.  Fields without a value (from file *and*
            env) are ``None``.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.load()
        >>> print(cfg.email)   # None if not configured

        >>> cfg = CarbonMapperConfig.load("~/my_project/.carbonmapper.json")
        """
        cfg: CarbonMapperConfig | None = None
        loaded_from_file = False

        # 1. Explicit path
        if path is not None:
            resolved = Path(path).expanduser().resolve()
            if resolved.exists():
                try:
                    cfg = cls.from_file(resolved)
                    loaded_from_file = True
                    logger.debug("Loaded Carbon Mapper config from %s", resolved)
                except Exception as exc:
                    logger.warning("Failed to load config from %s: %s", resolved, exc)
            else:
                logger.warning("Config path %s does not exist; ignoring.", resolved)

        # 2. Search well-known paths
        if cfg is None:
            for candidate in CONFIG_SEARCH_PATHS:
                resolved_candidate = candidate.expanduser().resolve()
                if resolved_candidate.exists():
                    try:
                        cfg = cls.from_file(resolved_candidate)
                        loaded_from_file = True
                        logger.debug("Loaded Carbon Mapper config from %s", resolved_candidate)
                        break
                    except Exception as exc:
                        logger.warning(
                            "Failed to load config from %s: %s",
                            resolved_candidate,
                            exc,
                        )

        if cfg is None:
            cfg = cls()

        # 3. Overlay environment variables (env takes priority over file)
        env_token = os.environ.get(_ENV_TOKEN)
        env_email = os.environ.get(_ENV_EMAIL)
        env_password = os.environ.get(_ENV_PASSWORD)
        if env_token:
            cfg.token = env_token
        if env_email:
            cfg.email = env_email
        if env_password:
            cfg.password = env_password

        # 4. Placeholder — only when caller didn't pass an explicit path,
        #    no config file was found, and env vars didn't supply creds.
        if (
            create_placeholder
            and path is None
            and not loaded_from_file
            and not cfg.has_credentials()
        ):
            _create_placeholder_config()

        return cfg

    # ------------------------------------------------------------------ #
    # Persistence                                                          #
    # ------------------------------------------------------------------ #

    def save(self, path: Path | str | None = None) -> Path:
        """Persist the config to a JSON file.

        Parameters
        ----------
        path:
            Destination file path.  Defaults to
            :data:`DEFAULT_SAVE_PATH`
            (``~/.georeader/auth_carbonmapper.json``), matching the
            sibling-reader convention (emit, S2). User-level location
            outside the working tree so credentials are never
            accidentally committed.

        Returns
        -------
        Path
            The resolved path of the file that was written.

        Examples
        --------
        >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
        >>> saved_path = cfg.save()
        >>> print(saved_path)
        /home/user/.georeader/auth_carbonmapper.json
        """
        if path is not None:
            dest = Path(path).expanduser().resolve()
        else:
            dest = DEFAULT_SAVE_PATH.expanduser().resolve()
        dest.parent.mkdir(parents=True, exist_ok=True)
        data: dict[str, Any] = {**self.extra}
        if self.token is not None:
            data["token"] = self.token
        if self.email is not None:
            data["email"] = self.email
        if self.password is not None:
            data["password"] = self.password
        dest.write_text(json.dumps(data, indent=2))
        try:
            os.chmod(dest, 0o600)
        except PermissionError:
            logger.warning(
                "Carbon Mapper config saved to %s but restrictive permissions "
                "(0o600) could not be set due to insufficient permissions.",
                dest,
            )
        except OSError as exc:
            logger.warning(
                "Carbon Mapper config saved to %s but setting restrictive "
                "permissions (0o600) failed: %s",
                dest,
                exc,
            )
        logger.info("Carbon Mapper config saved to %s", dest)
        return dest

    @classmethod
    def reset(cls, path: Path | str | None = None) -> None:
        """Delete the stored config file, if it exists.

        Parameters
        ----------
        path:
            Path to the config file to remove.  Defaults to
            :data:`DEFAULT_SAVE_PATH`
            (``~/.georeader/auth_carbonmapper.json``).

        Examples
        --------
        >>> CarbonMapperConfig.reset()  # removes ~/.georeader/auth_carbonmapper.json
        """
        dest = (
            Path(path).expanduser().resolve()
            if path is not None
            else DEFAULT_SAVE_PATH.expanduser().resolve()
        )
        if dest.exists():
            dest.unlink()
            logger.info("Carbon Mapper config removed: %s", dest)
        else:
            logger.debug("No config file to remove at %s", dest)

    # ------------------------------------------------------------------ #
    # Token resolution                                                     #
    # ------------------------------------------------------------------ #

    def get_token(self) -> str | None:
        """Return the best available bearer token.

        If :attr:`token` is set, it is returned directly.  Otherwise
        ``None`` is returned — callers that need a fresh token should call
        :meth:`refresh_access_token` or
        :func:`~georeader.readers.carbonmapper.download.obtain_token`
        with :attr:`email` and :attr:`password`.

        Returns
        -------
        str or None
            A JWT bearer token string, or ``None`` if none is configured.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.load()
        >>> token = cfg.get_token()
        >>> if token is None:
        ...     token = cfg.refresh_access_token()
        """
        return self.token

    def refresh_access_token(self) -> str:
        """Obtain a fresh JWT access token using stored email/password.

        Calls :func:`~georeader.readers.carbonmapper.download.obtain_token` with the
        stored :attr:`email` and :attr:`password`, updates :attr:`token`
        in-place, and returns the new access token.

        Returns
        -------
        str
            The new JWT access token.

        Raises
        ------
        ValueError
            If *email* or *password* is not set.
        requests.HTTPError
            If the Carbon Mapper API rejects the credentials.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.load()  # ~/.georeader/auth_carbonmapper.json
        >>> token = cfg.refresh_access_token()
        """
        if not self.email or not self.password:
            raise ValueError(
                "Cannot refresh token: email and password are required. "
                "Provide them via config file, environment variables, or "
                "constructor arguments."
            )
        from georeader.readers.carbonmapper.download import obtain_token

        tokens = obtain_token(self.email, self.password)
        self.token = tokens["access"]
        self.extra["refresh"] = tokens.get("refresh")
        logger.info("Carbon Mapper access token refreshed for %s", self.email)
        return self.token

    def has_credentials(self) -> bool:
        """Return ``True`` if any usable credentials are present.

        A config is considered to have credentials when at least one of the
        following is set: :attr:`token`, or both :attr:`email` *and*
        :attr:`password`.

        Examples
        --------
        >>> cfg = CarbonMapperConfig(email="u@example.com", password="pw")
        >>> cfg.has_credentials()
        True
        >>> CarbonMapperConfig().has_credentials()
        False
        """
        return bool(self.token) or bool(self.email and self.password)

    # ------------------------------------------------------------------ #
    # String representations                                               #
    # ------------------------------------------------------------------ #

    def __repr__(self) -> str:
        return (
            f"CarbonMapperConfig("
            f"email={self.email!r}, "
            f"has_token={self.token is not None}, "
            f"has_password={self.password is not None}"
            f")"
        )

`from_env()` `classmethod` ¶

Build a `CarbonMapperConfig` purely from environment variables.

Reads `CARBONMAPPER_TOKEN`, `CARBONMAPPER_EMAIL`, and `CARBONMAPPER_PASSWORD`. Fields that are absent from the environment are left as `None`.

Returns¶

`CarbonMapperConfig`: A new config object populated from the environment.

Examples¶

>>> import os
>>> os.environ["CARBONMAPPER_TOKEN"] = "eyJ..."
>>> cfg = CarbonMapperConfig.from_env()
>>> cfg.token
'eyJ...'

Source code in georeader/readers/carbonmapper/config.py

@classmethod
def from_env(cls) -> "CarbonMapperConfig":
    """Build a :class:`CarbonMapperConfig` purely from environment variables.

    Reads :envvar:`CARBONMAPPER_TOKEN`, :envvar:`CARBONMAPPER_EMAIL`,
    and :envvar:`CARBONMAPPER_PASSWORD`.  Fields that are absent from
    the environment are left as ``None``.

    Returns
    -------
    CarbonMapperConfig
        A new config object populated from the environment.

    Examples
    --------
    >>> import os
    >>> os.environ["CARBONMAPPER_TOKEN"] = "eyJ..."
    >>> cfg = CarbonMapperConfig.from_env()
    >>> cfg.token
    'eyJ...'
    """
    return cls(
        token=os.environ.get(_ENV_TOKEN),
        email=os.environ.get(_ENV_EMAIL),
        password=os.environ.get(_ENV_PASSWORD),
    )
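Because `from_env()` reads the process environment directly, tests that exercise it are easier to keep isolated when the variables are set temporarily. The sketch below uses the standard-library `unittest.mock.patch.dict`, which restores `os.environ` on exit. `read_env_credentials` is a hypothetical stand-in that only mirrors the documented behaviour (absent variables become `None`); it is not part of georeader.

```python
import os
from unittest import mock

# Field -> environment variable, as documented for from_env().
ENV_VARS = {
    "token": "CARBONMAPPER_TOKEN",
    "email": "CARBONMAPPER_EMAIL",
    "password": "CARBONMAPPER_PASSWORD",
}


def read_env_credentials() -> dict:
    """Hypothetical stand-in for from_env(): absent variables become None."""
    return {field: os.environ.get(var) for field, var in ENV_VARS.items()}


# clear=True hides every other variable inside the block; the original
# environment is restored when the block exits.
with mock.patch.dict(os.environ, {"CARBONMAPPER_TOKEN": "eyJ..."}, clear=True):
    creds = read_env_credentials()

print(creds)
```

With the real class, the same pattern applies: call `CarbonMapperConfig.from_env()` (or `CarbonMapperConfig.load(create_placeholder=False)`) inside the `with` block.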

`from_file(path)` `classmethod` ¶

Load a `CarbonMapperConfig` from a specific JSON file.

Parameters¶

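path: Path to a JSON config file (`~` is expanded) containing any combination of the keys "token", "email", and "password", plus any extra fields. "username" is accepted as a fallback for "email", and email or password values still equal to the auto-created placeholders are treated as unset.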

Returns¶

`CarbonMapperConfig`: Config populated from the file.

Raises¶

FileNotFoundError: If `path` does not exist.

json.JSONDecodeError: If the file cannot be parsed as JSON.

Examples¶

cfg = CarbonMapperConfig.from_file("~/.georeader/auth_carbonmapper.json")

Source code in georeader/readers/carbonmapper/config.py

@classmethod
def from_file(cls, path: Path | str) -> "CarbonMapperConfig":
    """Load a :class:`CarbonMapperConfig` from a specific JSON file.

    Parameters
    ----------
    path:
        Path to a JSON config file containing any combination of the
        keys ``"token"``, ``"email"``, ``"password"``, plus any extra
        fields.

    Returns
    -------
    CarbonMapperConfig
        Config populated from the file.

    Raises
    ------
    FileNotFoundError
        If *path* does not exist.
    json.JSONDecodeError
        If the file cannot be parsed as JSON.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.from_file("~/.georeader/auth_carbonmapper.json")
    """
    path = Path(path).expanduser().resolve()
    with path.open() as fh:
        data: dict[str, Any] = json.load(fh)
    token = data.pop("token", None)
    email = data.pop("email", None) or data.pop("username", None)
    password = data.pop("password", None)
    # Filter stub values — if the user hasn't yet edited a freshly
    # auto-created placeholder, treat the fields as un-set rather
    # than letting ``"SET-EMAIL"`` flow into has_credentials() as
    # if it were a real value.
    if email == _PLACEHOLDER_EMAIL:
        email = None
    if password == _PLACEHOLDER_PASSWORD:
        password = None
    return cls(token=token, email=email, password=password, **data)
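The key handling in `from_file()` is easy to miss when reading the listing: "username" is a fallback spelling of "email", and untouched placeholder values count as "not set". The sketch below reproduces that logic on a JSON string so it can be checked offline. It assumes the placeholder constants equal "SET-EMAIL" and "SET-PASSWORD" (the values named in the `load()` docstring), and `parse_config_json` is a hypothetical helper that returns a plain dict instead of a config object.

```python
import json
from typing import Any

# Assumed values of _PLACEHOLDER_EMAIL / _PLACEHOLDER_PASSWORD.
PLACEHOLDER_EMAIL = "SET-EMAIL"
PLACEHOLDER_PASSWORD = "SET-PASSWORD"


def parse_config_json(text: str) -> dict[str, Any]:
    """Hypothetical helper mirroring the key handling in from_file()."""
    data: dict[str, Any] = json.loads(text)
    token = data.pop("token", None)
    # "username" is only consulted when "email" is missing or empty.
    email = data.pop("email", None) or data.pop("username", None)
    password = data.pop("password", None)
    # Values left over from an auto-created stub are treated as unset.
    if email == PLACEHOLDER_EMAIL:
        email = None
    if password == PLACEHOLDER_PASSWORD:
        password = None
    # Whatever remains is carried along as extra fields.
    return {"token": token, "email": email, "password": password, "extra": data}


stub = parse_config_json('{"email": "SET-EMAIL", "password": "SET-PASSWORD"}')
legacy = parse_config_json('{"username": "u@example.com", "password": "pw", "refresh": "r1"}')
print(stub)
print(legacy)
```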

`get_token()` ¶

Return the best available bearer token.

If `token` is set, it is returned directly. Otherwise `None` is returned; callers that need a fresh token should call `refresh_access_token()` or `georeader.readers.carbonmapper.download.obtain_token()` with `email` and `password`.

Returns¶

str or None: A JWT bearer token string, or `None` if none is configured.

Examples¶

>>> cfg = CarbonMapperConfig.load()
>>> token = cfg.get_token()
>>> if token is None:
...     token = cfg.refresh_access_token()

Source code in georeader/readers/carbonmapper/config.py

def get_token(self) -> str | None:
    """Return the best available bearer token.

    If :attr:`token` is set, it is returned directly.  Otherwise
    ``None`` is returned — callers that need a fresh token should call
    :meth:`refresh_access_token` or
    :func:`~georeader.readers.carbonmapper.download.obtain_token`
    with :attr:`email` and :attr:`password`.

    Returns
    -------
    str or None
        A JWT bearer token string, or ``None`` if none is configured.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.load()
    >>> token = cfg.get_token()
    >>> if token is None:
    ...     token = cfg.refresh_access_token()
    """
    return self.token
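The "use the stored token, otherwise exchange credentials" pattern from the example above can be sketched as a small function. `ensure_token` and `fake_obtain` are hypothetical: the fake stands in for `obtain_token()` so the sketch runs offline, and only relies on the documented `"access"` / `"refresh"` keys.

```python
from typing import Callable, Optional


def ensure_token(
    token: Optional[str],
    email: Optional[str],
    password: Optional[str],
    obtain: Callable[[str, str], dict],
) -> str:
    """Hypothetical helper: reuse a stored token, else exchange credentials."""
    if token:  # what get_token() hands back when a token is configured
        return token
    if not (email and password):  # same guard as refresh_access_token()
        raise ValueError("No token and no email/password to obtain one.")
    return obtain(email, password)["access"]


def fake_obtain(email: str, password: str) -> dict:
    # Offline stand-in for obtain_token(); returns the documented key names.
    return {"access": f"access-for-{email}", "refresh": "refresh-token"}


print(ensure_token("stored", None, None, fake_obtain))
print(ensure_token(None, "u@example.com", "pw", fake_obtain))
```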

`has_credentials()` ¶

Return `True` if any usable credentials are present.

A config is considered to have credentials when `token` is set, or when both `email` and `password` are set.

Examples¶

>>> cfg = CarbonMapperConfig(email="u@example.com", password="pw")
>>> cfg.has_credentials()
True
>>> CarbonMapperConfig().has_credentials()
False

Source code in georeader/readers/carbonmapper/config.py

def has_credentials(self) -> bool:
    """Return ``True`` if any usable credentials are present.

    A config is considered to have credentials when at least one of the
    following is set: :attr:`token`, or both :attr:`email` *and*
    :attr:`password`.

    Examples
    --------
    >>> cfg = CarbonMapperConfig(email="u@example.com", password="pw")
    >>> cfg.has_credentials()
    True
    >>> CarbonMapperConfig().has_credentials()
    False
    """
    return bool(self.token) or bool(self.email and self.password)

`load(path=None, *, create_placeholder=True)` `classmethod` ¶

Load config using the standard resolution order.

Resolution order:

1. If `path` is given, load that file. If it does not exist or cannot be loaded, a warning is logged and the search in step 2 runs instead.
2. Otherwise, search `CONFIG_SEARCH_PATHS` for the first file that exists and loads successfully.
3. Overlay environment variables: non-empty env values overwrite file values.
4. If `create_placeholder` is `True`, no explicit `path` was given, no config file was loaded, and no credentials were found (not even in the environment), write a stub config to `DEFAULT_SAVE_PATH` with `SET-EMAIL` / `SET-PASSWORD` placeholders so users have a clear edit target. This matches the `emit.py` / `S2_SAFE_reader.py` behaviour.

Parameters¶

path: Optional explicit path to a config file. When the file exists and loads, the search is skipped; otherwise the search paths are tried as a fallback.

create_placeholder: When `True` (default), auto-create a stub config file at `DEFAULT_SAVE_PATH` if no credentials could be resolved and no explicit `path` was given. Set to `False` in tests or non-interactive contexts to keep the filesystem untouched.

Returns¶

`CarbonMapperConfig`: The resolved config. Fields that have no value in either the file or the environment are `None`.

Examples¶

>>> cfg = CarbonMapperConfig.load()
>>> print(cfg.email)  # None if not configured

>>> cfg = CarbonMapperConfig.load("~/my_project/.carbonmapper.json")

Source code in georeader/readers/carbonmapper/config.py

@classmethod
def load(
    cls,
    path: Path | str | None = None,
    *,
    create_placeholder: bool = True,
) -> "CarbonMapperConfig":
    """Load config using the standard resolution order.

    Resolution order
    ~~~~~~~~~~~~~~~~
    1. If *path* is given, load that file.
    2. Otherwise search :data:`CONFIG_SEARCH_PATHS` for the first file
       that exists.
    3. Overlay environment variables — env values overwrite file values.
    4. If still nothing is configured (no file found, no env vars set)
       AND ``create_placeholder`` is True, write a stub config to
       :data:`DEFAULT_SAVE_PATH` with ``SET-EMAIL`` / ``SET-PASSWORD``
       placeholders so users have a clear edit target. Matches the
       ``emit.py`` / ``S2_SAFE_reader.py`` behaviour.

    Parameters
    ----------
    path:
        Optional explicit path to a config file.  Skips the search
        when provided.
    create_placeholder:
        When ``True`` (default), auto-create a stub config file at
        :data:`DEFAULT_SAVE_PATH` if no credentials could be
        resolved. Set to ``False`` in tests / non-interactive
        contexts to keep the filesystem untouched.

    Returns
    -------
    CarbonMapperConfig
        The resolved config.  Fields without a value (from file *and*
        env) are ``None``.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.load()
    >>> print(cfg.email)   # None if not configured

    >>> cfg = CarbonMapperConfig.load("~/my_project/.carbonmapper.json")
    """
    cfg: CarbonMapperConfig | None = None
    loaded_from_file = False

    # 1. Explicit path
    if path is not None:
        resolved = Path(path).expanduser().resolve()
        if resolved.exists():
            try:
                cfg = cls.from_file(resolved)
                loaded_from_file = True
                logger.debug("Loaded Carbon Mapper config from %s", resolved)
            except Exception as exc:
                logger.warning("Failed to load config from %s: %s", resolved, exc)
        else:
            logger.warning("Config path %s does not exist; ignoring.", resolved)

    # 2. Search well-known paths
    if cfg is None:
        for candidate in CONFIG_SEARCH_PATHS:
            resolved_candidate = candidate.expanduser().resolve()
            if resolved_candidate.exists():
                try:
                    cfg = cls.from_file(resolved_candidate)
                    loaded_from_file = True
                    logger.debug("Loaded Carbon Mapper config from %s", resolved_candidate)
                    break
                except Exception as exc:
                    logger.warning(
                        "Failed to load config from %s: %s",
                        resolved_candidate,
                        exc,
                    )

    if cfg is None:
        cfg = cls()

    # 3. Overlay environment variables (env takes priority over file)
    env_token = os.environ.get(_ENV_TOKEN)
    env_email = os.environ.get(_ENV_EMAIL)
    env_password = os.environ.get(_ENV_PASSWORD)
    if env_token:
        cfg.token = env_token
    if env_email:
        cfg.email = env_email
    if env_password:
        cfg.password = env_password

    # 4. Placeholder — only when caller didn't pass an explicit path,
    #    no config file was found, and env vars didn't supply creds.
    if (
        create_placeholder
        and path is None
        and not loaded_from_file
        and not cfg.has_credentials()
    ):
        _create_placeholder_config()

    return cfg
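The four resolution steps can be condensed into one loop, because a missing or unreadable explicit path simply falls through to the search paths. The sketch below is a hypothetical re-creation (`resolve_config` is not a georeader function): it works on plain dicts, takes the environment as a dict so it can be tested, and skips the placeholder filtering and "username" alias that `from_file()` adds. It returns whether a placeholder would be written instead of writing one.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Sequence

# Field -> environment variable, as documented for from_env().
ENV_VARS = {
    "token": "CARBONMAPPER_TOKEN",
    "email": "CARBONMAPPER_EMAIL",
    "password": "CARBONMAPPER_PASSWORD",
}


def resolve_config(
    explicit: Optional[Path],
    search_paths: Sequence[Path],
    env: dict,
    create_placeholder: bool = True,
) -> tuple[dict, bool]:
    """Hypothetical re-creation of load(); returns (config, needs_placeholder)."""
    cfg: dict = {}
    loaded_from_file = False
    # Steps 1-2: the explicit path is tried first; if it is missing or
    # unreadable, the search paths are tried in order.
    candidates = ([explicit] if explicit is not None else []) + list(search_paths)
    for candidate in candidates:
        if not candidate.exists():
            continue
        try:
            cfg = json.loads(candidate.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # load() logs a warning and moves on
        loaded_from_file = True
        break
    # Step 3: non-empty environment values override file values.
    for field, var in ENV_VARS.items():
        if env.get(var):
            cfg[field] = env[var]
    # Step 4: placeholder only without an explicit path, a file, or credentials.
    has_creds = bool(cfg.get("token")) or bool(cfg.get("email") and cfg.get("password"))
    needs_placeholder = (
        create_placeholder and explicit is None and not loaded_from_file and not has_creds
    )
    return cfg, needs_placeholder


with tempfile.TemporaryDirectory() as tmp:
    user_file = Path(tmp) / "auth_carbonmapper.json"
    user_file.write_text(json.dumps({"email": "file@example.com", "password": "pw"}))
    missing = Path(tmp) / "missing.json"
    # Explicit path is missing, so the search path is used; the env e-mail wins.
    cfg, placeholder = resolve_config(
        missing, [user_file], {"CARBONMAPPER_EMAIL": "env@example.com"}
    )
    # Nothing found anywhere and no explicit path: a stub would be written.
    empty_cfg, empty_placeholder = resolve_config(None, [missing], {})

print(cfg, placeholder)
print(empty_cfg, empty_placeholder)
```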

`refresh_access_token()` ¶

Obtain a fresh JWT access token using stored email/password.

Calls `georeader.readers.carbonmapper.download.obtain_token()` with the stored `email` and `password`, updates `token` in place, stores the returned refresh token under `extra["refresh"]`, and returns the new access token.

Returns¶

str: The new JWT access token.

Raises¶

ValueError: If `email` or `password` is not set.

requests.HTTPError: If the Carbon Mapper API rejects the credentials.

Examples¶

>>> cfg = CarbonMapperConfig.load()  # ~/.georeader/auth_carbonmapper.json
>>> token = cfg.refresh_access_token()

Source code in georeader/readers/carbonmapper/config.py

def refresh_access_token(self) -> str:
    """Obtain a fresh JWT access token using stored email/password.

    Calls :func:`~georeader.readers.carbonmapper.download.obtain_token` with the
    stored :attr:`email` and :attr:`password`, updates :attr:`token`
    in-place, and returns the new access token.

    Returns
    -------
    str
        The new JWT access token.

    Raises
    ------
    ValueError
        If *email* or *password* is not set.
    requests.HTTPError
        If the Carbon Mapper API rejects the credentials.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.load()  # ~/.georeader/auth_carbonmapper.json
    >>> token = cfg.refresh_access_token()
    """
    if not self.email or not self.password:
        raise ValueError(
            "Cannot refresh token: email and password are required. "
            "Provide them via config file, environment variables, or "
            "constructor arguments."
        )
    from georeader.readers.carbonmapper.download import obtain_token

    tokens = obtain_token(self.email, self.password)
    self.token = tokens["access"]
    self.extra["refresh"] = tokens.get("refresh")
    logger.info("Carbon Mapper access token refreshed for %s", self.email)
    return self.token

`reset(path=None)` `classmethod` ¶

Delete the stored config file, if it exists.

Parameters¶

path: Path to the config file to remove. Defaults to `DEFAULT_SAVE_PATH` (`~/.georeader/auth_carbonmapper.json`).

Examples¶

CarbonMapperConfig.reset() # removes ~/.georeader/auth_carbonmapper.json

Source code in georeader/readers/carbonmapper/config.py

@classmethod
def reset(cls, path: Path | str | None = None) -> None:
    """Delete the stored config file, if it exists.

    Parameters
    ----------
    path:
        Path to the config file to remove.  Defaults to
        :data:`DEFAULT_SAVE_PATH`
        (``~/.georeader/auth_carbonmapper.json``).

    Examples
    --------
    >>> CarbonMapperConfig.reset()  # removes ~/.georeader/auth_carbonmapper.json
    """
    dest = (
        Path(path).expanduser().resolve()
        if path is not None
        else DEFAULT_SAVE_PATH.expanduser().resolve()
    )
    if dest.exists():
        dest.unlink()
        logger.info("Carbon Mapper config removed: %s", dest)
    else:
        logger.debug("No config file to remove at %s", dest)

`save(path=None)` ¶

Persist the config to a JSON file.

Parameters¶

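path: Destination file path. Defaults to `DEFAULT_SAVE_PATH` (`~/.georeader/auth_carbonmapper.json`), matching the convention of the sibling readers (EMIT, Sentinel-2). This user-level location lies outside the working tree, so credentials are not accidentally committed. The file is written as indented JSON, including any extra fields, and its permissions are then restricted to `0o600` where the platform allows it.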

Returns¶

`Path`: The resolved path of the file that was written.

Examples¶

>>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
>>> saved_path = cfg.save()
>>> print(saved_path)
/home/user/.georeader/auth_carbonmapper.json

Source code in georeader/readers/carbonmapper/config.py

def save(self, path: Path | str | None = None) -> Path:
    """Persist the config to a JSON file.

    Parameters
    ----------
    path:
        Destination file path.  Defaults to
        :data:`DEFAULT_SAVE_PATH`
        (``~/.georeader/auth_carbonmapper.json``), matching the
        sibling-reader convention (emit, S2). User-level location
        outside the working tree so credentials are never
        accidentally committed.

    Returns
    -------
    Path
        The resolved path of the file that was written.

    Examples
    --------
    >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
    >>> saved_path = cfg.save()
    >>> print(saved_path)
    /home/user/.georeader/auth_carbonmapper.json
    """
    if path is not None:
        dest = Path(path).expanduser().resolve()
    else:
        dest = DEFAULT_SAVE_PATH.expanduser().resolve()
    dest.parent.mkdir(parents=True, exist_ok=True)
    data: dict[str, Any] = {**self.extra}
    if self.token is not None:
        data["token"] = self.token
    if self.email is not None:
        data["email"] = self.email
    if self.password is not None:
        data["password"] = self.password
    dest.write_text(json.dumps(data, indent=2))
    try:
        os.chmod(dest, 0o600)
    except PermissionError:
        logger.warning(
            "Carbon Mapper config saved to %s but restrictive permissions "
            "(0o600) could not be set due to insufficient permissions.",
            dest,
        )
    except OSError as exc:
        logger.warning(
            "Carbon Mapper config saved to %s but setting restrictive "
            "permissions (0o600) failed: %s",
            dest,
            exc,
        )
    logger.info("Carbon Mapper config saved to %s", dest)
    return dest
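The listed `save()` writes the file first and calls `os.chmod` afterwards, so a newly created file briefly carries the process's default permissions. One possible alternative, sketched below and not what the library currently does, is to create the file with mode `0o600` through `os.open` and then also `chmod` it, since the creation mode is ignored when the file already exists. `write_private_json` is a hypothetical helper; the mode check is only meaningful on POSIX systems.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_private_json(dest: Path, data: dict) -> Path:
    """Hypothetical helper: write JSON that other users cannot read (POSIX)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # O_CREAT with mode 0o600 applies the restrictive mode at creation time
    # (subject to the umask, which can only remove further bits).
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    # The creation mode is ignored for pre-existing files, so tighten explicitly.
    os.chmod(dest, 0o600)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    path = write_private_json(
        Path(tmp) / "sub" / "auth_carbonmapper.json", {"email": "user@example.com"}
    )
    mode = stat.S_IMODE(path.stat().st_mode)
    content = json.loads(path.read_text())

print(oct(mode), content)
```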

download.py¶

Carbon Mapper Data Platform API client for the marsml pipeline.

Provides typed wrappers around three Carbon Mapper APIs:

1. **REST Catalog API**  — plumes, sources, scenes, plume CSV, assets
2. **STAC API**          — spatiotemporal search across collections
3. **Asset Download**    — GeoTIFF retrievals, RGB imagery, plume PNGs

Authentication¶

Most read endpoints work without a token, but some (scenes, related plumes, STAC tokens) require a Bearer token. Use `obtain_token()` or `CarbonMapperConfig.refresh_access_token()` (in `georeader.readers.carbonmapper.config`) to obtain one from the credentials in `~/.georeader/auth_carbonmapper.json`.

References¶

API Docs : https://api.carbonmapper.org/api/v1/docs
STAC Root : https://api.carbonmapper.org/api/v1/stac/
Registration : https://data.carbonmapper.org

`obtain_token(email, password)` ¶

Exchange credentials for a JWT access/refresh token pair.

Parameters¶

email: Registered Carbon Mapper account e-mail address.

password: Account password.

Returns¶

dict: A mapping with at least two keys:

- `"access"`: short-lived JWT bearer token (use in API calls).
- `"refresh"`: long-lived refresh token (use with `refresh_token()`).

Examples¶

>>> tokens = obtain_token("user@example.com", "s3cret")
>>> access_token = tokens["access"]
>>> data = get_plumes_annotated(plume_gas="CH4", limit=5, token=access_token)

Source code in georeader/readers/carbonmapper/download.py

def obtain_token(email: str, password: str) -> dict:
    """
    Exchange credentials for a JWT access/refresh token pair.

    Parameters
    ----------
    email:
        Registered Carbon Mapper account e-mail address.
    password:
        Account password.

    Returns
    -------
    dict
        A mapping with at least two keys:

        - ``"access"``  — short-lived JWT bearer token (use in API calls).
        - ``"refresh"`` — long-lived refresh token (use with :func:`refresh_token`).

    Examples
    --------
    >>> tokens = obtain_token("user@example.com", "s3cret")
    >>> access_token = tokens["access"]
    >>> data = get_plumes_annotated(plume_gas="CH4", limit=5, token=access_token)
    """
    return _post(f"{BASE_URL}/token/pair", {"email": email, "password": password})

`refresh_token(refresh)` ¶

Refresh an expired access token using a refresh token.

Parameters¶

refresh: The `"refresh"` value previously returned by `obtain_token()`.

Returns¶

dict: A mapping with a new `"access"` token (and also a new `"refresh"` token if the server rotates them).

Examples¶

>>> tokens = obtain_token("user@example.com", "s3cret")
>>> new_tokens = refresh_token(tokens["refresh"])
>>> access_token = new_tokens["access"]

Source code in georeader/readers/carbonmapper/download.py

def refresh_token(refresh: str) -> dict:
    """
    Refresh an expired access token using a refresh token.

    Parameters
    ----------
    refresh:
        The ``"refresh"`` value previously returned by :func:`obtain_token`.

    Returns
    -------
    dict
        A mapping with a new ``"access"`` token (and optionally a new
        ``"refresh"`` token if the server rotates them).

    Examples
    --------
    >>> tokens = obtain_token("user@example.com", "s3cret")
    >>> new_tokens = refresh_token(tokens["refresh"])
    >>> access_token = new_tokens["access"]
    """
    return _post(f"{BASE_URL}/token/refresh", {"refresh": refresh})
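To decide when to call `refresh_token()` proactively, a client can read the standard JWT `exp` claim from the access token. This is a sketch under an assumption: the Carbon Mapper documentation above does not state that access tokens carry `exp`, although it is the usual claim for expiring JWTs. The payload is decoded without verifying the signature, which is acceptable for scheduling a refresh but must never be used for trust decisions. `jwt_expiry`, `needs_refresh`, and `make_unsigned_jwt` are hypothetical helpers; the last one only builds a demo token.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the ``exp`` claim (Unix seconds) of a JWT, without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload_b64 = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    return int(exp) if exp is not None else None


def needs_refresh(token: str, leeway_s: int = 60, now: Optional[float] = None) -> bool:
    """True if the token expires within ``leeway_s`` seconds."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim: rely on the API rejecting the token
    current = time.time() if now is None else now
    return current >= exp - leeway_s


def make_unsigned_jwt(claims: dict) -> str:
    # Builds a token with a dummy signature, for demonstration only.
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{b64({'alg': 'none', 'typ': 'JWT'})}.{b64(claims)}.signature"


token = make_unsigned_jwt({"exp": 1_000_000})
print(needs_refresh(token, now=999_000))  # expires in 1000 s
print(needs_refresh(token, now=999_990))  # inside the 60 s leeway
```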

`download_asset(asset_key, dest, token=None)` ¶

Download a raster asset (GeoTIFF or PNG) by its storage key.

Parameters¶

asset_key: The path portion after `/catalog/asset/`, for example:

    l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif

Asset keys are available in STAC item `assets[name]["href"]` entries and can be derived from the plume `plume_tif` / `con_tif` / `rgb_tif` URLs.

dest: Local file path where the asset will be written. Parent directories are created automatically.

token: Optional Bearer token for authenticated access.

Returns¶

`Path`: The path of the downloaded file, as given in `dest` (it is not resolved to an absolute path).

Examples¶

>>> download_asset(
...     "l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif",
...     dest="./retrieval.tif",
... )

Note: For plume dicts returned by `get_plumes_annotated()`, prefer `download_plume_assets()`, which handles all assets at once.

Source code in georeader/readers/carbonmapper/download.py

def download_asset(asset_key: str, dest: Path | str, token: str | None = None) -> Path:
    """
    Download a raster asset (GeoTIFF or PNG) by its storage key.

    Parameters
    ----------
    asset_key:
        The path portion after ``/catalog/asset/``.  For example::

            l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif

        Asset keys are available in STAC item ``assets[name]["href"]``
        entries and can be derived from the plume ``plume_tif`` /
        ``con_tif`` / ``rgb_tif`` URLs.
    dest:
        Local file path where the asset will be written.  Parent
        directories are created automatically.
    token:
        Optional Bearer token for authenticated access.

    Returns
    -------
    Path
        The resolved path of the downloaded file.

    Examples
    --------
    >>> download_asset(
    ...     "l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif",
    ...     dest="./retrieval.tif",
    ... )

    .. note::
        For plume dicts returned by :func:`get_plumes_annotated`, prefer
        :func:`download_plume_assets` which handles all assets at once.
    """
    dest = Path(dest)
    url = f"{CATALOG_URL}/asset/{asset_key}"
    resp = requests.get(url, headers=_headers(token), timeout=120, stream=True)
    resp.raise_for_status()
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    logger.info("Downloaded %s → %s (%d bytes)", asset_key, dest, dest.stat().st_size)
    return dest
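When only a full asset URL is at hand, the `asset_key` argument is the part of the URL path after `/catalog/asset/`. The sketch below extracts it; `asset_key_from_url` is hypothetical, and it assumes the URLs follow the `https://api.carbonmapper.org/api/v1/catalog/asset/...` layout implied by `CATALOG_URL`. URLs that point elsewhere (for example, pre-signed storage links) are rejected rather than guessed at, and are better downloaded directly.

```python
from urllib.parse import urlparse

ASSET_MARKER = "/catalog/asset/"


def asset_key_from_url(url: str) -> str:
    """Hypothetical helper: return the part of *url* after '/catalog/asset/'."""
    path = urlparse(url).path  # drops any query string, e.g. signing parameters
    marker_at = path.find(ASSET_MARKER)
    if marker_at < 0:
        raise ValueError(f"No {ASSET_MARKER!r} segment in {url!r}")
    return path[marker_at + len(ASSET_MARKER):]


url = (
    "https://api.carbonmapper.org/api/v1/catalog/asset/"
    "l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif"
)
print(asset_key_from_url(url))
```

The result can then be passed as `download_asset(asset_key_from_url(url), dest="./retrieval.tif")`.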

`download_plume_assets(plume, dest_dir)` ¶

Download all available raster assets for a single plume.

Given a plume dict returned by `get_plumes_annotated()`, download every non-null asset URL into `dest_dir` and return a mapping of asset type to local file path.

Parameters¶

plume: A single plume dict as returned in `get_plumes_annotated()["items"]`. The function inspects the "plume_png", "plume_tif", "con_tif", "rgb_png", "rgb_tif", and "plume_rgb_png" keys for download URLs, and names each file after the plume's "plume_id" (or "unknown" if it is missing).

dest_dir: Directory into which assets are downloaded. Created automatically if it does not already exist.

Returns¶

dict[str, Path]: Mapping of asset type to local `pathlib.Path` for each successfully downloaded asset. Assets that are missing (null) or whose download fails are omitted. Example:

    {
        "plume_png": Path("./plumes/emi20240420t101448p07050-A_plume.png"),
        "plume_tif": Path("./plumes/emi20240420t101448p07050-A_plume.tif"),
        "con_tif": Path("./plumes/emi20240420t101448p07050-A_con.tif"),
        "rgb_png": Path("./plumes/emi20240420t101448p07050-A_rgb.png"),
    }

Examples¶


result = get_plumes_annotated(plume_gas="CH4", limit=1, qualities=["good"])
plume = result["items"][0]
downloaded = download_plume_assets(plume, "./plume_data/")
for asset_type, path in downloaded.items():
    print(asset_type, "→", path)

Source code in georeader/readers/carbonmapper/download.py

def download_plume_assets(plume: dict, dest_dir: Path | str) -> dict[str, Path]:
    """
    Download all available raster assets for a single plume.

    Given a plume dict returned by :func:`get_plumes_annotated`, download
    every non-null asset URL into *dest_dir* and return a mapping of asset
    type to local file path.

    Parameters
    ----------
    plume:
        A single plume dict as returned in
        ``get_plumes_annotated()["items"]``.  The function inspects the
        ``"plume_png"``, ``"plume_tif"``, ``"con_tif"``, ``"rgb_png"``,
        ``"rgb_tif"``, and ``"plume_rgb_png"`` keys for download URLs.
    dest_dir:
        Directory into which assets are downloaded.  Created automatically
        if it does not already exist.

    Returns
    -------
    dict[str, Path]
        Mapping of asset type → local :class:`~pathlib.Path` for each
        successfully downloaded asset.  Assets that are missing (``null``)
        or whose download fails are omitted.  Example::

            {
                "plume_png": Path("./plumes/emi20240420t101448p07050-A_plume.png"),
                "plume_tif": Path("./plumes/emi20240420t101448p07050-A_plume.tif"),
                "con_tif": Path("./plumes/emi20240420t101448p07050-A_con.tif"),
                "rgb_png": Path("./plumes/emi20240420t101448p07050-A_rgb.png"),
            }

    Examples
    --------
    .. code-block:: python

        result = get_plumes_annotated(plume_gas="CH4", limit=1, qualities=["good"])
        plume = result["items"][0]
        downloaded = download_plume_assets(plume, "./plume_data/")
        for asset_type, path in downloaded.items():
            print(asset_type, "→", path)
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    plume_name = plume.get("plume_id", "unknown")
    downloaded: dict[str, Path] = {}

    asset_keys = ["plume_png", "plume_tif", "con_tif", "rgb_png", "rgb_tif", "plume_rgb_png"]
    for key in asset_keys:
        url = plume.get(key)
        if not url:
            continue
        suffix = ".tif" if key.endswith("tif") else ".png"
        short = key.replace("_tif", "").replace("_png", "")
        local = dest_dir / f"{plume_name}_{short}{suffix}"
        try:
            resp = requests.get(url, timeout=120, stream=True)
            resp.raise_for_status()
            with open(local, "wb") as f:
                for chunk in resp.iter_content(8192):
                    f.write(chunk)
            downloaded[key] = local
            logger.info("  %s → %s", key, local)
        except requests.RequestException as exc:
            logger.warning("  %s download failed: %s", key, exc)
    return downloaded
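Because the local file names are derived deterministically from `plume_id` and the asset key, a caller can predict them before downloading, for example to skip files that already exist. The dry-run sketch below copies the naming rule from the source above; `planned_asset_paths` is a hypothetical helper and the URLs in the sample plume are placeholders.

```python
from pathlib import Path

# Same key order and naming rule as the download_plume_assets() source above.
ASSET_KEYS = ["plume_png", "plume_tif", "con_tif", "rgb_png", "rgb_tif", "plume_rgb_png"]


def planned_asset_paths(plume: dict, dest_dir: str) -> dict[str, Path]:
    """Hypothetical dry run: where each non-null asset would be written."""
    plume_name = plume.get("plume_id", "unknown")
    planned: dict[str, Path] = {}
    for key in ASSET_KEYS:
        if not plume.get(key):
            continue  # missing or null assets are skipped, as in the real function
        suffix = ".tif" if key.endswith("tif") else ".png"
        short = key.replace("_tif", "").replace("_png", "")
        planned[key] = Path(dest_dir) / f"{plume_name}_{short}{suffix}"
    return planned


plume = {
    "plume_id": "emi20240420t101448p07050-A",
    "plume_png": "https://example.invalid/plume.png",
    "plume_tif": "https://example.invalid/plume.tif",
    "rgb_tif": None,
}
for key, path in planned_asset_paths(plume, "plumes").items():
    print(key, "->", path.as_posix())
```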

`stac_search(*, collections=None, bbox=None, datetime_range=None, ids=None, limit=10, token=None)` ¶

Cross-collection STAC item search.

Searches across one or more STAC collections using spatial and temporal filters and returns matching items as a GeoJSON FeatureCollection.

Parameters¶

collections: List of STAC collection IDs to search. If `None`, all collections are searched. Example: `["l2b-ch4-mfa-v3", "l4a-combined-ch4-v3a"]`.

bbox: Bounding-box spatial filter as `(west_lon, south_lat, east_lon, north_lat)` in WGS 84.

datetime_range: RFC 3339 time interval string, e.g. `"2024-01-01T00:00:00Z/2024-06-01T00:00:00Z"`.

ids: Optional list of STAC item IDs to restrict the search to; sent as a comma-separated `ids` query parameter.

limit: Maximum number of items to return.

token: Optional Bearer token for authenticated requests.

Returns¶

dict: A GeoJSON FeatureCollection mapping. Key fields:

- ``"type"``     — ``"FeatureCollection"``.
- ``"features"`` — list of STAC item GeoJSON Features.  Each
  Feature has:

  - ``"id"``         — item ID.
  - ``"geometry"``   — GeoJSON geometry of the scene footprint.
  - ``"properties"`` — item metadata (datetime, collection, etc.).
  - ``"assets"``     — dict of named assets, each with an
    ``"href"`` download URL and media type.

- ``"context"``  — pagination info (``matched``, ``returned``).

Examples¶

Search CH4 retrievals in the Permian Basin:

>>> result = stac_search(
...     collections=["l2b-ch4-mfa-v3"],
...     bbox=(-104.5, 31.0, -101.5, 33.5),
...     datetime_range="2024-01-01T00:00:00Z/2024-06-01T00:00:00Z",
...     limit=5,
... )
>>> for feat in result["features"]:
...     print(feat["id"], list(feat["assets"].keys()))

Search across multiple collections simultaneously:

>>> result = stac_search(
...     collections=["l4a-combined-ch4-v3a", "l2b-rgb-v3a"],
...     bbox=(-104.5, 31.0, -101.5, 33.5),
...     limit=3,
... )

Source code in georeader/readers/carbonmapper/download.py

def stac_search(
    *,
    collections: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    datetime_range: str | None = None,
    ids: list[str] | None = None,
    limit: int = 10,
    token: str | None = None,
) -> dict:
    """
    Cross-collection STAC item search.

    Searches across one or more STAC collections using spatial and
    temporal filters and returns matching items as a GeoJSON
    FeatureCollection.

    Parameters
    ----------
    collections:
        List of STAC collection IDs to search.  If ``None``, all
        collections are searched.  Example:
        ``["l2b-ch4-mfa-v3", "l4a-combined-ch4-v3a"]``.
    bbox:
        Bounding-box spatial filter as
        ``(west_lon, south_lat, east_lon, north_lat)`` in WGS 84.
    datetime_range:
        RFC 3339 time interval string, e.g.
        ``"2024-01-01T00:00:00Z/2024-06-01T00:00:00Z"``.
    limit:
        Maximum number of items to return.
    token:
        Optional Bearer token for authenticated requests.

    Returns
    -------
    dict
        A GeoJSON FeatureCollection mapping.  Key fields:

        - ``"type"``     — ``"FeatureCollection"``.
        - ``"features"`` — list of STAC item GeoJSON Features.  Each
          Feature has:

          - ``"id"``         — item ID.
          - ``"geometry"``   — GeoJSON geometry of the scene footprint.
          - ``"properties"`` — item metadata (datetime, collection, etc.).
          - ``"assets"``     — dict of named assets, each with an
            ``"href"`` download URL and media type.

        - ``"context"``  — pagination info (``matched``, ``returned``).

    Examples
    --------
    Search CH4 retrievals in the Permian Basin:

    >>> result = stac_search(
    ...     collections=["l2b-ch4-mfa-v3"],
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     datetime_range="2024-01-01T00:00:00Z/2024-06-01T00:00:00Z",
    ...     limit=5,
    ... )
    >>> for feat in result["features"]:
    ...     print(feat["id"], list(feat["assets"].keys()))

    Search across multiple collections simultaneously:

    >>> result = stac_search(
    ...     collections=["l4a-combined-ch4-v3a", "l2b-rgb-v3a"],
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     limit=3,
    ... )
    """
    params: dict[str, Any] = {"limit": limit}
    if collections:
        params["collections"] = ",".join(collections)
    if ids:
        params["ids"] = ",".join(ids)
    params.update(_stac_bbox_param(bbox))
    if datetime_range:
        params["datetime"] = datetime_range
    return cast(dict, _get(f"{STAC_URL}/search", params=params, token=token))
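The `datetime_range` and `bbox` arguments are plain strings and tuples, so mistakes (naive local times, swapped latitudes) only surface as empty or failing searches. The hypothetical helpers below build an RFC 3339 `start/end` interval in UTC from timezone-aware datetimes and apply a basic WGS 84 sanity check; they are not part of the module. West greater than east is deliberately allowed, because GeoJSON and STAC use that form for boxes crossing the antimeridian.

```python
from datetime import datetime, timezone


def rfc3339_interval(start: datetime, end: datetime) -> str:
    """Format two aware datetimes as a STAC 'start/end' interval in UTC."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("Use timezone-aware datetimes to avoid ambiguity.")
    if end < start:
        raise ValueError("end must not be earlier than start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    utc_start = start.astimezone(timezone.utc).strftime(fmt)
    utc_end = end.astimezone(timezone.utc).strftime(fmt)
    return f"{utc_start}/{utc_end}"


def check_bbox(bbox: tuple[float, float, float, float]) -> tuple[float, float, float, float]:
    """Sanity check for (west_lon, south_lat, east_lon, north_lat) in WGS 84."""
    west, south, east, north = bbox
    # west > east is allowed: it denotes a box crossing the antimeridian.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return bbox


interval = rfc3339_interval(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 1, tzinfo=timezone.utc),
)
bbox = check_bbox((-104.5, 31.0, -101.5, 33.5))
print(interval)
```

The results would then be passed as `stac_search(collections=["l2b-ch4-mfa-v3"], bbox=bbox, datetime_range=interval)`.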

`stac_get_items(collection_id, *, limit=10, bbox=None, datetime_range=None, token=None)` ¶

Get items from a STAC collection (OGC API Features compliant).

Parameters¶

collection_id: Identifier of the STAC collection to query, e.g. `"l4a-combined-ch4-v3a"` or `"l2b-rgb-v3a"`.

limit: Maximum number of items to return.

bbox: Bounding-box spatial filter as `(west_lon, south_lat, east_lon, north_lat)` in WGS 84.

datetime_range: RFC 3339 time interval string.

token: Optional Bearer token for authenticated requests.

Returns¶

dict: A GeoJSON FeatureCollection. Each Feature has "assets" containing download links for GeoTIFFs, PNGs, and other raster products, with "href" and media-type annotations.

Examples¶

>>> items = stac_get_items("l4a-combined-ch4-v3a", limit=5)
>>> for feat in items["features"]:
...     print(feat["id"], feat.get("properties", {}).get("datetime"))

Source code in georeader/readers/carbonmapper/download.py

def stac_get_items(
    collection_id: str,
    *,
    limit: int = 10,
    bbox: tuple[float, float, float, float] | None = None,
    datetime_range: str | None = None,
    token: str | None = None,
) -> dict:
    """
    Get items from a STAC collection (OGC API Features compliant).

    Parameters
    ----------
    collection_id:
        Identifier of the STAC collection to query, e.g.
        ``"l4a-combined-ch4-v3a"`` or ``"l2b-rgb-v3a"``.
    limit:
        Maximum number of items to return.
    bbox:
        Bounding-box spatial filter as
        ``(west_lon, south_lat, east_lon, north_lat)`` in WGS 84.
    datetime_range:
        RFC 3339 time interval string.
    token:
        Optional Bearer token for authenticated requests.

    Returns
    -------
    dict
        A GeoJSON FeatureCollection.  Each Feature has ``"assets"``
        containing download links for GeoTIFFs, PNGs, and other raster
        products, with ``"href"`` and media-type annotations.

    Examples
    --------
    >>> items = stac_get_items("l4a-combined-ch4-v3a", limit=5)
    >>> for feat in items["features"]:
    ...     print(feat["id"], feat.get("properties", {}).get("datetime"))
    """
    params: dict[str, Any] = {"limit": limit}
    params.update(_stac_bbox_param(bbox))
    if datetime_range:
        params["datetime"] = datetime_range
    return cast(dict, _get(f"{STAC_URL}/collections/{collection_id}/items", params=params, token=token))
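`stac_get_items` queries the OGC API Features endpoint `{STAC_URL}/collections/{collection_id}/items` rather than `/search`. The sketch below uses the standard library to build, without sending, an equivalent GET request that includes the optional Bearer token. `STAC_URL` is a placeholder value here and `build_items_request` is hypothetical. The source does not show how the private `_get` helper attaches the token, so the `Authorization: Bearer` header is an assumption based on the parameter description.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder base URL; the real STAC_URL constant lives in download.py.
STAC_URL = "https://stac.example.com"


def build_items_request(collection_id: str, *, limit: int = 10,
                        token: str | None = None) -> Request:
    """Hypothetical helper: build (but do not send) an items GET request."""
    query = urlencode({"limit": limit})
    # OGC API Features path: /collections/{collectionId}/items
    url = f"{STAC_URL}/collections/{quote(collection_id)}/items?{query}"
    headers: dict[str, str] = {}
    if token:
        # Assumed authentication scheme: an HTTP Bearer token header.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers, method="GET")


req = build_items_request("l4a-combined-ch4-v3a", limit=5, token="abc")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would perform the call. Keeping construction separate makes the endpoint path and headers easy to check offline.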

Satellite Data Readers¶

Sentinel-2 Reader¶

API Reference¶

Sentinel-2 Product Levels¶

Spectral Bands¶

Data Access¶

Quick Start Examples¶

Classes¶

See Also¶

References¶

S2Image ¶

__init__(s2folder, polygon=None, granules=None, out_res=10, window_focus=None, bands=None, metadata_msi=None) ¶

cache_product_to_local_dir(path_dest=None, print_progress=True, format_bands=None) ¶

get_reader(band_names, overview_level=None) ¶

quantification_value() ¶

read_from_band_names(band_names) ¶

solar_irradiance() ¶

S2ImageL1C ¶

cache_product_to_local_dir(path_dest=None, print_progress=True, format_bands=None) ¶

read_metadata_tl() ¶

S2ImageL2A ¶

s2loader(s2folder, out_res=10, bands=None, window_focus=None, granules=None, polygon=None, metadata_msi=None) ¶

s2_public_bucket_path(s2file, check_exists=False, mode='gcp') ¶

read_srf(satellite, srf_file=SRF_FILE_DEFAULT, cache=True) ¶

Proba-V Reader¶

API Reference¶

ProbaV ¶

load_mask(boundless=True) ¶

load_sm(boundless=True) ¶

save_bands(img) ¶

ProbaVRadiometry ¶

ProbaVSM ¶

SPOT-VGT Reader¶

API Reference¶

SpotVGT ¶

load_mask(boundless=True) ¶

load_sm(boundless=True) ¶

PRISMA Reader¶

API Reference¶

Data Format Overview¶

Dual-Sensor Configuration¶

Radiometric Units¶

Spectral Characteristics¶

Examples¶

See Also¶

References¶

PRISMA ¶

PRISMA Data Model¶

Dual Sensor Architecture¶

Attributes¶

Lazy-Loaded Attributes¶

Examples¶

See Also¶

References¶

load_raw(swir_flag) ¶

load_wavelengths(wavelengths, as_reflectance=True, raw=True, resolution_dst=30, dst_crs=None, fill_value_default=-1) ¶

EMIT Reader¶

API Reference¶

Data Format Overview¶

GLT Orthorectification Process¶

Radiometric Units¶

Key Classes and Functions¶

Requirements¶

Examples¶

References¶

EMITImage ¶

EMIT Data Model¶

Spectral Characteristics¶

Attributes¶

Lazy-Loaded Properties¶

Examples¶

See Also¶

References¶

mask_bands property ¶

mean_sza property ¶

mean_vza property ¶

nc_ds_l2amask property ¶

nc_ds_obs property ¶

observation_bands property ¶

percentage_clear property ¶

`shape_raw` `property` ¶

`clear_radiance_cache()` ¶

`footprint(crs=None)` ¶

`georreference(data, fill_value_default=None)` ¶

`invalid_mask_raw(with_buffer=True)` ¶

`load_raw(transpose=True)` ¶

`mask(mask_name='cloud_mask')` ¶

`observation(name)` ¶

`set_band_selection(band_selection=None)` ¶

`sza()` ¶

`to_crs(crs='UTM', resolution_dst_crs=60)` ¶

`validmask(with_buffer=True)` ¶

`vza()` ¶

`water_mask()` ¶

`download_product(link_down, filename=None, display_progress_bar=True, auth=None)` ¶

`get_radiance_link(product_path)` ¶

`get_obs_link(product_path)` ¶

`get_ch4enhancement_link(tile)` ¶

`get_l2amask_link(tile)` ¶

`valid_mask(filename, with_buffer=False, dst_crs='UTM', resolution_dst_crs=60)` ¶

`EnMAP` ¶

`clear_radiance_cache()` ¶

`load_rgb(as_reflectance=True, apply_rpcs=True, dst_crs='EPSG:4326', resolution_dst_crs=None)` ¶

`load_wavelengths(wavelengths, as_reflectance=True)` ¶

`CMTileItem` `dataclass` ¶

`from_stac_item(item)` `classmethod` ¶

`CMAPIError` ¶

`CMPlumeNotFound` ¶

`CMSceneNotPublished` ¶

`CMSourceNotFound` ¶

`get_tile(token, scene_id, *, collection=DEFAULT_L2B_COLLECTION)` ¶

`get_plume(token, plume_id)` ¶