
Satellite Data Readers

This module provides specialized readers for various optical satellite missions. All these readers implement the GeoData protocol, which means they provide a consistent interface for spatial operations, data access, and manipulation.

These readers make it easy to work with official data formats from different Earth observation missions, and they can be used with all the functions available in the georeader.read module.

Readers available:

Sentinel-2 Reader

The Sentinel-2 reader provides functionality for reading Sentinel-2 L1C and L2A products in SAFE format. It supports:

  • Direct reading from local files or cloud storage (Google Cloud Storage)
  • Windowed reading for efficient memory usage
  • Conversion from digital numbers to radiance
  • Access to metadata, including viewing geometry and solar angles

Tutorial examples:

API Reference

Sentinel-2 SAFE Product Reader for L1C and L2A Data.

This module provides readers for Sentinel-2 satellite imagery in the SAFE format, supporting both Level-1C (top-of-atmosphere reflectance) and Level-2A (surface reflectance) products. It handles local files and cloud storage (Google Cloud).

Sentinel-2 Product Levels

::

┌──────────────────────────────────────────────────────────────────────────┐
│                 SENTINEL-2 PROCESSING LEVELS                             │
│                                                                          │
│   Level-1C (L1C)                      Level-2A (L2A)                     │
│   ─────────────────                   ─────────────────                  │
│                                                                          │
│   ☀️ Sun                              ☀️ Sun                             │
│        │                                   │                             │
│        ▼                                   ▼                             │
│   ┌──────────┐                        ┌──────────┐                       │
│   │Atmosphere│ ◄─ NOT corrected       │Atmosphere│ ◄─ CORRECTED          │
│   └────┬─────┘                        └────┬─────┘                       │
│        │                                   │                             │
│        ▼                                   ▼                             │
│   ┌──────────┐                        ┌──────────┐                       │
│   │ Surface  │                        │ Surface  │                       │
│   └──────────┘                        └──────────┘                       │
│        │                                   │                             │
│        ▼ 🛰️                                ▼ 🛰️                          │
│                                                                          │
│   TOA Reflectance                     BOA Reflectance                    │
│   - Includes atmospheric effects      - Surface reflectance              │
│   - Globally available                - Atmospheric correction applied   │
│   - Can convert to radiance           - Scene Classification (SCL)       │
│   - 13 bands (incl. B10 cirrus)       - 12 bands (no B10)                │
│                                                                          │
│   Use for:                            Use for:                           │
│   - Radiance-based analysis           - Land cover mapping               │
│   - Custom atmospheric correction     - Vegetation indices (NDVI)        │
│   - Cloud studies (B10)               - Change detection                 │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Spectral Bands

::

Band │ Central λ │ Bandwidth │ Resolution │ L1C │ L2A │ Description
─────┼───────────┼───────────┼────────────┼─────┼─────┼─────────────────────
B01  │   443 nm  │   20 nm   │    60 m    │  ✓  │  ✓  │ Coastal/Aerosol
B02  │   490 nm  │   65 nm   │    10 m    │  ✓  │  ✓  │ Blue
B03  │   560 nm  │   35 nm   │    10 m    │  ✓  │  ✓  │ Green
B04  │   665 nm  │   30 nm   │    10 m    │  ✓  │  ✓  │ Red
B05  │   705 nm  │   15 nm   │    20 m    │  ✓  │  ✓  │ Red Edge 1
B06  │   740 nm  │   15 nm   │    20 m    │  ✓  │  ✓  │ Red Edge 2
B07  │   783 nm  │   20 nm   │    20 m    │  ✓  │  ✓  │ Red Edge 3
B08  │   842 nm  │  115 nm   │    10 m    │  ✓  │  ✓  │ NIR
B8A  │   865 nm  │   20 nm   │    20 m    │  ✓  │  ✓  │ NIR Narrow
B09  │   945 nm  │   20 nm   │    60 m    │  ✓  │  ✓  │ Water Vapour
B10  │  1375 nm  │   30 nm   │    60 m    │  ✓  │  ✗  │ Cirrus (L1C only)
B11  │  1610 nm  │   90 nm   │    20 m    │  ✓  │  ✓  │ SWIR 1
B12  │  2190 nm  │  180 nm   │    20 m    │  ✓  │  ✓  │ SWIR 2
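
The resolution column determines which bands can be read together without resampling. The following is a minimal sketch that encodes the table as a plain dictionary and groups the bands by native resolution. The names `NATIVE_RESOLUTION_M` and `bands_by_resolution` are illustrative only and are not part of georeader.

```python
from collections import defaultdict
from typing import Dict, List

# Native resolution in metres for each band, copied from the table above.
# Hypothetical constant for illustration; not a georeader object.
NATIVE_RESOLUTION_M: Dict[str, int] = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10, "B05": 20, "B06": 20,
    "B07": 20, "B08": 10, "B8A": 20, "B09": 60, "B10": 60, "B11": 20,
    "B12": 20,
}


def bands_by_resolution(level: str = "L1C") -> Dict[int, List[str]]:
    """Group band names by native resolution for an 'L1C' or 'L2A' product."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for band, res in NATIVE_RESOLUTION_M.items():
        if level == "L2A" and band == "B10":
            continue  # B10 (cirrus) is only distributed with L1C products
        groups[res].append(band)
    return dict(groups)


groups_l2a = bands_by_resolution("L2A")
print(groups_l2a[10])  # ['B02', 'B03', 'B04', 'B08']
print(groups_l2a[60])  # ['B01', 'B09']
```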

Data Access

Products can be loaded from:

  1. Local SAFE folders::

    s2 = S2ImageL2A("/data/S2A_MSIL2A_20240115T...SAFE")

  2. Google Cloud Public Bucket (free, no auth)::

    path = "gs://gcp-public-data-sentinel-2/L2/tiles/32/T/QM/..."
    s2 = S2ImageL2A(path)

  3. Other cloud storage (via fsspec)::

    s2 = S2ImageL2A("s3://bucket/S2A_MSIL2A_...SAFE", requester_pays=True)
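
The bucket paths shown on this page follow a regular pattern: the tile identifier in the product name (for example `T30TVK`) is split into UTM zone, latitude band and grid square, and L2A products sit under an extra `L2/` prefix. The sketch below builds such a path from a product name. `gcs_product_path` is a hypothetical helper, and the layout is inferred from the example paths on this page rather than from an official specification.

```python
def gcs_product_path(product_name: str) -> str:
    """Build the public-bucket path of a Sentinel-2 product from its name.

    Assumes the standard naming scheme
    <mission>_<type>_<sensing>_<baseline>_<orbit>_<tile>_<discriminator>.SAFE
    """
    name = product_name.rstrip("/")
    if not name.endswith(".SAFE"):
        name += ".SAFE"
    fields = name[: -len(".SAFE")].split("_")
    product_type, tile = fields[1], fields[5]  # e.g. "MSIL2A", "T30TVK"
    # "T30TVK" -> UTM zone "30", latitude band "T", grid square "VK"
    utm_zone, lat_band, grid_square = tile[1:3], tile[3], tile[4:]
    prefix = "L2/tiles" if product_type == "MSIL2A" else "tiles"
    return (
        f"gs://gcp-public-data-sentinel-2/{prefix}/"
        f"{utm_zone}/{lat_band}/{grid_square}/{name}"
    )


l2a_path = gcs_product_path(
    "S2A_MSIL2A_20240115T110351_N0510_R094_T30TVK_20240115T144512.SAFE"
)
print(l2a_path)
```

The resulting string can be passed as the first argument of `S2ImageL2A` or `S2ImageL1C`, in the same way as the hand-written paths above.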

Quick Start Examples

Load L2A surface reflectance (most common)::

from georeader.readers.S2_SAFE_reader import S2ImageL2A
from shapely.geometry import box

# Define area of interest in WGS84
aoi = box(-3.75, 40.40, -3.65, 40.50)  # Madrid area

# Load from Google Cloud public bucket
s2 = S2ImageL2A(
    "gs://gcp-public-data-sentinel-2/L2/tiles/30/T/VK/"
    "S2A_MSIL2A_20240115T110351_N0510_R094_T30TVK_20240115T144512.SAFE",
    polygon=aoi,
    out_res=10,  # 10m resolution
    bands=["B04", "B03", "B02", "B08"]  # RGBNIR
)

# Load as GeoTensor
gt = s2.load()
print(f"Shape: {gt.shape}")  # (4, H, W)
print(f"CRS: {gt.crs}")      # EPSG:32630 (UTM 30N)

Load L1C and convert to radiance::

from georeader.readers.S2_SAFE_reader import S2ImageL1C

s2_l1c = S2ImageL1C("/path/to/S2A_MSIL1C_...SAFE", polygon=aoi)

# Read tile metadata for solar angles
s2_l1c.read_metadata_tl()

# Get solar zenith angle
sza = s2_l1c.mean_sza

# Convert DN to at-sensor radiance (W/m²/sr/µm)
radiance = s2_l1c.DN_to_radiance(bands=["B04", "B03", "B02"])
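
For orientation, the arithmetic behind a DN-to-radiance conversion can be sketched for a single pixel. This is a sketch under stated assumptions, not the library's implementation: it assumes the usual L1C relation in which the digital number plus the radiometric offset, divided by the quantification value, gives TOA reflectance, and that radiance follows as reflectance · cos(SZA) · solar irradiance · U / π. The function `dn_to_radiance` and its argument names are hypothetical.

```python
import math


def dn_to_radiance(
    dn: float,
    solar_irradiance: float,
    sza_deg: float,
    scale_factor_u: float = 1.0,
    radio_add_offset: float = 0.0,
    quantification_value: float = 10_000.0,
) -> float:
    """Convert one L1C digital number to at-sensor radiance.

    The units of the result follow those of `solar_irradiance`
    (W/m²/µm in gives W/m²/sr/µm out).
    """
    # Step 1: digital number -> TOA reflectance (unitless)
    toa_reflectance = (dn + radio_add_offset) / quantification_value
    # Step 2: TOA reflectance -> radiance
    return (
        toa_reflectance
        * math.cos(math.radians(sza_deg))
        * solar_irradiance
        * scale_factor_u
        / math.pi
    )


# Illustrative numbers only (not taken from a real product)
radiance_px = dn_to_radiance(
    dn=2000, solar_irradiance=1500.0, sza_deg=60.0, radio_add_offset=-1000
)
print(round(radiance_px, 3))  # 23.873
```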

Classes

  • S2Image: Base class with shared functionality (don't use directly)
  • S2ImageL1C: Level-1C reader with TOA reflectance and angle accessors
  • S2ImageL2A: Level-2A reader with surface reflectance

See Also

  • georeader.reflectance: Radiance ↔ reflectance conversions
  • georeader.readers.ee_image: Load Sentinel-2 via Google Earth Engine

References

  • ESA Sentinel-2 User Guide: https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi
  • Google Cloud Sentinel-2 Bucket: https://cloud.google.com/storage/docs/public-datasets/sentinel-2
  • Sentinel-2 Radiometric Resolution: https://sentiwiki.copernicus.eu/web/s2-processing

Authors: Gonzalo Mateo-García, Dan Lopez-Puigdollers

S2Image

Base Sentinel-2 image reader for handling Sentinel-2 satellite products. Do not use this class directly; use S2ImageL1C or S2ImageL2A instead.

This class provides functionality to read and manipulate Sentinel-2 satellite imagery. It handles the specific format and metadata of Sentinel-2 products, supporting operations like loading bands, masks, and converting digital numbers to radiance.

Parameters:

Name         │ Type                     │ Default  │ Description
─────────────┼──────────────────────────┼──────────┼─────────────────────────────────────────────
s2folder     │ str                      │ required │ Path to the Sentinel-2 SAFE product folder.
polygon      │ Optional[Polygon]        │ None     │ Polygon defining the area of interest in EPSG:4326. Defaults to None (entire image).
granules     │ Optional[Dict[str, str]] │ None     │ Dictionary mapping band names to file paths. Defaults to None (automatically discovered).
out_res      │ int                      │ 10       │ Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
window_focus │ Optional[Window]         │ None     │ Window to focus on a specific region of the image. Defaults to None (entire image).
bands        │ Optional[List[str]]      │ None     │ List of bands to read. If None, all available bands will be loaded based on the product type.
metadata_msi │ Optional[str]            │ None     │ Path to metadata file. If None, it is assumed to be in the SAFE folder.
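
One consequence of `out_res` is easy to miss: the constructor looks for a requested band whose native resolution equals `out_res` and uses it as the reading template, so a band list with no such band fails an assertion. The sketch below mirrors that check in isolation. `NATIVE_RES` and `pick_template_band` are hypothetical stand-ins for the module's own lookup, and the sketch raises `ValueError` where the constructor uses `assert`.

```python
from typing import List

# Native resolution in metres per band (hypothetical stand-in for the
# module-level lookup used by the reader).
NATIVE_RES = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10, "B05": 20, "B06": 20,
    "B07": 20, "B08": 10, "B8A": 20, "B09": 60, "B10": 60, "B11": 20,
    "B12": 20,
}


def pick_template_band(bands: List[str], out_res: int) -> str:
    """Return the first band whose native resolution equals out_res."""
    if out_res not in {10, 20, 60}:
        raise ValueError("out_res must be one of 10, 20 or 60")
    for band in bands:
        if NATIVE_RES[band] == out_res:
            return band
    raise ValueError(f"No band with native resolution {out_res} m in {bands}")


print(pick_template_band(["B04", "B03", "B02", "B08"], 10))  # B04
print(pick_template_band(["B02", "B11"], 20))                # B11
```

Under this rule, requesting only the 10 m bands with `out_res=60` is rejected; adding a 60 m band such as `B01` to the list satisfies the check.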

Attributes:

Name                  │ Type                      │ Description
──────────────────────┼───────────────────────────┼─────────────────────────────────────────────
mission               │ str                       │ Mission identifier (e.g., 'S2A', 'S2B').
producttype           │ str                       │ Product type identifier (e.g., 'MSIL1C', 'MSIL2A').
pdgs                  │ str                       │ PDGS Processing Baseline number.
relorbitnum           │ str                       │ Relative Orbit number.
tile_number_field     │ str                       │ Tile Number field.
product_discriminator │ str                       │ Product Discriminator.
name                  │ str                       │ Base name of the product.
folder                │ str                       │ Path to the product folder.
datetime              │ datetime                  │ Acquisition datetime.
metadata_msi          │ str                       │ Path to the MSI metadata file.
out_res               │ int                       │ Output resolution in meters.
bands                 │ List[str]                 │ List of bands to read.
dims                  │ Tuple[str]                │ Names of the dimensions ("band", "y", "x").
fill_value_default    │ int                       │ Default fill value (typically 0).
band_check            │ str                       │ Band used as template for reading.
granule_readers       │ Dict[str, RasterioReader] │ Dictionary of readers for each band.
window_focus          │ Window                    │ Current window focus.
transform             │                           │ Affine transform for the window.
crs                   │                           │ Coordinate reference system.
shape                 │                           │ Shape of the data (bands, height, width).
bounds                │                           │ Bounds of the window.
res                   │ Tuple[float, float]       │ Resolution of the data.
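
The first six attributes and `datetime` are derived from the SAFE folder name. As an illustration, the sketch below splits a product name into those fields and parses the sensing time as UTC. `SafeName` and `split_safe_name` are hypothetical names; the library's own parser may normalize the fields differently (for example by stripping the leading `N`, `R` or `T`).

```python
import datetime
from typing import NamedTuple


class SafeName(NamedTuple):
    mission: str
    producttype: str
    sensing_datetime: datetime.datetime
    pdgs: str
    relorbitnum: str
    tile_number_field: str
    product_discriminator: str


def split_safe_name(s2folder: str) -> SafeName:
    """Split a SAFE folder path or name into its naming-convention fields."""
    base = s2folder.replace("\\", "/").rstrip("/").split("/")[-1]
    if base.endswith(".SAFE"):
        base = base[: -len(".SAFE")]
    mission, producttype, sensing, pdgs, orbit, tile, discriminator = base.split("_")
    # Sensing time is encoded as YYYYMMDDTHHMMSS and is interpreted as UTC
    sensing_dt = datetime.datetime.strptime(sensing, "%Y%m%dT%H%M%S").replace(
        tzinfo=datetime.timezone.utc
    )
    return SafeName(mission, producttype, sensing_dt, pdgs, orbit, tile, discriminator)


info = split_safe_name(
    "gs://gcp-public-data-sentinel-2/L2/tiles/30/T/VK/"
    "S2A_MSIL2A_20240115T110351_N0510_R094_T30TVK_20240115T144512.SAFE"
)
print(info.mission, info.producttype, info.tile_number_field)
print(info.sensing_datetime.isoformat())
```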

Source code in georeader/readers/S2_SAFE_reader.py
class S2Image:
    """
    Base Sentinel-2 image reader for handling Sentinel-2 satellite products.
    Do not use this class directly; use S2ImageL1C or S2ImageL2A instead.

    This class provides functionality to read and manipulate Sentinel-2 satellite imagery.
    It handles the specific format and metadata of Sentinel-2 products, supporting operations
    like loading bands, masks, and converting digital numbers to radiance.

    Args:
        s2folder (str): Path to the Sentinel-2 SAFE product folder.
        polygon (Optional[Polygon]): Polygon defining the area of interest in EPSG:4326.
            Defaults to None (entire image).
        granules (Optional[Dict[str, str]]): Dictionary mapping band names to file paths.
            Defaults to None (automatically discovered).
        out_res (int): Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
        window_focus (Optional[rasterio.windows.Window]): Window to focus on a specific
            region of the image. Defaults to None (entire image).
        bands (Optional[List[str]]): List of bands to read. If None, all available bands
            will be loaded based on the product type.
        metadata_msi (Optional[str]): Path to metadata file. If None, it is assumed to be
            in the SAFE folder.

    Attributes:
        mission (str): Mission identifier (e.g., 'S2A', 'S2B').
        producttype (str): Product type identifier (e.g., 'MSIL1C', 'MSIL2A').
        pdgs (str): PDGS Processing Baseline number.
        relorbitnum (str): Relative Orbit number.
        tile_number_field (str): Tile Number field.
        product_discriminator (str): Product Discriminator.
        name (str): Base name of the product.
        folder (str): Path to the product folder.
        datetime (datetime): Acquisition datetime.
        metadata_msi (str): Path to the MSI metadata file.
        out_res (int): Output resolution in meters.
        bands (List[str]): List of bands to read.
        dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
        fill_value_default (int): Default fill value (typically 0).
        band_check (str): Band used as template for reading.
        granule_readers (Dict[str, RasterioReader]): Dictionary of readers for each band.
        window_focus (rasterio.windows.Window): Current window focus.
        transform: Affine transform for the window.
        crs: Coordinate reference system.
        shape: Shape of the data (bands, height, width).
        bounds: Bounds of the window.
        res: Resolution of the data.

    """

    def __init__(
        self,
        s2folder: str,
        polygon: Optional[Polygon] = None,
        granules: Optional[Dict[str, str]] = None,
        out_res: int = 10,
        window_focus: Optional[rasterio.windows.Window] = None,
        bands: Optional[List[str]] = None,
        metadata_msi: Optional[str] = None,
    ):
        """
        Sentinel-2 image reader class.

        Args:
            s2folder: path to the SAFE product folder. The folder name is parsed with s2_name_split to fill the naming attributes
            polygon: in CRS EPSG:4326
            granules: dictionary with granule name and path
            out_res: output resolution in meters; one of 10, 20 or 60 (default 10)
            window_focus: rasterio window to read. All reads will be based on this window
            bands: list of bands to read. If None all bands are read.
            metadata_msi: path to metadata file. If None it is assumed to be in the SAFE folder

        """
        (
            self.mission,
            self.producttype,
            sensing_date_str,
            self.pdgs,
            self.relorbitnum,
            self.tile_number_field,
            self.product_discriminator,
        ) = s2_name_split(s2folder)

        # Remove last trailing slash
        s2folder = (
            s2folder[:-1]
            if (s2folder.endswith("/") or s2folder.endswith("\\"))
            else s2folder
        )
        self.name = os.path.basename(os.path.splitext(s2folder)[0])

        self.folder = s2folder
        self.datetime = datetime.datetime.strptime(
            sensing_date_str, "%Y%m%dT%H%M%S"
        ).replace(tzinfo=datetime.timezone.utc)

        info_granules_metadata = None

        if metadata_msi is None:
            info_granules_metadata = _get_info_granules_metadata(self.folder)
            if info_granules_metadata is not None:
                self.metadata_msi = info_granules_metadata["metadata_msi"]
                if "metadata_tl" in info_granules_metadata:
                    self.metadata_tl = info_granules_metadata["metadata_tl"]
            else:
                self.metadata_msi = os.path.join(
                    self.folder, f"MTD_{self.producttype}.xml"
                ).replace("\\", "/")

        else:
            self.metadata_msi = metadata_msi

        out_res = int(out_res)

        # TODO increase possible out_res to powers of 2 of 10 meters and 60 meters
        # rst = rasterio.open('gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE/GRANULE/L1C_T49SGV_A027271_20220527T031740/IMG_DATA/T49SGV_20220527T030539_B02.jp2')
        # rst.overviews(1) -> [2, 4, 8, 16]
        assert out_res in {10, 20, 60}, "Not valid output resolution. Choose 10, 20 or 60"

        # Default resolution to read
        self.out_res = out_res

        if bands is None:
            if self.producttype == "MSIL2A":
                self.bands = list(BANDS_S2_L2A)
            else:
                self.bands = list(BANDS_S2)
        else:
            self.bands = normalize_band_names(bands)

        self.dims = ("band", "y", "x")
        self.fill_value_default = 0

        # Select the band that will be used as template when reading
        self.band_check = None
        for band in self.bands:
            if BANDS_RESOLUTION[band] == self.out_res:
                self.band_check = band
                break

        assert (
            self.band_check is not None
        ), f"No band with native resolution {self.out_res} m found in {self.bands}"

        # This dict will be filled by the _get_reader function
        self.granule_readers: Dict[str, RasterioReader] = {}
        self.window_focus = window_focus
        self.root_metadata_msi = None
        self._radio_add_offsets = None
        self._solar_irradiance = None
        self._scale_factor_U = None
        self._quantification_value = None

        # The code below could be only triggered if required
        if not granules:
            # This is useful when copying with cache_product_to_local_dir func
            if info_granules_metadata is None:
                info_granules_metadata = _get_info_granules_metadata(self.folder)

            if info_granules_metadata is not None:
                self.granules = info_granules_metadata["granules"]

            else:
                self.load_metadata_msi()
                bands_elms = self.root_metadata_msi.findall(".//IMAGE_FILE")
                all_granules = [
                    os.path.join(self.folder, b.text + ".jp2").replace("\\", "/")
                    for b in bands_elms
                ]
                if self.producttype == "MSIL2A":
                    self.granules = {j.split("_")[-2]: j for j in all_granules}
                else:
                    self.granules = {
                        j.split("_")[-1].replace(".jp2", ""): j for j in all_granules
                    }
        else:
            self.granules = granules

        self._pol = polygon
        if self._pol is not None:
            self._pol_crs = window_utils.polygon_to_crs(
                self._pol, "EPSG:4326", self.crs
            )
        else:
            self._pol_crs = None

    def cache_product_to_local_dir(
        self,
        path_dest: Optional[str] = None,
        print_progress: bool = True,
        format_bands: Optional[str] = None,
    ) -> "__class__":
        """
        Copy the product to a local directory and return a new instance of the class with the new path

        Args:
            path_dest: path to the destination folder. If None, the current folder (".") is used
            print_progress: print progress bar. Default True
            format_bands: format of the bands. Default None (keep original format). Options: "COG", "GeoTIFF"

        Returns:
            A new instance of the class pointing to the new path
        """
        if path_dest is None:
            path_dest = "."

        if format_bands is not None:
            assert format_bands in {
                "COG",
                "GeoTIFF",
            }, "Not valid format_bands. Choose 'COG' or 'GeoTIFF'"

        name_with_safe = f"{self.name}.SAFE"
        dest_folder = os.path.join(path_dest, name_with_safe)

        # Copy metadata
        metadata_filename = os.path.basename(self.metadata_msi)
        metadata_output_path = os.path.join(dest_folder, metadata_filename)
        if not os.path.exists(metadata_output_path):
            os.makedirs(dest_folder, exist_ok=True)
            self.load_metadata_msi()
            ET.ElementTree(self.root_metadata_msi).write(metadata_output_path)
            root_metadata_msi = self.root_metadata_msi
        else:
            root_metadata_msi = read_xml(metadata_output_path)

        bands_elms = root_metadata_msi.findall(".//IMAGE_FILE")
        if self.producttype == "MSIL2A":
            granules_name_metadata = {b.text.split("_")[-2]: b.text for b in bands_elms}
        else:
            granules_name_metadata = {b.text.split("_")[-1]: b.text for b in bands_elms}

        new_granules = {}
        with tqdm(total=len(self.bands), disable=not print_progress) as pbar:
            for b in self.bands:
                granule = self.granules[b]
                ext_origin = os.path.splitext(granule)[1]

                if format_bands is not None:
                    # Bands already stored as GeoTIFF are copied without conversion
                    convert = not ext_origin.startswith(".tif")

                    ext_dst = ".tif"
                else:
                    convert = False
                    ext_dst = ext_origin

                namefile = os.path.splitext(granules_name_metadata[b])[0]
                new_granules[b] = namefile + ext_dst
                new_granules_path = os.path.join(dest_folder, new_granules[b])
                if not os.path.exists(new_granules_path):
                    new_granules_path_tmp = os.path.join(
                        dest_folder, namefile + ext_origin
                    )
                    pbar.set_description(
                        f"Downloading band {b} from {granule} to {new_granules_path}"
                    )
                    dir_granules_path = os.path.dirname(new_granules_path)
                    os.makedirs(dir_granules_path, exist_ok=True)
                    get_file(granule, new_granules_path_tmp)
                    if convert:
                        image = RasterioReader(new_granules_path_tmp).load().squeeze()
                        if format_bands == "COG":
                            save_cog(image, new_granules_path, descriptions=[b])
                        elif format_bands == "GeoTIFF":
                            save_tiled_geotiff(
                                image, new_granules_path, descriptions=[b]
                            )
                        else:
                            raise NotImplementedError(f"Not implemented {format_bands}")
                        os.remove(new_granules_path_tmp)

                pbar.update(1)

        # Save granules for fast reading
        granules_path = os.path.join(dest_folder, "granules.json").replace("\\", "/")
        if not os.path.exists(granules_path):
            with open(granules_path, "w") as fh:
                json.dump(
                    {"granules": new_granules, "metadata_msi": metadata_filename}, fh
                )

        new_granules_full_path = {
            k: os.path.join(dest_folder, v) for k, v in new_granules.items()
        }

        obj = s2loader(
            s2folder=dest_folder,
            out_res=self.out_res,
            window_focus=self.window_focus,
            bands=self.bands,
            granules=new_granules_full_path,
            polygon=self._pol,
            metadata_msi=metadata_output_path,
        )
        obj.root_metadata_msi = root_metadata_msi
        return obj

    def DN_to_radiance(self, dn_data: Optional[GeoTensor] = None) -> GeoTensor:
        return DN_to_radiance(self, dn_data)

    def load_metadata_msi(self) -> ET.Element:
        if self.root_metadata_msi is None:
            self.root_metadata_msi = read_xml(self.metadata_msi)
        return self.root_metadata_msi

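    def footprint(self, crs: Optional[str] = None) -> Polygon:
        """
        Return the footprint of the data restricted to the current window.

        If no polygon was given, the product footprint is read from the
        EXT_POS_LIST element of the MSI metadata. The polygon is intersected
        with the polygon of the current window.

        Args:
            crs: CRS of the returned polygon. If None, the CRS of the image is used.
        """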
        if self._pol_crs is None:
            self.load_metadata_msi()
            footprint_txt = self.root_metadata_msi.findall(".//EXT_POS_LIST")[0].text
            coords_split = footprint_txt.split(" ")[:-1]
            self._pol = Polygon(
                [
                    (float(lngstr), float(latstr))
                    for latstr, lngstr in zip(coords_split[::2], coords_split[1::2])
                ]
            )
            self._pol_crs = window_utils.polygon_to_crs(
                self._pol, "EPSG:4326", self.crs
            )

        pol_window = window_utils.window_polygon(
            self._get_reader().window_focus, self.transform
        )

        pol = self._pol_crs.intersection(pol_window)

        if (crs is None) or window_utils.compare_crs(self.crs, crs):
            return pol

        return window_utils.polygon_to_crs(pol, self.crs, crs)

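    def radio_add_offsets(self) -> Dict[str, float]:
        """
        Return the RADIO_ADD_OFFSET of each band as read from the MSI metadata.
        If the metadata has no RADIO_ADD_OFFSET element, the offset is 0 for every band.
        """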
        if self._radio_add_offsets is None:
            self.load_metadata_msi()
            radio_add_offsets = self.root_metadata_msi.findall(".//RADIO_ADD_OFFSET")
            if len(radio_add_offsets) == 0:
                self._radio_add_offsets = {b: 0 for b in BANDS_S2}
            else:
                self._radio_add_offsets = {
                    BANDS_S2[int(r.attrib["band_id"])]: int(r.text)
                    for r in radio_add_offsets
                }

        return self._radio_add_offsets

    def solar_irradiance(self) -> Dict[str, float]:
        """
        Returns the solar irradiance of each band per nanometer (W/m²/nm).

        Values are read from metadata_msi, where they are stored in W/m²/µm, e.g.:
            <SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>
        and are divided by 1,000 to convert them to W/m²/nm.
        """
        if self._solar_irradiance is None:
            self.load_metadata_msi()
            sr = self.root_metadata_msi.findall(".//SOLAR_IRRADIANCE")
            self._solar_irradiance = {
                BANDS_S2[int(r.attrib["bandId"])]: float(r.text) / 1_000 for r in sr
            }

        return self._solar_irradiance

    def scale_factor_U(self) -> float:
        if self._scale_factor_U is None:
            self.load_metadata_msi()
            self._scale_factor_U = float(self.root_metadata_msi.find(".//U").text)

        return self._scale_factor_U

    def quantification_value(self) -> int:
        """Returns the quantification value stored in the metadata msi file (this is always: 10_000)"""
        if self._quantification_value is None:
            self.load_metadata_msi()
            self._quantification_value = int(
                self.root_metadata_msi.find(".//QUANTIFICATION_VALUE").text
            )

        return self._quantification_value

    def get_reader(
        self, band_names: Union[str, List[str]], overview_level: Optional[int] = None
    ) -> RasterioReader:
        """
        Provides a RasterioReader object to read all the bands at the same resolution

        Args:
            band_names: band name or list of band names. An AssertionError is raised if the bands have different resolutions.
            overview_level: level of the pyramid to read (same as in rasterio)

        Returns:
            RasterioReader

        """
        if isinstance(band_names, str):
            band_names = [band_names]

        band_names = normalize_band_names(band_names)

        assert all(
            BANDS_RESOLUTION[band_names[0]] == BANDS_RESOLUTION[b] for b in band_names
        ), f"Bands: {band_names} have different resolution"

        reader = RasterioReader(
            [self.granules[band_name] for band_name in band_names],
            window_focus=None,
            stack=False,
            fill_value_default=self.fill_value_default,
            overview_level=overview_level,
        )
        window_in = read.window_from_bounds(reader, self.bounds)
        window_in_rounded = read.round_outer_window(window_in)
        reader.set_window(window_in_rounded)
        return reader

    def _get_reader(self, band_name: Optional[str] = None) -> RasterioReader:
        if band_name is None:
            band_name = self.band_check

        if band_name not in self.granule_readers:
            # TODO handle different out_res than 10, 20, 60?
            if self.out_res == BANDS_RESOLUTION[band_name]:
                overview_level = None
                has_out_res = True
            elif self.out_res == BANDS_RESOLUTION[band_name] * 2:
                # out_res == 20 and BANDS_RESOLUTION[band_name]==10 -> read from first overview
                overview_level = 0
                has_out_res = True
            elif self.out_res > BANDS_RESOLUTION[band_name]:
                # out_res 60 and BANDS_RESOLUTION[band_name] == 10 or BANDS_RESOLUTION[band_name] == 20
                overview_level = 1 if BANDS_RESOLUTION[band_name] == 10 else 0
                has_out_res = False
            else:
                overview_level = None
                has_out_res = False

            # figure out which window_focus to set

            if band_name == self.band_check:
                window_focus = self.window_focus
                set_window_after = False
            elif has_out_res:
                window_focus = self.window_focus
                set_window_after = False
            else:
                set_window_after = True
                window_focus = None

            self.granule_readers[band_name] = RasterioReader(
                self.granules[band_name],
                window_focus=window_focus,
                fill_value_default=self.fill_value_default,
                overview_level=overview_level,
            )
            if set_window_after:
                window_in = read.window_from_bounds(
                    self.granule_readers[band_name], self.bounds
                )
                window_in_rounded = read.round_outer_window(window_in)
                self.granule_readers[band_name].set_window(window_in_rounded)

        return self.granule_readers[band_name]

    @property
    def dtype(self):
        # This is always np.uint16
        reader_band_check = self._get_reader()
        return reader_band_check.dtype

    @property
    def shape(self):
        reader_band_check = self._get_reader()
        return (len(self.bands),) + reader_band_check.shape[-2:]

    @property
    def transform(self):
        reader_band_check = self._get_reader()
        return reader_band_check.transform

    @property
    def crs(self):
        reader_band_check = self._get_reader()
        return reader_band_check.crs

    @property
    def bounds(self):
        reader_band_check = self._get_reader()
        return reader_band_check.bounds

    @property
    def res(self) -> Tuple[float, float]:
        reader_band_check = self._get_reader()
        return reader_band_check.res

    def __str__(self):
        return self.folder

    def __repr__(self) -> str:
        return f""" 
         {self.folder}
         Transform: {self.transform}
         Shape: {self.shape}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         bands: {self.bands}
         fill_value_default: {self.fill_value_default}
        """

    def read_from_band_names(self, band_names: List[str]) -> "__class__":
        """
        Read from band names

        Args:
            band_names: List of band names

        Returns:
            Copy of current object with band names set to band_names
        """
        s2obj = s2loader(
            s2folder=self.folder,
            out_res=self.out_res,
            window_focus=self.window_focus,
            bands=band_names,
            granules=self.granules,
            polygon=self._pol,
            metadata_msi=self.metadata_msi,
        )
        s2obj.root_metadata_msi = self.root_metadata_msi
        return s2obj

    def read_from_window(
        self, window: rasterio.windows.Window, boundless: bool = True
    ) -> "__class__":
        # return GeoTensor(values=self.values, transform=self.transform, crs=self.crs)

        reader_ref = self._get_reader()
        rasterio_reader_ref = reader_ref.read_from_window(
            window=window, boundless=boundless
        )
        s2obj = s2loader(
            s2folder=self.folder,
            out_res=self.out_res,
            window_focus=rasterio_reader_ref.window_focus,
            bands=self.bands,
            granules=self.granules,
            polygon=self._pol,
            metadata_msi=self.metadata_msi,
        )
        # Set band check to avoid re-reading
        s2obj.granule_readers[self.band_check] = rasterio_reader_ref
        s2obj.band_check = self.band_check

        s2obj.root_metadata_msi = self.root_metadata_msi

        return s2obj

    def load(self, boundless: bool = True) -> GeoTensor:
        reader_ref = self._get_reader()
        geotensor_ref = reader_ref.load(boundless=boundless)

        array_out = np.full(
            (len(self.bands),) + geotensor_ref.shape[-2:],
            fill_value=geotensor_ref.fill_value_default,
            dtype=np.int32,
        )

        # Deal with NODATA values
        invalids = (geotensor_ref.values == 0) | (geotensor_ref.values == (2**16) - 1)

        radio_add = self.radio_add_offsets()
        for idx, b in enumerate(self.bands):
            if b == self.band_check:

                # Normalize band names given without the leading zero (e.g. B2 -> B02)
                if len(b) == 2:
                    b = f"B0{b[-1]}"

                geotensor_iter = geotensor_ref
            else:
                reader_iter = self._get_reader(b)
                if (
                    np.mean(
                        np.abs(np.array(reader_iter.res) - np.array(geotensor_ref.res))
                    )
                    < 1e-6
                ):
                    geotensor_iter = reader_iter.load(boundless=boundless)
                else:
                    geotensor_iter = read.read_reproject_like(
                        reader_iter, geotensor_ref
                    )

            # Important: add the radiometric offset; otherwise images with PROCESSING_BASELINE '04.00' or above (after 2022-01-25) are shifted
            array_out[idx] = geotensor_iter.values[0].astype(np.int32) + radio_add[b]

        array_out[:, invalids[0]] = self.fill_value_default

        if np.any(array_out < 0):
            raise ValueError("Negative values found in the image")

        array_out = array_out.astype(np.uint16)

        return GeoTensor(
            values=array_out,
            transform=geotensor_ref.transform,
            crs=geotensor_ref.crs,
            fill_value_default=self.fill_value_default,
        )

    @property
    def values(self) -> np.ndarray:
        return self.load().values

    def load_mask(self) -> GeoTensor:
        reader_ref = self._get_reader()
        geotensor_ref = reader_ref.load(boundless=True)
        geotensor_ref.values = (geotensor_ref.values == 0) | (
            geotensor_ref.values == (2**16) - 1
        )
        return geotensor_ref
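
The nodata handling in `load` and `load_mask` can be illustrated on a small array. The following is a minimal sketch, not the library's implementation: `apply_offset_and_mask` is a hypothetical helper, and the offset of -1000 is only an illustrative value (the reader obtains the real offsets from `radio_add_offsets()`).

```python
import numpy as np

FILL_VALUE = 0           # mirrors fill_value_default in the reader
SATURATED = 2**16 - 1    # 65535, the largest uint16 value


def apply_offset_and_mask(dn: np.ndarray, offset: int) -> np.ndarray:
    """Hypothetical helper: add an offset to raw uint16 values and restore nodata pixels."""
    # Pixels equal to 0 or 65535 are treated as invalid, as in load_mask
    invalid = (dn == FILL_VALUE) | (dn == SATURATED)
    # Work in int32 so that a negative offset cannot wrap around
    shifted = dn.astype(np.int32) + offset
    shifted[invalid] = FILL_VALUE
    if np.any(shifted < 0):
        raise ValueError("Negative values found in the image")
    return shifted.astype(np.uint16)


raw = np.array([[0, 1500], [65535, 2000]], dtype=np.uint16)
corrected = apply_offset_and_mask(raw, offset=-1000)  # illustrative offset
```

Casting to `int32` before adding the offset is what makes the negative-value check meaningful; on `uint16` data the subtraction would silently wrap.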

__init__(s2folder, polygon=None, granules=None, out_res=10, window_focus=None, bands=None, metadata_msi=None)

Sentinel-2 image reader class.

Parameters:

Name Type Description Default
s2folder str

Name or path of the SAFE product. The product name is parsed with s2_name_split.

required
polygon Optional[Polygon]

in CRS EPSG:4326

None
granules Optional[Dict[str, str]]

dictionary with granule name and path

None
out_res int

output resolution in meters one of 10, 20, 60 (default 10)

10
window_focus Optional[Window]

rasterio window to read. All reads will be based on this window

None
bands Optional[List[str]]

list of bands to read. If None all bands are read.

None
metadata_msi Optional[str]

path to metadata file. If None it is assumed to be in the SAFE folder

None
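
The constructor splits the SAFE product name into seven fields and parses the sensing date as a UTC timestamp. The sketch below shows how this could look, assuming the fields are separated by underscores; `split_safe_name` is a hypothetical stand-in for `s2_name_split`, whose actual implementation is not shown here.

```python
import datetime
import os


def split_safe_name(s2folder: str):
    """Hypothetical stand-in for s2_name_split: split a SAFE name into its 7 fields."""
    name = os.path.basename(s2folder.rstrip("/\\"))
    name = os.path.splitext(name)[0]  # drop the ".SAFE" extension
    fields = name.split("_")
    if len(fields) != 7:
        raise ValueError(f"Unexpected SAFE product name: {name}")
    return tuple(fields)


(mission, producttype, sensing_date_str, pdgs,
 relorbitnum, tile_number_field, product_discriminator) = split_safe_name(
    "/path/to/S2A_MSIL1C_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE/"
)

# Same format string and timezone handling as in __init__
sensing_dt = datetime.datetime.strptime(
    sensing_date_str, "%Y%m%dT%H%M%S"
).replace(tzinfo=datetime.timezone.utc)
```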
Source code in georeader/readers/S2_SAFE_reader.py
def __init__(
    self,
    s2folder: str,
    polygon: Optional[Polygon] = None,
    granules: Optional[Dict[str, str]] = None,
    out_res: int = 10,
    window_focus: Optional[rasterio.windows.Window] = None,
    bands: Optional[List[str]] = None,
    metadata_msi: Optional[str] = None,
):
    """
    Sentinel-2 image reader class.

    Args:
        s2folder: name or path of the SAFE product. The product name is parsed with s2_name_split.
        polygon: in CRS EPSG:4326
        granules: dictionary with granule name and path
        out_res: output resolution in meters one of 10, 20, 60 (default 10)
        window_focus: rasterio window to read. All reads will be based on this window
        bands: list of bands to read. If None all bands are read.
        metadata_msi: path to metadata file. If None it is assumed to be in the SAFE folder

    """
    (
        self.mission,
        self.producttype,
        sensing_date_str,
        self.pdgs,
        self.relorbitnum,
        self.tile_number_field,
        self.product_discriminator,
    ) = s2_name_split(s2folder)

    # Remove last trailing slash
    s2folder = (
        s2folder[:-1]
        if (s2folder.endswith("/") or s2folder.endswith("\\"))
        else s2folder
    )
    self.name = os.path.basename(os.path.splitext(s2folder)[0])

    self.folder = s2folder
    self.datetime = datetime.datetime.strptime(
        sensing_date_str, "%Y%m%dT%H%M%S"
    ).replace(tzinfo=datetime.timezone.utc)

    info_granules_metadata = None

    if metadata_msi is None:
        info_granules_metadata = _get_info_granules_metadata(self.folder)
        if info_granules_metadata is not None:
            self.metadata_msi = info_granules_metadata["metadata_msi"]
            if "metadata_tl" in info_granules_metadata:
                self.metadata_tl = info_granules_metadata["metadata_tl"]
        else:
            self.metadata_msi = os.path.join(
                self.folder, f"MTD_{self.producttype}.xml"
            ).replace("\\", "/")

    else:
        self.metadata_msi = metadata_msi

    out_res = int(out_res)

    # TODO increase possible out_res to powers of 2 of 10 meters and 60 meters
    # rst = rasterio.open('gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE/GRANULE/L1C_T49SGV_A027271_20220527T031740/IMG_DATA/T49SGV_20220527T030539_B02.jp2')
    # rst.overviews(1) -> [2, 4, 8, 16]
    assert out_res in {10, 20, 60}, "Not a valid output resolution. Choose 10, 20 or 60"

    # Default resolution to read
    self.out_res = out_res

    if bands is None:
        if self.producttype == "MSIL2A":
            self.bands = list(BANDS_S2_L2A)
        else:
            self.bands = list(BANDS_S2)
    else:
        self.bands = normalize_band_names(bands)

    self.dims = ("band", "y", "x")
    self.fill_value_default = 0

    # Select the band that will be used as template when reading
    self.band_check = None
    for band in self.bands:
        if BANDS_RESOLUTION[band] == self.out_res:
            self.band_check = band
            break

    assert (
        self.band_check is not None
    ), f"No band found with resolution {self.out_res} in {self.bands}"

    # This dict will be filled by the _get_reader function
    self.granule_readers: Dict[str, RasterioReader] = {}
    self.window_focus = window_focus
    self.root_metadata_msi = None
    self._radio_add_offsets = None
    self._solar_irradiance = None
    self._scale_factor_U = None
    self._quantification_value = None

    # The code below could be only triggered if required
    if not granules:
        # This is useful when copying with cache_product_to_local_dir func
        if info_granules_metadata is None:
            info_granules_metadata = _get_info_granules_metadata(self.folder)

        if info_granules_metadata is not None:
            self.granules = info_granules_metadata["granules"]

        else:
            self.load_metadata_msi()
            bands_elms = self.root_metadata_msi.findall(".//IMAGE_FILE")
            all_granules = [
                os.path.join(self.folder, b.text + ".jp2").replace("\\", "/")
                for b in bands_elms
            ]
            if self.producttype == "MSIL2A":
                self.granules = {j.split("_")[-2]: j for j in all_granules}
            else:
                self.granules = {
                    j.split("_")[-1].replace(".jp2", ""): j for j in all_granules
                }
    else:
        self.granules = granules

    self._pol = polygon
    if self._pol is not None:
        self._pol_crs = window_utils.polygon_to_crs(
            self._pol, "EPSG:4326", self.crs
        )
    else:
        self._pol_crs = None
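
The template band (`band_check`) is the first requested band whose native resolution equals `out_res`; if there is none, the constructor fails. A minimal sketch of that selection follows. `pick_template_band` and `BANDS_RESOLUTION_EXAMPLE` are hypothetical names introduced for illustration, and the dictionary is only a small subset of the bands.

```python
from typing import Dict, List, Optional

# Illustrative subset of native band resolutions in meters
BANDS_RESOLUTION_EXAMPLE: Dict[str, int] = {
    "B01": 60,
    "B02": 10,
    "B05": 20,
    "B8A": 20,
}


def pick_template_band(
    bands: List[str], out_res: int, resolutions: Dict[str, int]
) -> Optional[str]:
    """Return the first band whose native resolution equals out_res, or None."""
    for band in bands:
        if resolutions[band] == out_res:
            return band
    return None


template_20m = pick_template_band(["B01", "B05", "B8A"], 20, BANDS_RESOLUTION_EXAMPLE)
template_10m = pick_template_band(["B01", "B05"], 10, BANDS_RESOLUTION_EXAMPLE)
```

In the second call no 10 m band was requested, so the result is `None`; this is the situation in which the constructor's assertion is triggered.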

cache_product_to_local_dir(path_dest=None, print_progress=True, format_bands=None)

Copy the product to a local directory and return a new instance of the class with the new path

Parameters:

Name Type Description Default
path_dest Optional[str]

path to the destination folder. If None, the current folder (".") is used

None
print_progress bool

print progress bar. Default True

True
format_bands Optional[str]

format of the bands. Default None (keep original format). Options: "COG", "GeoTIFF"

None

Returns:

Type Description
__class__

A new instance of the class pointing to the new path

Source code in georeader/readers/S2_SAFE_reader.py
def cache_product_to_local_dir(
    self,
    path_dest: Optional[str] = None,
    print_progress: bool = True,
    format_bands: Optional[str] = None,
) -> "__class__":
    """
    Copy the product to a local directory and return a new instance of the class with the new path

    Args:
        path_dest: path to the destination folder. If None, the current folder (".") is used
        print_progress: print progress bar. Default True
        format_bands: format of the bands. Default None (keep original format). Options: "COG", "GeoTIFF"

    Returns:
        A new instance of the class pointing to the new path
    """
    if path_dest is None:
        path_dest = "."

    if format_bands is not None:
        assert format_bands in {
            "COG",
            "GeoTIFF",
        }, "Not valid format_bands. Choose 'COG' or 'GeoTIFF'"

    name_with_safe = f"{self.name}.SAFE"
    dest_folder = os.path.join(path_dest, name_with_safe)

    # Copy metadata
    metadata_filename = os.path.basename(self.metadata_msi)
    metadata_output_path = os.path.join(dest_folder, metadata_filename)
    if not os.path.exists(metadata_output_path):
        os.makedirs(dest_folder, exist_ok=True)
        self.load_metadata_msi()
        ET.ElementTree(self.root_metadata_msi).write(metadata_output_path)
        root_metadata_msi = self.root_metadata_msi
    else:
        root_metadata_msi = read_xml(metadata_output_path)

    bands_elms = root_metadata_msi.findall(".//IMAGE_FILE")
    if self.producttype == "MSIL2A":
        granules_name_metadata = {b.text.split("_")[-2]: b.text for b in bands_elms}
    else:
        granules_name_metadata = {b.text.split("_")[-1]: b.text for b in bands_elms}

    new_granules = {}
    with tqdm(total=len(self.bands), disable=not print_progress) as pbar:
        for b in self.bands:
            granule = self.granules[b]
            ext_origin = os.path.splitext(granule)[1]

            if format_bands is not None:
                if ext_origin.startswith(".tif"):
                    convert = False
                else:
                    convert = True

                ext_dst = ".tif"
            else:
                convert = False
                ext_dst = ext_origin

            namefile = os.path.splitext(granules_name_metadata[b])[0]
            new_granules[b] = namefile + ext_dst
            new_granules_path = os.path.join(dest_folder, new_granules[b])
            if not os.path.exists(new_granules_path):
                new_granules_path_tmp = os.path.join(
                    dest_folder, namefile + ext_origin
                )
                pbar.set_description(
                    f"Downloading band {b} from {granule} to {new_granules_path}"
                )
                dir_granules_path = os.path.dirname(new_granules_path)
                os.makedirs(dir_granules_path, exist_ok=True)
                get_file(granule, new_granules_path_tmp)
                if convert:
                    image = RasterioReader(new_granules_path_tmp).load().squeeze()
                    if format_bands == "COG":
                        save_cog(image, new_granules_path, descriptions=[b])
                    elif format_bands == "GeoTIFF":
                        save_tiled_geotiff(
                            image, new_granules_path, descriptions=[b]
                        )
                    else:
                        raise NotImplementedError(f"Not implemented {format_bands}")
                    os.remove(new_granules_path_tmp)

            pbar.update(1)

    # Save granules for fast reading
    granules_path = os.path.join(dest_folder, "granules.json").replace("\\", "/")
    if not os.path.exists(granules_path):
        with open(granules_path, "w") as fh:
            json.dump(
                {"granules": new_granules, "metadata_msi": metadata_filename}, fh
            )

    new_granules_full_path = {
        k: os.path.join(dest_folder, v) for k, v in new_granules.items()
    }

    obj = s2loader(
        s2folder=dest_folder,
        out_res=self.out_res,
        window_focus=self.window_focus,
        bands=self.bands,
        granules=new_granules_full_path,
        polygon=self._pol,
        metadata_msi=metadata_output_path,
    )
    obj.root_metadata_msi = root_metadata_msi
    return obj
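
For each band, the method decides whether the copied file must be converted and which extension the cached file gets. The sketch below isolates that decision under the same rules as the listing above; `plan_band_copy` is a hypothetical helper, not part of the library.

```python
import os
from typing import Optional, Tuple


def plan_band_copy(granule: str, format_bands: Optional[str]) -> Tuple[bool, str]:
    """Hypothetical helper: return (convert, destination_extension) for one band file."""
    ext_origin = os.path.splitext(granule)[1]
    if format_bands is None:
        # Keep the original format: plain copy, same extension
        return False, ext_origin
    if format_bands not in {"COG", "GeoTIFF"}:
        raise ValueError("Not valid format_bands. Choose 'COG' or 'GeoTIFF'")
    # Files that are already TIFFs are copied as-is; everything else is converted
    return (not ext_origin.startswith(".tif")), ".tif"


keep_jp2 = plan_band_copy("T49SGV_20220527T030539_B02.jp2", None)
jp2_to_cog = plan_band_copy("T49SGV_20220527T030539_B02.jp2", "COG")
already_tif = plan_band_copy("T49SGV_20220527T030539_B02.tif", "GeoTIFF")
```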

get_reader(band_names, overview_level=None)

Provides a RasterioReader object to read all the bands at the same resolution

Parameters:

Name Type Description Default
band_names Union[str, List[str]]

A band name or a list of band names. Raises an AssertionError if the bands have different resolutions.

required
overview_level Optional[int]

level of the pyramid to read (same as in rasterio)

None

Returns:

Type Description
RasterioReader

RasterioReader

Source code in georeader/readers/S2_SAFE_reader.py
def get_reader(
    self, band_names: Union[str, List[str]], overview_level: Optional[int] = None
) -> RasterioReader:
    """
    Provides a RasterioReader object to read all the bands at the same resolution

    Args:
        band_names: A band name or a list of band names. Raises an AssertionError if the bands have different resolutions.
        overview_level: level of the pyramid to read (same as in rasterio)

    Returns:
        RasterioReader

    """
    if isinstance(band_names, str):
        band_names = [band_names]

    band_names = normalize_band_names(band_names)

    assert all(
        BANDS_RESOLUTION[band_names[0]] == BANDS_RESOLUTION[b] for b in band_names
    ), f"Bands: {band_names} have different resolution"

    reader = RasterioReader(
        [self.granules[band_name] for band_name in band_names],
        window_focus=None,
        stack=False,
        fill_value_default=self.fill_value_default,
        overview_level=overview_level,
    )
    window_in = read.window_from_bounds(reader, self.bounds)
    window_in_rounded = read.round_outer_window(window_in)
    reader.set_window(window_in_rounded)
    return reader
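
`get_reader` only accepts bands that share one native resolution. The check can be sketched as follows; `common_resolution` and `RESOLUTIONS_EXAMPLE` are hypothetical names, and the dictionary holds just a few bands for illustration.

```python
from typing import Dict, List, Union

# Illustrative subset of native band resolutions in meters
RESOLUTIONS_EXAMPLE: Dict[str, int] = {"B02": 10, "B03": 10, "B04": 10, "B11": 20}


def common_resolution(
    band_names: Union[str, List[str]], resolutions: Dict[str, int]
) -> int:
    """Return the shared resolution of the bands, or raise if they differ."""
    if isinstance(band_names, str):
        band_names = [band_names]
    found = {resolutions[b] for b in band_names}
    if len(found) != 1:
        raise ValueError(f"Bands: {band_names} have different resolution")
    return found.pop()


rgb_res = common_resolution(["B04", "B03", "B02"], RESOLUTIONS_EXAMPLE)

try:
    common_resolution(["B02", "B11"], RESOLUTIONS_EXAMPLE)
    mixed_ok = True
except ValueError:
    mixed_ok = False
```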

quantification_value()

Returns the quantification value stored in the metadata msi file (this is always: 10_000)

Source code in georeader/readers/S2_SAFE_reader.py
def quantification_value(self) -> int:
    """Returns the quantification value stored in the metadata msi file (this is always: 10_000)"""
    if self._quantification_value is None:
        self.load_metadata_msi()
        self._quantification_value = int(
            self.root_metadata_msi.find(".//QUANTIFICATION_VALUE").text
        )

    return self._quantification_value
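
The quantification value is the divisor that maps stored integer values to reflectance. A minimal sketch, assuming the input values already include the radiometric offset (as the arrays returned by `load` do); `to_reflectance` is a hypothetical helper, not a method of the reader.

```python
def to_reflectance(value_with_offset: float, quantification_value: int = 10_000) -> float:
    """Hypothetical helper: scale an offset-corrected digital number to reflectance."""
    return value_with_offset / quantification_value


reflectance = to_reflectance(2500)
```

Because the function only uses division, it should work unchanged on NumPy arrays as well as on scalars.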

read_from_band_names(band_names)

Read from band names

Parameters:

Name Type Description Default
band_names List[str]

List of band names

required

Returns:

Type Description
__class__

Copy of current object with band names set to band_names

Source code in georeader/readers/S2_SAFE_reader.py
def read_from_band_names(self, band_names: List[str]) -> "__class__":
    """
    Read from band names

    Args:
        band_names: List of band names

    Returns:
        Copy of current object with band names set to band_names
    """
    s2obj = s2loader(
        s2folder=self.folder,
        out_res=self.out_res,
        window_focus=self.window_focus,
        bands=band_names,
        granules=self.granules,
        polygon=self._pol,
        metadata_msi=self.metadata_msi,
    )
    s2obj.root_metadata_msi = self.root_metadata_msi
    return s2obj

solar_irradiance()

Returns solar irradiance per nanometer: W/m²/nm

Source code in georeader/readers/S2_SAFE_reader.py
def solar_irradiance(self) -> Dict[str, float]:
    """
    Returns solar irradiance per nanometer: W/m²/nm

    Reads solar irradiance from metadata_msi:
        <SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>
    """
    if self._solar_irradiance is None:
        self.load_metadata_msi()
        sr = self.root_metadata_msi.findall(".//SOLAR_IRRADIANCE")
        self._solar_irradiance = {
            BANDS_S2[int(r.attrib["bandId"])]: float(r.text) / 1_000 for r in sr
        }

    return self._solar_irradiance
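
The metadata stores solar irradiance in W/m²/µm, so dividing by 1,000 gives W/m²/nm. The sketch below applies that conversion to a small XML fragment. The fragment, the `BAND_ORDER` list and the second irradiance value are made up for illustration; only the first element is the one quoted in the docstring.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; the second value is invented for the example
XML_SNIPPET = """
<Solar_Irradiance_List>
  <SOLAR_IRRADIANCE bandId="0" unit="W/m²/µm">1874.3</SOLAR_IRRADIANCE>
  <SOLAR_IRRADIANCE bandId="1" unit="W/m²/µm">1959.75</SOLAR_IRRADIANCE>
</Solar_Irradiance_List>
""".strip()

# Assumed mapping from bandId to band name for this sketch
BAND_ORDER = ["B01", "B02"]

root = ET.fromstring(XML_SNIPPET)
solar_irradiance = {
    # Divide by 1_000 to convert from per micrometer to per nanometer
    BAND_ORDER[int(elem.attrib["bandId"])]: float(elem.text) / 1_000
    for elem in root.findall(".//SOLAR_IRRADIANCE")
}
```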

S2ImageL1C

Bases: S2Image

Sentinel-2 Level 1C (top of atmosphere reflectance) image reader.

This class extends the base S2Image class to handle Sentinel-2 Level 1C products, which provide calibrated and orthorectified top of atmosphere reflectance data. It also provides methods to access viewing and solar angle information.

Parameters:

Name Type Description Default
s2folder str

Path to the Sentinel-2 SAFE product folder.

required
granules Dict[str, str]

Dictionary mapping band names to file paths.

required
polygon Polygon

Polygon defining the area of interest in EPSG:4326.

required
out_res int

Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.

10
window_focus Optional[Window]

Window to focus on a specific region of the image. Defaults to None (entire image).

None
bands Optional[List[str]]

List of bands to read. If None, all available bands will be loaded.

None
metadata_msi Optional[str]

Path to metadata file. If None, it is assumed to be in the SAFE folder.

None

Attributes:

Name Type Description
Additional to S2Image attributes
granule_folder str

Path to the granule folder.

msk_clouds_file str

Path to the cloud mask file.

metadata_tl str

Path to the TL metadata file.

root_metadata_tl

Root element of the TL metadata XML.

tileId str

Tile identifier.

satId str

Satellite identifier.

procLevel str

Processing level.

dimsByRes Dict

Dimensions by resolution.

ulxyByRes Dict

Upper-left coordinates by resolution.

tileAnglesNode Dict

Tile angles node from metadata.

mean_sza float

Mean solar zenith angle.

mean_saa float

Mean solar azimuth angle.

mean_vza Dict[str, float]

Mean viewing zenith angle per band.

mean_vaa Dict[str, float]

Mean viewing azimuth angle per band.

vaa Dict[str, GeoTensor]

Viewing azimuth angle as GeoTensor per band.

vza Dict[str, GeoTensor]

Viewing zenith angle as GeoTensor per band.

saa GeoTensor

Solar azimuth angle as GeoTensor.

sza GeoTensor

Solar zenith angle as GeoTensor.

anglesULXY Tuple[float, float]

Upper-left coordinates of the angle grids.

Examples:

>>> # Initialize the S2ImageL1C reader with a data path
>>> s2_l1c = S2ImageL1C('/path/to/S2A_MSIL1C_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
...                     granules=granules_dict, polygon=aoi_polygon)
>>> # Load all bands
>>> l1c_data = s2_l1c.load()
>>> # Read angle information
>>> s2_l1c.read_metadata_tl()
>>> solar_zenith = s2_l1c.sza
>>> # Convert to radiance
>>> radiance_data = s2_l1c.DN_to_radiance()
Source code in georeader/readers/S2_SAFE_reader.py
class S2ImageL1C(S2Image):
    """
    Sentinel-2 Level 1C (top of atmosphere reflectance) image reader.

    This class extends the base S2Image class to handle Sentinel-2 Level 1C products,
    which provide calibrated and orthorectified top of atmosphere reflectance data.
    It also provides methods to access viewing and solar angle information.

    Args:
        s2folder (str): Path to the Sentinel-2 SAFE product folder.
        granules (Dict[str, str]): Dictionary mapping band names to file paths.
        polygon (Polygon): Polygon defining the area of interest in EPSG:4326.
        out_res (int): Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
        window_focus (Optional[rasterio.windows.Window]): Window to focus on a specific
            region of the image. Defaults to None (entire image).
        bands (Optional[List[str]]): List of bands to read. If None, all available bands will be loaded.
        metadata_msi (Optional[str]): Path to metadata file. If None, it is assumed to be
            in the SAFE folder.

    Attributes:
        Additional to S2Image attributes:
        granule_folder (str): Path to the granule folder.
        msk_clouds_file (str): Path to the cloud mask file.
        metadata_tl (str): Path to the TL metadata file.
        root_metadata_tl: Root element of the TL metadata XML.
        tileId (str): Tile identifier.
        satId (str): Satellite identifier.
        procLevel (str): Processing level.
        dimsByRes (Dict): Dimensions by resolution.
        ulxyByRes (Dict): Upper-left coordinates by resolution.
        tileAnglesNode: Tile angles node from metadata.
        mean_sza (float): Mean solar zenith angle.
        mean_saa (float): Mean solar azimuth angle.
        mean_vza (Dict[str, float]): Mean viewing zenith angle per band.
        mean_vaa (Dict[str, float]): Mean viewing azimuth angle per band.
        vaa (Dict[str, GeoTensor]): Viewing azimuth angle as GeoTensor per band.
        vza (Dict[str, GeoTensor]): Viewing zenith angle as GeoTensor per band.
        saa (GeoTensor): Solar azimuth angle as GeoTensor.
        sza (GeoTensor): Solar zenith angle as GeoTensor.
        anglesULXY (Tuple[float, float]): Upper-left coordinates of the angle grids.

    Examples:
        >>> # Initialize the S2ImageL1C reader with a data path
        >>> s2_l1c = S2ImageL1C('/path/to/S2A_MSIL1C_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
        ...                     granules=granules_dict, polygon=aoi_polygon)
        >>> # Load all bands
        >>> l1c_data = s2_l1c.load()
        >>> # Read angle information
        >>> s2_l1c.read_metadata_tl()
        >>> solar_zenith = s2_l1c.sza
        >>> # Convert to radiance
        >>> radiance_data = s2_l1c.DN_to_radiance()
    """

    def __init__(
        self,
        s2folder,
        granules: Dict[str, str],
        polygon: Polygon,
        out_res: int = 10,
        window_focus: Optional[rasterio.windows.Window] = None,
        bands: Optional[List[str]] = None,
        metadata_msi: Optional[str] = None,
    ):
        super(S2ImageL1C, self).__init__(
            s2folder=s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )

        assert (
            self.producttype == "MSIL1C"
        ), f"Unexpected product type {self.producttype} in image {self.folder}"

        first_granule = self.granules[list(self.granules.keys())[0]]
        self.granule_folder = os.path.dirname(os.path.dirname(first_granule))
        self.msk_clouds_file = os.path.join(
            self.granule_folder, "MSK_CLOUDS_B00.gml"
        ).replace("\\", "/")
        if not hasattr(self, "metadata_tl"):
            self.metadata_tl = os.path.join(self.granule_folder, "MTD_TL.xml").replace(
                "\\", "/"
            )

        self.root_metadata_tl = None

        # Granule in L1C does not include TCI
        # Assert bands in self.granule are ordered as in BANDS_S2
        # assert all(granule[-7:-4] == bname for bname, granule in zip(BANDS_S2, self.granule)), f"some granules are not in the expected order {self.granule}"

    def read_from_window(
        self, window: rasterio.windows.Window, boundless: bool = True
    ) -> "__class__":
        out = super().read_from_window(window, boundless=boundless)

        if self.root_metadata_tl is None:
            return out

        # copy all metadata from the original image
        for atribute in [
            "tileId",
            "root_metadata_tl",
            "satId",
            "procLevel",
            "dimsByRes",
            "ulxyByRes",
            "tileAnglesNode",
            "mean_sza",
            "mean_saa",
            "mean_vza",
            "mean_vaa",
            "vaa",
            "vza",
            "saa",
            "sza",
            "anglesULXY",
        ]:
            setattr(out, atribute, getattr(self, atribute))

        return out

    def cache_product_to_local_dir(
        self,
        path_dest: Optional[str] = None,
        print_progress: bool = True,
        format_bands: Optional[str] = None,
    ) -> "__class__":
        """
        Overrides the parent method to copy the MTD_TL.xml file

        Args:
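            path_dest (Optional[str], optional): path to the destination folder. Defaults to None.
            print_progress (bool, optional): whether to print progress. Defaults to True.
            format_bands (Optional[str], optional): format of the bands ("COG" or "GeoTIFF"). Defaults to None (keep original format).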

        Returns:
            __class__: the cached object
        """
        new_obj = super().cache_product_to_local_dir(
            path_dest=path_dest,
            print_progress=print_progress,
            format_bands=format_bands,
        )

        if os.path.exists(new_obj.metadata_tl):
            # the cached product already exists. returns
            return new_obj

        if self.root_metadata_tl is not None:
            new_obj.root_metadata_tl = self.root_metadata_tl
            ET.ElementTree(new_obj.root_metadata_tl).write(new_obj.metadata_tl)
            # copy all metadata from the original image
            for atribute in [
                "tileId",
                "root_metadata_tl",
                "satId",
                "procLevel",
                "dimsByRes",
                "ulxyByRes",
                "tileAnglesNode",
                "mean_sza",
                "mean_saa",
                "mean_vza",
                "mean_vaa",
                "vaa",
                "vza",
                "saa",
                "sza",
                "anglesULXY",
            ]:
                if hasattr(self, atribute):
                    setattr(new_obj, atribute, getattr(self, atribute))
        else:
            get_file(self.metadata_tl, new_obj.metadata_tl)

        granule_folder_rel = new_obj.granule_folder.replace("\\", "/").replace(
            new_obj.folder.replace("\\", "/") + "/", ""
        )
        # Add metadata_tl to granules.json
        granules_path = os.path.join(new_obj.folder, "granules.json").replace("\\", "/")
        with open(granules_path, "r") as fh:
            info_granules_metadata = json.load(fh)
        info_granules_metadata["metadata_tl"] = os.path.join(
            granule_folder_rel, "MTD_TL.xml"
        ).replace("\\", "/")
        with open(granules_path, "w") as f:
            json.dump(info_granules_metadata, f)

        return new_obj

    def read_metadata_tl(self):
        """
        Read metadata TILE to parse information about the acquisition and properties of GRANULE bands.

        It populates the following attributes:
            - mean_sza
            - mean_saa
            - mean_vza
            - mean_vaa
            - vaa
            - vza
            - saa
            - sza
            - anglesULXY
            - tileId
            - satId
            - procLevel
            - epsg_code
            - dimsByRes
            - ulxyByRes
            - tileAnglesNode
            - root_metadata_tl

        """
        if self.root_metadata_tl is not None:
            return

        self.root_metadata_tl = read_xml(self.metadata_tl)

        # The root tag looks like '{namespace-uri}LocalName'; extract the URI so that 'n1:' lookups work.
        nsPrefix = self.root_metadata_tl.tag[: self.root_metadata_tl.tag.index("}") + 1]
        nsDict = {"n1": nsPrefix[1:-1]}

        self.mean_sza = float(
            self.root_metadata_tl.find(".//Mean_Sun_Angle/ZENITH_ANGLE").text
        )
        self.mean_saa = float(
            self.root_metadata_tl.find(".//Mean_Sun_Angle/AZIMUTH_ANGLE").text
        )

        generalInfoNode = self.root_metadata_tl.find("n1:General_Info", nsDict)
        # N.B. I am still not entirely convinced that this SENSING_TIME is really
        # the acquisition time, but the documentation is rubbish.
        sensingTimeNode = generalInfoNode.find("SENSING_TIME")
        sensingTimeStr = sensingTimeNode.text.strip()
        # self.datetime = datetime.datetime.strptime(sensingTimeStr, "%Y-%m-%dT%H:%M:%S.%fZ")
        tileIdNode = generalInfoNode.find("TILE_ID")
        tileIdFullStr = tileIdNode.text.strip()
        self.tileId = tileIdFullStr.split("_")[-2]
        self.satId = tileIdFullStr[:3]
        self.procLevel = tileIdFullStr[
            13:16
        ]  # Not sure whether to use absolute pos or split by '_'....

        geomInfoNode = self.root_metadata_tl.find("n1:Geometric_Info", nsDict)
        geocodingNode = geomInfoNode.find("Tile_Geocoding")
        self.epsg_code = geocodingNode.find("HORIZONTAL_CS_CODE").text

        # Dimensions of images at different resolutions.
        self.dimsByRes = {}
        sizeNodeList = geocodingNode.findall("Size")
        for sizeNode in sizeNodeList:
            res = sizeNode.attrib["resolution"]
            nrows = int(sizeNode.find("NROWS").text)
            ncols = int(sizeNode.find("NCOLS").text)
            self.dimsByRes[res] = (nrows, ncols)

        # Upper-left corners of images at different resolutions. As far as I can
        # work out, these coords appear to be the upper left corner of the upper left
        # pixel, i.e. equivalent to GDAL's convention. This also means that they
        # are the same for the different resolutions, which is nice.
        self.ulxyByRes = {}
        posNodeList = geocodingNode.findall("Geoposition")
        for posNode in posNodeList:
            res = posNode.attrib["resolution"]
            ulx = float(posNode.find("ULX").text)
            uly = float(posNode.find("ULY").text)
            self.ulxyByRes[res] = (ulx, uly)

        # Sun and satellite angles.
        # Zenith
        self.tileAnglesNode = geomInfoNode.find("Tile_Angles")
        sunZenithNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Zenith")
        # <Zenith>
        #  <COL_STEP unit="m">5000</COL_STEP>
        #  <ROW_STEP unit="m">5000</ROW_STEP>
        angleGridXres = float(sunZenithNode.find("COL_STEP").text)
        angleGridYres = float(sunZenithNode.find("ROW_STEP").text)
        sza = self._makeValueArray(sunZenithNode.find("Values_List"))
        mask_nans = np.isnan(sza)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            sza = inpaint_biharmonic(sza, mask_nans)
        transform_zenith = rasterio.transform.from_origin(
            self.ulxyByRes[str(self.out_res)][0],
            self.ulxyByRes[str(self.out_res)][1],
            angleGridXres,
            angleGridYres,
        )

        self.sza = GeoTensor(sza, transform=transform_zenith, crs=self.epsg_code)

        # Azimuth
        sunAzimuthNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Azimuth")
        angleGridXres = float(sunAzimuthNode.find("COL_STEP").text)
        angleGridYres = float(sunAzimuthNode.find("ROW_STEP").text)
        saa = self._makeValueArray(sunAzimuthNode.find("Values_List"))
        mask_nans = np.isnan(saa)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            saa = inpaint_biharmonic(saa, mask_nans)
        transform_azimuth = rasterio.transform.from_origin(
            self.ulxyByRes[str(self.out_res)][0],
            self.ulxyByRes[str(self.out_res)][1],
            angleGridXres,
            angleGridYres,
        )
        self.saa = GeoTensor(saa, transform=transform_azimuth, crs=self.epsg_code)

        # Now build up the viewing angle per grid cell, from the separate layers
        # given for each detector for each band. Initially I am going to keep
        # the bands separate, just to see how that looks.
        # The names of things in the XML suggest that these are view angles,
        # but the numbers suggest that they are angles as seen from the pixel's
        # frame of reference on the ground, i.e. they are in fact what we ultimately want.
        viewingAngleNodeList = self.tileAnglesNode.findall(
            "Viewing_Incidence_Angles_Grids"
        )
        vza = self._buildViewAngleArr(viewingAngleNodeList, "Zenith")
        vaa = self._buildViewAngleArr(viewingAngleNodeList, "Azimuth")

        self.vaa = {}
        for k, varr in vaa.items():
            mask_nans = np.isnan(varr)
            if np.any(mask_nans):
                from skimage.restoration import inpaint_biharmonic

                varr = inpaint_biharmonic(varr, mask_nans)

            self.vaa[k] = GeoTensor(
                varr, transform=transform_azimuth, crs=self.epsg_code
            )

        self.vza = {}
        for k, varr in vza.items():
            mask_nans = np.isnan(varr)
            if np.any(mask_nans):
                from skimage.restoration import inpaint_biharmonic

                varr = inpaint_biharmonic(varr, mask_nans)
            self.vza[k] = GeoTensor(
                varr, transform=transform_zenith, crs=self.epsg_code
            )

        # Make a guess at the coordinates of the angle grids. These are not given
        # explicitly in the XML, and don't line up exactly with the other grids, so I am
        # making a rough estimate. Because the angles don't change rapidly across these
        # distances, it is not important if I am a bit wrong (although it would be nice
        # to be exactly correct!).
        (ulx, uly) = self.ulxyByRes["10"]
        self.anglesULXY = (ulx - angleGridXres / 2.0, uly + angleGridYres / 2.0)

        # Read mean viewing angles for each band.
        self.mean_vaa = {}
        self.mean_vza = {}
        for elm in self.tileAnglesNode.find("Mean_Viewing_Incidence_Angle_List"):
            band_name = BANDS_S2[int(elm.attrib["bandId"])]
            viewing_zenith_angle = float(elm.find("ZENITH_ANGLE").text)
            viewing_azimuth_angle = float(elm.find("AZIMUTH_ANGLE").text)
            self.mean_vza[band_name] = viewing_zenith_angle
            self.mean_vaa[band_name] = viewing_azimuth_angle

    def _buildViewAngleArr(self, viewingAngleNodeList, angleName):
        """
        Build up the named viewing angle array from the various detector strips given as
        separate arrays. I don't really understand this, and may need to re-write it once
        I have worked it out......

        The angleName is one of 'Zenith' or 'Azimuth'.
        Returns a dictionary of 2-d arrays, keyed by band name (looked up in BANDS_S2 from the bandId attribute).
        """
        angleArrDict = {}
        for viewingAngleNode in viewingAngleNodeList:
            band_name = BANDS_S2[int(viewingAngleNode.attrib["bandId"])]
            detectorId = viewingAngleNode.attrib["detectorId"]

            angleNode = viewingAngleNode.find(angleName)
            angleArr = self._makeValueArray(angleNode.find("Values_List"))
            if band_name not in angleArrDict:
                angleArrDict[band_name] = angleArr
            else:
                mask = ~np.isnan(angleArr)
                angleArrDict[band_name][mask] = angleArr[mask]
        return angleArrDict

    @staticmethod
    def _makeValueArray(valuesListNode):
        """
        Take a <Values_List> node from the XML, and return an array of the values contained
        within it. This will be a 2-d numpy array of float32 values (should I pass the dtype in??)

        """
        valuesList = valuesListNode.findall("VALUES")
        vals = []
        for valNode in valuesList:
            text = valNode.text
            vals.append([np.float32(x) for x in text.strip().split()])

        return np.array(vals)

cache_product_to_local_dir(path_dest=None, print_progress=True, format_bands=None)

Overrides the parent method so that the tile metadata file (MTD_TL.xml) is also copied to the cache.

Parameters:

  • path_dest (Optional[str]): Path to the destination folder. Defaults to None.
  • print_progress (bool): Whether to print progress. Defaults to True.
  • format_bands (Optional[str]): Forwarded unchanged to the parent method. Defaults to None.

Returns:

  • __class__: The cached object.
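
The last step of this method records the location of MTD_TL.xml in granules.json, relative to the product folder. The standalone sketch below mirrors that bookkeeping on a throwaway directory. The helper name `register_metadata_tl` and the folder layout are illustrative assumptions, not part of georeader.

```python
import json
import os
import tempfile


def register_metadata_tl(folder: str, granule_folder: str) -> str:
    """Store the MTD_TL.xml path, relative to `folder`, in granules.json.

    Hypothetical helper mirroring the bookkeeping done at the end of
    cache_product_to_local_dir; it is not a georeader function.
    """
    # Normalize Windows separators so the stored path is portable.
    folder_posix = folder.replace("\\", "/")
    granule_rel = granule_folder.replace("\\", "/").replace(folder_posix + "/", "")
    metadata_rel = f"{granule_rel}/MTD_TL.xml"

    granules_path = os.path.join(folder, "granules.json")
    with open(granules_path, "r") as fh:
        info = json.load(fh)
    info["metadata_tl"] = metadata_rel
    with open(granules_path, "w") as fh:
        json.dump(info, fh)
    return metadata_rel


# Usage on a temporary directory (the layout is made up for the example).
with tempfile.TemporaryDirectory() as tmp:
    granule = os.path.join(tmp, "GRANULE", "L1C_T49SGV")
    os.makedirs(granule)
    with open(os.path.join(tmp, "granules.json"), "w") as fh:
        json.dump({"B02": "GRANULE/L1C_T49SGV/IMG_DATA/B02.jp2"}, fh)

    rel = register_metadata_tl(tmp, granule)
    with open(os.path.join(tmp, "granules.json")) as fh:
        stored = json.load(fh)
```

Storing the path relative to the product folder keeps the cached product relocatable: the folder can be moved without rewriting granules.json.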

Source code in georeader/readers/S2_SAFE_reader.py
def cache_product_to_local_dir(
    self,
    path_dest: Optional[str] = None,
    print_progress: bool = True,
    format_bands: Optional[str] = None,
) -> "__class__":
    """
    Overrides the parent method to also copy the MTD_TL.xml file.

    Args:
        path_dest (Optional[str], optional): path to the destination folder. Defaults to None.
        print_progress (bool, optional): whether to print progress. Defaults to True.
        format_bands (Optional[str], optional): forwarded unchanged to the parent method. Defaults to None.

    Returns:
        __class__: the cached object
    """
    new_obj = super().cache_product_to_local_dir(
        path_dest=path_dest,
        print_progress=print_progress,
        format_bands=format_bands,
    )

    if os.path.exists(new_obj.metadata_tl):
        # The cached product already has its tile metadata; nothing to copy.
        return new_obj

    if self.root_metadata_tl is not None:
        new_obj.root_metadata_tl = self.root_metadata_tl
        # Serialize the parsed XML tree (not the destination path) to the cache.
        ET.ElementTree(self.root_metadata_tl).write(new_obj.metadata_tl)
        # copy all metadata from the original image
        for attribute in [
            "tileId",
            "root_metadata_tl",
            "satId",
            "procLevel",
            "dimsByRes",
            "ulxyByRes",
            "tileAnglesNode",
            "mean_sza",
            "mean_saa",
            "mean_vza",
            "mean_vaa",
            "vaa",
            "vza",
            "saa",
            "sza",
            "anglesULXY",
        ]:
            if hasattr(self, attribute):
                setattr(new_obj, attribute, getattr(self, attribute))
    else:
        get_file(self.metadata_tl, new_obj.metadata_tl)

    granule_folder_rel = new_obj.granule_folder.replace("\\", "/").replace(
        new_obj.folder.replace("\\", "/") + "/", ""
    )
    # Add metadata_tl to granules.json
    granules_path = os.path.join(new_obj.folder, "granules.json").replace("\\", "/")
    with open(granules_path, "r") as fh:
        info_granules_metadata = json.load(fh)
    info_granules_metadata["metadata_tl"] = os.path.join(
        granule_folder_rel, "MTD_TL.xml"
    ).replace("\\", "/")
    with open(granules_path, "w") as f:
        json.dump(info_granules_metadata, f)

    return new_obj

read_metadata_tl()

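Read the tile metadata file (MTD_TL.xml) to parse information about the acquisition and the properties of the granule bands. The method returns immediately if the metadata has already been read.

It populates the following attributes: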
  • mean_sza
  • mean_saa
  • mean_vza
  • mean_vaa
  • vaa
  • vza
  • saa
  • sza
  • anglesULXY
  • tileId
  • satId
  • procLevel
  • epsg_code
  • dimsByRes
  • ulxyByRes
  • tileAnglesNode
  • root_metadata_tl
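
To make the parsing steps concrete, the sketch below reads a sun-zenith grid from a tiny hand-written XML fragment using only the standard library. The fragment is a simplified stand-in for MTD_TL.xml (the namespace URI and the values are invented), and `parse_values_list` is a hypothetical helper. The real method builds NumPy arrays and wraps them in GeoTensor objects.

```python
import math
import xml.etree.ElementTree as ET

# Simplified, made-up stand-in for the structure of MTD_TL.xml.
XML = """<n1:Tile xmlns:n1="https://example.invalid/tile-metadata">
  <n1:Geometric_Info>
    <Tile_Angles>
      <Sun_Angles_Grid>
        <Zenith>
          <COL_STEP unit="m">5000</COL_STEP>
          <ROW_STEP unit="m">5000</ROW_STEP>
          <Values_List>
            <VALUES>30.1 30.2 NaN</VALUES>
            <VALUES>30.3 30.4 30.5</VALUES>
          </Values_List>
        </Zenith>
      </Sun_Angles_Grid>
    </Tile_Angles>
  </n1:Geometric_Info>
</n1:Tile>"""


def parse_values_list(values_list_node):
    """Return the <VALUES> rows as a list of lists of floats (NaN is kept)."""
    return [
        [float(x) for x in row.text.strip().split()]
        for row in values_list_node.findall("VALUES")
    ]


root = ET.fromstring(XML)
# The root tag looks like '{namespace-uri}Tile'; recover the URI for 'n1:' lookups.
ns = {"n1": root.tag[1 : root.tag.index("}")]}

geom = root.find("n1:Geometric_Info", ns)
zenith = geom.find("Tile_Angles").find("Sun_Angles_Grid").find("Zenith")
col_step = float(zenith.find("COL_STEP").text)
row_step = float(zenith.find("ROW_STEP").text)
grid = parse_values_list(zenith.find("Values_List"))
# Count cells without a value; the real method fills these by inpainting.
n_missing = sum(math.isnan(v) for row in grid for v in row)
```

In the method itself, NaN cells are filled with skimage's inpaint_biharmonic before the grid is turned into a GeoTensor.
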
Source code in georeader/readers/S2_SAFE_reader.py
def read_metadata_tl(self):
    """
    Read the tile metadata file (MTD_TL.xml) to parse information about the acquisition and the properties of the granule bands.

    It populates the following attributes:
        - mean_sza
        - mean_saa
        - mean_vza
        - mean_vaa
        - vaa
        - vza
        - saa
        - sza
        - anglesULXY
        - tileId
        - satId
        - procLevel
        - epsg_code
        - dimsByRes
        - ulxyByRes
        - tileAnglesNode
        - root_metadata_tl

    """
    if self.root_metadata_tl is not None:
        return

    self.root_metadata_tl = read_xml(self.metadata_tl)

    # The root tag looks like '{namespace-uri}LocalName'; extract the URI so that 'n1:' lookups work.
    nsPrefix = self.root_metadata_tl.tag[: self.root_metadata_tl.tag.index("}") + 1]
    nsDict = {"n1": nsPrefix[1:-1]}

    self.mean_sza = float(
        self.root_metadata_tl.find(".//Mean_Sun_Angle/ZENITH_ANGLE").text
    )
    self.mean_saa = float(
        self.root_metadata_tl.find(".//Mean_Sun_Angle/AZIMUTH_ANGLE").text
    )

    generalInfoNode = self.root_metadata_tl.find("n1:General_Info", nsDict)
    # N.B. I am still not entirely convinced that this SENSING_TIME is really
    # the acquisition time, but the documentation is rubbish.
    sensingTimeNode = generalInfoNode.find("SENSING_TIME")
    sensingTimeStr = sensingTimeNode.text.strip()
    # self.datetime = datetime.datetime.strptime(sensingTimeStr, "%Y-%m-%dT%H:%M:%S.%fZ")
    tileIdNode = generalInfoNode.find("TILE_ID")
    tileIdFullStr = tileIdNode.text.strip()
    self.tileId = tileIdFullStr.split("_")[-2]
    self.satId = tileIdFullStr[:3]
    self.procLevel = tileIdFullStr[
        13:16
    ]  # Not sure whether to use absolute pos or split by '_'....

    geomInfoNode = self.root_metadata_tl.find("n1:Geometric_Info", nsDict)
    geocodingNode = geomInfoNode.find("Tile_Geocoding")
    self.epsg_code = geocodingNode.find("HORIZONTAL_CS_CODE").text

    # Dimensions of images at different resolutions.
    self.dimsByRes = {}
    sizeNodeList = geocodingNode.findall("Size")
    for sizeNode in sizeNodeList:
        res = sizeNode.attrib["resolution"]
        nrows = int(sizeNode.find("NROWS").text)
        ncols = int(sizeNode.find("NCOLS").text)
        self.dimsByRes[res] = (nrows, ncols)

    # Upper-left corners of images at different resolutions. As far as I can
    # work out, these coords appear to be the upper left corner of the upper left
    # pixel, i.e. equivalent to GDAL's convention. This also means that they
    # are the same for the different resolutions, which is nice.
    self.ulxyByRes = {}
    posNodeList = geocodingNode.findall("Geoposition")
    for posNode in posNodeList:
        res = posNode.attrib["resolution"]
        ulx = float(posNode.find("ULX").text)
        uly = float(posNode.find("ULY").text)
        self.ulxyByRes[res] = (ulx, uly)

    # Sun and satellite angles.
    # Zenith
    self.tileAnglesNode = geomInfoNode.find("Tile_Angles")
    sunZenithNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Zenith")
    # <Zenith>
    #  <COL_STEP unit="m">5000</COL_STEP>
    #  <ROW_STEP unit="m">5000</ROW_STEP>
    angleGridXres = float(sunZenithNode.find("COL_STEP").text)
    angleGridYres = float(sunZenithNode.find("ROW_STEP").text)
    sza = self._makeValueArray(sunZenithNode.find("Values_List"))
    mask_nans = np.isnan(sza)
    if np.any(mask_nans):
        from skimage.restoration import inpaint_biharmonic

        sza = inpaint_biharmonic(sza, mask_nans)
    transform_zenith = rasterio.transform.from_origin(
        self.ulxyByRes[str(self.out_res)][0],
        self.ulxyByRes[str(self.out_res)][1],
        angleGridXres,
        angleGridYres,
    )

    self.sza = GeoTensor(sza, transform=transform_zenith, crs=self.epsg_code)

    # Azimuth
    sunAzimuthNode = self.tileAnglesNode.find("Sun_Angles_Grid").find("Azimuth")
    angleGridXres = float(sunAzimuthNode.find("COL_STEP").text)
    angleGridYres = float(sunAzimuthNode.find("ROW_STEP").text)
    saa = self._makeValueArray(sunAzimuthNode.find("Values_List"))
    mask_nans = np.isnan(saa)
    if np.any(mask_nans):
        from skimage.restoration import inpaint_biharmonic

        saa = inpaint_biharmonic(saa, mask_nans)
    transform_azimuth = rasterio.transform.from_origin(
        self.ulxyByRes[str(self.out_res)][0],
        self.ulxyByRes[str(self.out_res)][1],
        angleGridXres,
        angleGridYres,
    )
    self.saa = GeoTensor(saa, transform=transform_azimuth, crs=self.epsg_code)

    # Now build up the viewing angle per grid cell, from the separate layers
    # given for each detector for each band. Initially I am going to keep
    # the bands separate, just to see how that looks.
    # The names of things in the XML suggest that these are view angles,
    # but the numbers suggest that they are angles as seen from the pixel's
    # frame of reference on the ground, i.e. they are in fact what we ultimately want.
    viewingAngleNodeList = self.tileAnglesNode.findall(
        "Viewing_Incidence_Angles_Grids"
    )
    vza = self._buildViewAngleArr(viewingAngleNodeList, "Zenith")
    vaa = self._buildViewAngleArr(viewingAngleNodeList, "Azimuth")

    self.vaa = {}
    for k, varr in vaa.items():
        mask_nans = np.isnan(varr)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            varr = inpaint_biharmonic(varr, mask_nans)

        self.vaa[k] = GeoTensor(
            varr, transform=transform_azimuth, crs=self.epsg_code
        )

    self.vza = {}
    for k, varr in vza.items():
        mask_nans = np.isnan(varr)
        if np.any(mask_nans):
            from skimage.restoration import inpaint_biharmonic

            varr = inpaint_biharmonic(varr, mask_nans)
        self.vza[k] = GeoTensor(
            varr, transform=transform_zenith, crs=self.epsg_code
        )

    # Make a guess at the coordinates of the angle grids. These are not given
    # explicitly in the XML, and don't line up exactly with the other grids, so I am
    # making a rough estimate. Because the angles don't change rapidly across these
    # distances, it is not important if I am a bit wrong (although it would be nice
    # to be exactly correct!).
    (ulx, uly) = self.ulxyByRes["10"]
    self.anglesULXY = (ulx - angleGridXres / 2.0, uly + angleGridYres / 2.0)

    # Read mean viewing angles for each band.
    self.mean_vaa = {}
    self.mean_vza = {}
    for elm in self.tileAnglesNode.find("Mean_Viewing_Incidence_Angle_List"):
        band_name = BANDS_S2[int(elm.attrib["bandId"])]
        viewing_zenith_angle = float(elm.find("ZENITH_ANGLE").text)
        viewing_azimuth_angle = float(elm.find("AZIMUTH_ANGLE").text)
        self.mean_vza[band_name] = viewing_zenith_angle
        self.mean_vaa[band_name] = viewing_azimuth_angle

S2ImageL2A

Bases: S2Image

Sentinel-2 Level 2A (surface reflectance) image reader.

This class extends the base S2Image class to handle Sentinel-2 Level 2A products, which provide surface reflectance data with atmospheric corrections applied.

Parameters:

  • s2folder (str, required): Path to the Sentinel-2 SAFE product folder.
  • granules (Dict[str, str], required): Dictionary mapping band names to file paths.
  • polygon (Polygon, required): Polygon defining the area of interest in EPSG:4326.
  • out_res (int, default 10): Output resolution in meters. Must be one of 10, 20, or 60.
  • window_focus (Optional[Window], default None): Window to focus on a specific region of the image. None means the entire image.
  • bands (Optional[List[str]], default None): List of bands to read. If None, the default L2A bands (excluding B10) are loaded.
  • metadata_msi (Optional[str], default None): Path to the metadata file. If None, it is assumed to be in the SAFE folder.
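
When bands is None, the constructor falls back to BANDS_S2_L2A, described above as the default bands without B10. The sketch below shows one way such a default could be resolved. The band list written out here is an assumption based on the standard Sentinel-2 MSI band names, and all names are local to the example rather than the library constants.

```python
from typing import List, Optional

# Assumed Sentinel-2 MSI band names (local to this example).
ALL_BANDS: List[str] = [
    "B01", "B02", "B03", "B04", "B05", "B06", "B07",
    "B08", "B8A", "B09", "B10", "B11", "B12",
]
# The L2A default drops the cirrus band B10.
L2A_BANDS: List[str] = [b for b in ALL_BANDS if b != "B10"]


def resolve_bands(bands: Optional[List[str]] = None) -> List[str]:
    """Return the requested bands, or the L2A default when none are given."""
    return list(L2A_BANDS) if bands is None else list(bands)


default_bands = resolve_bands()
rgb_bands = resolve_bands(["B04", "B03", "B02"])
```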

Attributes:

  • mission (str): Mission identifier (e.g., 'S2A', 'S2B').
  • producttype (str): Product type identifier (e.g., 'MSIL2A').
  • pdgs (str): PDGS Processing Baseline number.
  • relorbitnum (str): Relative Orbit number.
  • tile_number_field (str): Tile Number field.
  • product_discriminator (str): Product Discriminator.
  • name (str): Base name of the product.
  • folder (str): Path to the product folder.
  • datetime (datetime): Acquisition datetime.
  • metadata_msi (str): Path to the MSI metadata file.
  • out_res (int): Output resolution in meters.
  • bands (List[str]): List of bands to read.
  • dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
  • fill_value_default (int): Default fill value (typically 0).
  • band_check (str): Band used as template for reading.
  • granule_readers (Dict[str, RasterioReader]): Dictionary of readers for each band.
  • window_focus (Window): Current window focus.

Examples:

>>> # Initialize the S2ImageL2A reader with a data path
>>> s2_l2a = S2ImageL2A('/path/to/S2A_MSIL2A_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
...                     granules=granules_dict, polygon=aoi_polygon)
>>> # Load all bands
>>> l2a_data = s2_l2a.load()
Source code in georeader/readers/S2_SAFE_reader.py
class S2ImageL2A(S2Image):
    """
    Sentinel-2 Level 2A (surface reflectance) image reader.

    This class extends the base S2Image class to handle Sentinel-2 Level 2A products,
    which provide surface reflectance data with atmospheric corrections applied.

    Args:
        s2folder (str): Path to the Sentinel-2 SAFE product folder.
        granules (Dict[str, str]): Dictionary mapping band names to file paths.
        polygon (Polygon): Polygon defining the area of interest in EPSG:4326.
        out_res (int): Output resolution in meters. Must be one of 10, 20, or 60. Defaults to 10.
        window_focus (Optional[rasterio.windows.Window]): Window to focus on a specific
            region of the image. Defaults to None (entire image).
        bands (Optional[List[str]]): List of bands to read. If None, the default L2A bands
            (excluding B10) will be loaded.
        metadata_msi (Optional[str]): Path to metadata file. If None, it is assumed to be
            in the SAFE folder.

    Attributes:
        mission (str): Mission identifier (e.g., 'S2A', 'S2B').
        producttype (str): Product type identifier (e.g., 'MSIL2A').
        pdgs (str): PDGS Processing Baseline number.
        relorbitnum (str): Relative Orbit number.
        tile_number_field (str): Tile Number field.
        product_discriminator (str): Product Discriminator.
        name (str): Base name of the product.
        folder (str): Path to the product folder.
        datetime (datetime): Acquisition datetime.
        metadata_msi (str): Path to the MSI metadata file.
        out_res (int): Output resolution in meters.
        bands (List[str]): List of bands to read.
        dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
        fill_value_default (int): Default fill value (typically 0).
        band_check (str): Band used as template for reading.
        granule_readers (Dict[str, RasterioReader]): Dictionary of readers for each band.
        window_focus (rasterio.windows.Window): Current window focus.

    Examples:
        >>> # Initialize the S2ImageL2A reader with a data path
        >>> s2_l2a = S2ImageL2A('/path/to/S2A_MSIL2A_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE',
        ...                     granules=granules_dict, polygon=aoi_polygon)
        >>> # Load all bands
        >>> l2a_data = s2_l2a.load()
    """

    def __init__(
        self,
        s2folder: str,
        granules: Dict[str, str],
        polygon: Polygon,
        out_res: int = 10,
        window_focus: Optional[rasterio.windows.Window] = None,
        bands: Optional[List[str]] = None,
        metadata_msi: Optional[str] = None,
    ):
        if bands is None:
            bands = BANDS_S2_L2A

        super(S2ImageL2A, self).__init__(
            s2folder=s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )

        assert (
            self.producttype == "MSIL2A"
        ), f"Unexpected product type {self.producttype} in image {self.folder}"

s2loader(s2folder, out_res=10, bands=None, window_focus=None, granules=None, polygon=None, metadata_msi=None)

Loads an S2ImageL2A or an S2ImageL1C reader depending on the product type.

Parameters:

  • s2folder (str, required): .SAFE folder. The standard ESA naming convention is expected (see the s2_name_split function).
  • out_res (int, default 10): Default output resolution; one of {10, 20, 60}.
  • bands (Optional[List[str]], default None): Bands to read. Defaults to BANDS_S2 or BANDS_S2_L2A depending on the product type.
  • window_focus (Optional[Window], default None): Window to read when creating the object.
  • granules (Optional[Dict[str, str]], default None): Dict where keys are the band names and values are paths to the band location.
  • polygon (Optional[Polygon], default None): Polygon with the footprint of the object.
  • metadata_msi (Optional[str], default None): Path to the metadata file.

Returns:

  • Union[S2ImageL2A, S2ImageL1C]: The S2Image reader. A NotImplementedError is raised for any other product type.
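
The dispatch on the product type can be sketched without the library. The parser below is a simplified stand-in for s2_name_split (it only splits the SAFE name on underscores), and the two reader classes are replaced by plain strings, so every name in the block is illustrative.

```python
import os


def product_type_from_name(s2folder: str) -> str:
    """Extract the product type (e.g. 'MSIL1C') from a SAFE folder name.

    Simplified stand-in for s2_name_split, assuming a SAFE name made of
    seven underscore-separated fields with the product type in second place.
    """
    basename = os.path.basename(s2folder.rstrip("/"))
    parts = basename.split("_")
    if len(parts) != 7:
        raise ValueError(f"Unexpected SAFE name: {basename}")
    return parts[1]


def pick_reader(s2folder: str) -> str:
    """Return the name of the reader class that would handle the product."""
    producttype = product_type_from_name(s2folder)
    if producttype == "MSIL2A":
        return "S2ImageL2A"
    if producttype == "MSIL1C":
        return "S2ImageL1C"
    raise NotImplementedError(f"Don't know how to load {producttype} products")


reader_l1c = pick_reader(
    "S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE"
)
reader_l2a = pick_reader(
    "/path/to/S2A_MSIL2A_20170717T235959_N0205_R072_T01WCP_20170718T000256.SAFE/"
)
```

Because the decision is taken from the folder name alone, no file has to be opened before the right reader class is chosen.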

Source code in georeader/readers/S2_SAFE_reader.py
def s2loader(
    s2folder: str,
    out_res: int = 10,
    bands: Optional[List[str]] = None,
    window_focus: Optional[rasterio.windows.Window] = None,
    granules: Optional[Dict[str, str]] = None,
    polygon: Optional[Polygon] = None,
    metadata_msi: Optional[str] = None,
) -> Union[S2ImageL2A, S2ImageL1C]:
    """
    Loads a S2ImageL2A or S2ImageL1C depending on the product type

    Args:
        s2folder: .SAFE folder. Expected standard ESA naming convention (see the s2_name_split function)
        out_res: default output resolution {10, 20, 60}
        bands: Bands to read. Default to BANDS_S2 or BANDS_S2_L2A depending on the product type
        window_focus: window to read when creating the object
        granules: Dict where keys are the band names and values are paths to the band location
        polygon: polygon with the footprint of the object
        metadata_msi: path to metadata file

    Returns:
        S2Image reader
    """

    _, producttype_nos2, _, _, _, _, _ = s2_name_split(s2folder)

    if producttype_nos2 == "MSIL2A":
        return S2ImageL2A(
            s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )
    elif producttype_nos2 == "MSIL1C":
        return S2ImageL1C(
            s2folder,
            granules=granules,
            polygon=polygon,
            out_res=out_res,
            bands=bands,
            window_focus=window_focus,
            metadata_msi=metadata_msi,
        )

    raise NotImplementedError(f"Don't know how to load {producttype_nos2} products")

s2_public_bucket_path(s2file, check_exists=False, mode='gcp')

Returns the expected path of the S2 file in the public bucket.

Parameters:

  • s2file (str, required): SAFE file name (e.g. S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE).
  • check_exists (bool, default False): Check whether the file exists in the bucket. This will not work if the GOOGLE_APPLICATION_CREDENTIALS and/or GS_USER_PROJECT environment variables are not set. The check is only performed when mode is "gcp".
  • mode (str, default 'gcp'): Either "gcp" or "rest".

Returns:

  • str: Full path to the file (e.g. gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE).
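
The mapping from a SAFE name to its bucket location can be illustrated with a standalone sketch. `public_bucket_path` is a hypothetical re-implementation that splits the name on underscores instead of calling s2_name_split, and the value of `BUCKET_PREFIX` is inferred from the example path above rather than read from the library constant.

```python
import os

# Inferred from the documented example path; the library keeps the real
# value in FULL_PATH_PUBLIC_BUCKET_SENTINEL_2.
BUCKET_PREFIX = "gs://gcp-public-data-sentinel-2/"
REST_PREFIX = "https://storage.googleapis.com/gcp-public-data-sentinel-2/"


def public_bucket_path(s2file: str, mode: str = "gcp") -> str:
    """Build the expected public-bucket path of a Sentinel-2 SAFE product."""
    s2file = s2file.rstrip("/")
    if not s2file.endswith(".SAFE"):
        s2file += ".SAFE"
    basename = os.path.basename(s2file)
    # The sixth field of the name is the tile, e.g. 'T49SGV' -> '49SGV'.
    tile = basename.split("_")[5][1:]
    # Tile '49SGV' maps to the folder hierarchy tiles/49/S/GV.
    tile_path = f"tiles/{tile[:2]}/{tile[2]}/{tile[3:]}/{basename}"
    if mode == "gcp":
        return BUCKET_PREFIX + tile_path
    if mode == "rest":
        return REST_PREFIX + tile_path
    raise NotImplementedError(f"Mode {mode} unknown")


name = "S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042"
gcp_path = public_bucket_path(name)
rest_path = public_bucket_path(name + ".SAFE", mode="rest")
```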

Source code in georeader/readers/S2_SAFE_reader.py
def s2_public_bucket_path(
    s2file: str, check_exists: bool = False, mode: str = "gcp"
) -> str:
    """
    Returns the expected path of the S2 file in the public bucket

    Args:
        s2file: safe file (e.g.  S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE)
        check_exists: check if the file exists in the bucket. This will not work if the GOOGLE_APPLICATION_CREDENTIALS
            and/or GS_USER_PROJECT env variables are not set. Defaults to False
        mode: "gcp" or "rest"

    Returns:
        full path to the file (e.g. gs://gcp-public-data-sentinel-2/tiles/49/S/GV/S2B_MSIL1C_20220527T030539_N0400_R075_T49SGV_20220527T051042.SAFE)
    """
    (
        mission,
        producttype,
        sensing_date_str,
        pdgs,
        relorbitnum,
        tile_number_field,
        product_discriminator,
    ) = s2_name_split(s2file)
    s2file = s2file[:-1] if s2file.endswith("/") else s2file

    if not s2file.endswith(".SAFE"):
        s2file += ".SAFE"

    basename = os.path.basename(s2file)
    if mode == "gcp":
        s2folder = f"{FULL_PATH_PUBLIC_BUCKET_SENTINEL_2}tiles/{tile_number_field[:2]}/{tile_number_field[2]}/{tile_number_field[3:]}/{basename}"
    elif mode == "rest":
        s2folder = f"https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/{tile_number_field[:2]}/{tile_number_field[2]}/{tile_number_field[3:]}/{basename}"
    else:
        raise NotImplementedError(f"Mode {mode} unknown")

    if check_exists and (mode == "gcp"):
        fs = get_filesystem(s2folder)

        if not fs.exists(s2folder):
            raise FileNotFoundError(f"Sentinel-2 file not found in {s2folder}")

    return s2folder

read_srf(satellite, srf_file=SRF_FILE_DEFAULT, cache=True)

Process the spectral response function file. If the file is not provided it downloads it from https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/document-library/-/asset_publisher/Wk0TKajiISaR/content/sentinel-2a-spectral-responses

This function requires the fsspec package and pandas and openpyxl for reading excel files.

Parameters:

  • satellite (str, required): Satellite name (S2A, S2B or S2C).
  • srf_file (str, default SRF_FILE_DEFAULT): Path to the SRF file.
  • cache (bool, default True): If True, the SRF is cached for future calls.

Returns:

  • DataFrame: Spectral response function for each of the Sentinel-2 bands.
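
When the default file is requested, the function stores a local copy under ~/.georeader and derives the local file name from the URL. A minimal sketch of that naming step follows. The URL is invented for the example and `local_cache_path` is a hypothetical helper, not a georeader function.

```python
import os
import urllib.parse


def local_cache_path(url: str, cache_dir: str) -> str:
    """Map a download URL to a file path inside `cache_dir`.

    The last URL segment is percent-decoded and spaces are replaced with
    underscores so the result is a convenient file name.
    """
    filename = urllib.parse.unquote(os.path.basename(url)).replace(" ", "_")
    return os.path.join(cache_dir, filename)


cache_dir = os.path.join(os.path.expanduser("~"), ".georeader")
# Made-up URL, used only to show the decoding of '%20'.
cached = local_cache_path("https://example.invalid/docs/S2-SRF%20v4.0.xlsx", cache_dir)
cached_name = os.path.basename(cached)
```

The download itself only happens when the resulting path does not exist yet, so repeated calls reuse the local copy.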

Source code in georeader/readers/S2_SAFE_reader.py
def read_srf(
    satellite: str, srf_file: str = SRF_FILE_DEFAULT, cache: bool = True
) -> pd.DataFrame:
    """
    Process the spectral response function file. If the file is not provided
    it downloads it from https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/document-library/-/asset_publisher/Wk0TKajiISaR/content/sentinel-2a-spectral-responses

    This function requires fsspec (to download the file) and pandas with openpyxl (to read the Excel file).

    Args:
        satellite (str): satellite name (S2A, S2B or S2C)
        srf_file (str): path to the srf file
        cache (bool): if True, the srf is cached for future calls. Default True

    Returns:
        pd.DataFrame: spectral response function for each of the bands of S2
    """
    assert satellite in ["S2A", "S2B", "S2C"], "satellite must be S2A, S2B or S2C"

    if cache:
        global SRF_S2
        if satellite in SRF_S2:
            return SRF_S2[satellite]

    if srf_file == SRF_FILE_DEFAULT:
        # home_dir = os.path.join(os.path.expanduser('~'),".georeader")
        home_dir = os.path.join(os.path.expanduser("~"), ".georeader")
        os.makedirs(home_dir, exist_ok=True)
        srf_filename = os.path.basename(srf_file)

        # Decode the url to get the filename. Also, replace spaces with underscores
        import urllib.parse

        srf_filename = urllib.parse.unquote(srf_filename).replace(" ", "_")

        srf_file_local = os.path.join(home_dir, srf_filename)
        if not os.path.exists(srf_file_local):
            import fsspec

            with fsspec.open(srf_file, "rb") as f:
                with open(srf_file_local, "wb") as f2:
                    f2.write(f.read())
        srf_file = srf_file_local

    srf_s2 = pd.read_excel(srf_file, sheet_name=f"Spectral Responses ({satellite})")

    srf_s2 = srf_s2.set_index("SR_WL")

    # remove rows with all values zero
    any_not_cero = np.any((srf_s2 > 1e-6).values, axis=1)
    srf_s2 = srf_s2.loc[any_not_cero]

    # remove the satellite name from the columns
    srf_s2.columns = [c.replace(f"{satellite}_SR_AV_", "") for c in srf_s2.columns]
    srf_s2.columns = normalize_band_names(srf_s2.columns)

    if cache:
        SRF_S2[satellite] = srf_s2

    return srf_s2
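
When the default file is used, the function derives the local cache file name from the URL by decoding percent-escapes and replacing spaces with underscores. A small sketch of that step using only the standard library; `local_cache_name` is a hypothetical helper and the URL is made up for illustration:

```python
import os
import urllib.parse


def local_cache_name(url: str) -> str:
    """Derive a local file name from a URL: decode %-escapes, replace spaces."""
    filename = os.path.basename(url)
    return urllib.parse.unquote(filename).replace(" ", "_")


# Hypothetical URL, used only to illustrate the decoding step
print(local_cache_name("https://example.com/docs/S2%20SRF%20v4.xlsx"))  # S2_SRF_v4.xlsx
```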

Proba-V Reader

The Proba-V reader enables access to Proba-V Level 2A and Level 3 products. It handles:

  • Reading TOA reflectance from HDF5 files
  • Mask handling for clouds, shadows, and invalid pixels
  • Extraction of metadata and acquisition parameters

Tutorial example:

API Reference

Proba-V reader

Unofficial Proba-V reader, based on the Proba-V user manual: https://publications.vito.be/2017-1333-probav-products-user-manual.pdf

Author: Gonzalo Mateo-García

ProbaV

Proba-V reader for handling Proba-V satellite products.

This class provides functionality to read and manipulate Proba-V satellite imagery products. It handles the specific format and metadata of Proba-V HDF5 files, supporting operations like loading radiometry data, masks, and cloud information.

Parameters:

  • hdf5_file (str, required): Path to the HDF5 file containing the Proba-V product.
  • window (Optional[Window], default None): Optional window to focus on a specific region of the image. None selects the entire image.
  • level_name (str, default 'LEVEL3'): Processing level of the product, either "LEVEL2A" or "LEVEL3".

Attributes:

  • hdf5_file (str): Path to the HDF5 file.
  • name (str): Basename of the HDF5 file.
  • camera (str): Camera ID (for LEVEL2A products).
  • res_name (str): Resolution name identifier (e.g., '100M', '300M', '1KM').
  • version (str): Product version.
  • toatoc (str): Indicator of whether data is TOA (top of atmosphere) or TOC (top of canopy).
  • real_transform (Affine): Affine transform for the full image.
  • real_shape (Tuple[int, int]): Shape of the full image (height, width).
  • dtype_radiometry: Data type for radiometry data (typically np.float32).
  • dtype_sm: Data type for SM (status map) data.
  • metadata (Dict[str, Any]): Dictionary with product metadata.
  • window_focus (Window): Current window focus.
  • window_data (Window): Window representing the full data extent.
  • start_date (datetime): Start acquisition date and time.
  • end_date (datetime): End acquisition date and time.
  • map_projection_wkt (str): WKT representation of the map projection.
  • crs: Coordinate reference system.
  • level_name (str): Processing level identifier.
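
Some of these attributes (toatoc, res_name and version for LEVEL3 products) are parsed from the file name with a regular expression. A sketch of that parsing, using the LEVEL3 pattern from the constructor; the file name below is invented for illustration:

```python
import re

# Pattern used by the reader for LEVEL3 (S1) products
LEVEL3_PATTERN = r"PROBAV_S1_(TO.)_.{6}_\d{8}_(\d..?M)_(V\d0\d)"

# Hypothetical file name that follows the pattern
name = "PROBAV_S1_TOA_X10Y05_20150101_100M_V101.HDF5"

match = re.match(LEVEL3_PATTERN, name)
if match is None:
    raise ValueError(f"{name} does not look like a LEVEL3 Proba-V product")

toatoc, res_name, version = match.groups()
print(toatoc, res_name, version)  # TOA 100M V101
```

If the name does not match, the constructor leaves these attributes unset, so a LEVEL3 file with a non-standard name fails later when toatoc is first used.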

Examples:

>>> import rasterio.windows
>>> # Initialize the ProbaV reader with a data path
>>> probav_reader = ProbaV('/path/to/probav_product.HDF5')
>>> # Load radiometry data
>>> bands = probav_reader.load_radiometry()
>>> # Get cloud mask
>>> cloud_mask = probav_reader.load_sm_cloud_mask()
>>> # Focus on a specific window
>>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
>>> probav_reader.set_window(window)
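
By default set_window interprets the offsets relative to the current window focus, so successive calls compose. A sketch of that offset arithmetic with plain tuples, assuming windows are written as (col_off, row_off, width, height); `compose_relative` is a hypothetical helper:

```python
from typing import Tuple

Window = Tuple[int, int, int, int]  # (col_off, row_off, width, height)


def compose_relative(current: Window, new: Window) -> Window:
    """Offsets add up; the size is taken from the new window."""
    return (current[0] + new[0], current[1] + new[1], new[2], new[3])


focus = (0, 0, 1000, 1000)  # full image
focus = compose_relative(focus, (100, 100, 200, 200))
focus = compose_relative(focus, (10, 20, 50, 50))
print(focus)  # (110, 120, 50, 50)
```

Passing relative=False to set_window replaces the focus with the given window instead of adding the offsets.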
Source code in georeader/readers/probav_image_operational.py
class ProbaV:
    """
    Proba-V reader for handling Proba-V satellite products.

    This class provides functionality to read and manipulate Proba-V satellite imagery products.
    It handles the specific format and metadata of Proba-V HDF5 files, supporting operations
    like loading radiometry data, masks, and cloud information.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the Proba-V product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific
            region of the image. Defaults to None (entire image).
        level_name (str): Processing level of the product, either "LEVEL2A" or "LEVEL3".
            Defaults to "LEVEL3".

    Attributes:
        hdf5_file (str): Path to the HDF5 file.
        name (str): Basename of the HDF5 file.
        camera (str): Camera ID (for LEVEL2A products).
        res_name (str): Resolution name identifier (e.g., '100M', '300M', '1KM').
        version (str): Product version.
        toatoc (str): Indicator of whether data is TOA (top of atmosphere) or TOC (top of canopy).
        real_transform (rasterio.Affine): Affine transform for the full image.
        real_shape (Tuple[int, int]): Shape of the full image (height, width).
        dtype_radiometry: Data type for radiometry data (typically np.float32).
        dtype_sm: Data type for SM (status map) data.
        metadata (Dict[str, Any]): Dictionary with product metadata.
        window_focus (rasterio.windows.Window): Current window focus.
        window_data (rasterio.windows.Window): Window representing the full data extent.
        start_date (datetime): Start acquisition date and time.
        end_date (datetime): End acquisition date and time.
        map_projection_wkt (str): WKT representation of the map projection.
        crs: Coordinate reference system.
        level_name (str): Processing level identifier.

    Examples:
        >>> import rasterio.windows
        >>> # Initialize the ProbaV reader with a data path
        >>> probav_reader = ProbaV('/path/to/probav_product.HDF5')
        >>> # Load radiometry data
        >>> bands = probav_reader.load_radiometry()
        >>> # Get cloud mask
        >>> cloud_mask = probav_reader.load_sm_cloud_mask()
        >>> # Focus on a specific window
        >>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
        >>> probav_reader.set_window(window)
    """

    def __init__(
        self,
        hdf5_file: str,
        window: Optional[rasterio.windows.Window] = None,
        level_name: str = "LEVEL3",
    ):
        self.hdf5_file = hdf5_file
        self.name = os.path.basename(self.hdf5_file)
        if level_name == "LEVEL2A":
            matches = re.match(
                r"PROBAV_L2A_\d{8}_\d{6}_(\d)_(\d..?M)_(V\d0\d)", self.name
            )
            if matches is not None:
                self.camera, self.res_name, self.version = matches.groups()
            self.toatoc = "TOA"
        elif level_name == "LEVEL3":
            matches = re.match(
                r"PROBAV_S1_(TO.)_.{6}_\d{8}_(\d..?M)_(V\d0\d)", self.name
            )
            if matches is not None:
                self.toatoc, self.res_name, self.version = matches.groups()
        else:
            raise NotImplementedError(f"Unknown level name {level_name}")

        try:
            with h5py.File(self.hdf5_file, "r") as input_f:
                # reference metadata: http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf
                valores_blue = (
                    input_f[f"{level_name}/RADIOMETRY/BLUE/{self.toatoc}"]
                    .attrs["MAPPING"][3:7]
                    .astype(np.float64)
                )
                self.real_transform = Affine(
                    a=valores_blue[2],
                    b=0,
                    c=valores_blue[0],
                    d=0,
                    e=-valores_blue[3],
                    f=valores_blue[1],
                )
                self.real_shape = input_f[
                    f"{level_name}/RADIOMETRY/BLUE/{self.toatoc}"
                ].shape
                # self.dtype_radiometry = input_f[f"{level_name}/RADIOMETRY/RED/{self.toatoc}"].dtype

                # Set to float because we're converting the image to TOA when reading (see read_radiometry function)
                self.dtype_radiometry = np.float32
                self.dtype_sm = input_f[f"{level_name}/QUALITY/SM"].dtype
                self.metadata = dict(input_f.attrs)
        except OSError as e:
            # Chain the original error so the underlying cause is not lost
            raise FileNotFoundError(f"Error opening file {self.hdf5_file}") from e

        if window is None:
            self.window_focus = rasterio.windows.Window(
                row_off=0,
                col_off=0,
                width=self.real_shape[1],
                height=self.real_shape[0],
            )
        else:
            # Use the window provided by the caller
            self.window_focus = window

        self.window_data = rasterio.windows.Window(
            row_off=0, col_off=0, width=self.real_shape[1], height=self.real_shape[0]
        )

        if "OBSERVATION_END_DATE" in self.metadata:
            self.end_date = datetime.strptime(
                " ".join(
                    self.metadata["OBSERVATION_END_DATE"].astype(str).tolist()
                    + self.metadata["OBSERVATION_END_TIME"].astype(str).tolist()
                ),
                "%Y-%m-%d %H:%M:%S",
            ).replace(tzinfo=timezone.utc)
            self.start_date = datetime.strptime(
                " ".join(
                    self.metadata["OBSERVATION_START_DATE"].astype(str).tolist()
                    + self.metadata["OBSERVATION_START_TIME"].astype(str).tolist()
                ),
                "%Y-%m-%d %H:%M:%S",
            ).replace(tzinfo=timezone.utc)
            self.map_projection_wkt = " ".join(
                self.metadata["MAP_PROJECTION_WKT"].astype(str).tolist()
            )

        # Proba-V images are lat/long
        self.crs = rasterio.crs.CRS({"init": "epsg:4326"})

        # Proba-V images have four bands
        self.level_name = level_name

    def _get_window_pad(
        self, boundless: bool = True
    ) -> Tuple[rasterio.windows.Window, Optional[List]]:
        window_read = rasterio.windows.intersection(self.window_focus, self.window_data)

        if boundless:
            _, pad_width = window_utils.get_slice_pad(
                self.window_data, self.window_focus
            )
            need_pad = any(p != 0 for p in pad_width["x"] + pad_width["y"])
            if need_pad:
                pad_list_np = []
                for k in ["y", "x"]:
                    if k in pad_width:
                        pad_list_np.append(pad_width[k])
                    else:
                        pad_list_np.append((0, 0))
            else:
                pad_list_np = None
        else:
            pad_list_np = None

        return window_read, pad_list_np

    def footprint(self, crs: Optional[str] = None) -> Polygon:
        # TODO load footprint from metadata?
        pol = window_utils.window_polygon(self.window_focus, self.transform)
        if (crs is None) or window_utils.compare_crs(self.crs, crs):
            return pol

        return window_utils.polygon_to_crs(pol, self.crs, crs)

    def valid_footprint(self, crs: Optional[str] = None) -> Polygon:
        valids = self.load_mask()
        return valids.valid_footprint(crs=crs)

    def _load_bands(
        self,
        bands_names: Union[List[str], str],
        boundless: bool = True,
        fill_value_default: Number = 0,
    ) -> geotensor.GeoTensor:
        window_read, pad_list_np = self._get_window_pad(boundless=boundless)
        slice_ = window_read.toslices()
        if isinstance(bands_names, str):
            bands_names = [bands_names]
            flatten = True
        else:
            flatten = False

        with h5py.File(self.hdf5_file, "r") as input_f:
            bands_arrs = []
            for band in bands_names:
                data = read_band_toa(input_f, band, slice_)
                if pad_list_np is not None:
                    data = np.pad(
                        data,
                        tuple(pad_list_np),
                        mode="constant",
                        constant_values=fill_value_default,
                    )

                bands_arrs.append(data)

        if boundless:
            transform = self.transform
        else:
            transform = rasterio.windows.transform(window_read, self.real_transform)

        if flatten:
            img = bands_arrs[0]
        else:
            img = np.stack(bands_arrs, axis=0)

        return geotensor.GeoTensor(
            img,
            transform=transform,
            crs=self.crs,
            fill_value_default=fill_value_default,
        )

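    def save_bands(self, img: np.ndarray):
        """
        Write the four radiometry bands back to the HDF5 file.

        Values are clipped to [0, 2], converted to int16 digital numbers with the
        SCALE and OFFSET attributes of each dataset, and masked pixels are set to -1.

        Args:
            img: (4, height, width) array whose spatial shape equals self.real_shape.
        """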
        assert (
            img.shape[0] == 4
        ), "Unexpected number of channels expected 4 found {}".format(img.shape)
        assert (
            img.shape[1:] == self.real_shape
        ), f"Unexpected shape expected {self.real_shape} found {img.shape[1:]}"

        # TODO save only window_focus?

        with h5py.File(self.hdf5_file, "r+") as input_f:
            for i, b in enumerate(BAND_NAMES):
                band_to_save = img[i]
                mask_band_2_save = np.ma.getmaskarray(img[i])
                band_to_save = np.clip(np.ma.filled(band_to_save, 0), 0, 2)
                band_name = f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}"
                attrs = input_f[band_name].attrs
                band_to_save *= attrs["SCALE"]
                band_to_save += attrs["OFFSET"]
                band_to_save = np.round(band_to_save).astype(np.int16)
                band_to_save[mask_band_2_save] = -1
                input_f[band_name][...] = band_to_save

    def load_radiometry(
        self, indexes: Optional[List[int]] = None, boundless: bool = True
    ) -> geotensor.GeoTensor:
        if indexes is None:
            indexes = (0, 1, 2, 3)
        bands_names = [
            f"{self.level_name}/RADIOMETRY/{BAND_NAMES[i]}/{self.toatoc}"
            for i in indexes
        ]
        return self._load_bands(
            bands_names, boundless=boundless, fill_value_default=-1 / 2000.0
        )

    def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Reference of values in `SM` flags.

        From the [user manual](http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf), page 67:
        * Clear  ->    000
        * Shadow ->    001
        * Undefined -> 010
        * Cloud  ->    011
        * Ice    ->    100
        * `2**3` sea/land
        * `2**4` quality swir (0 bad 1 good)
        * `2**5` quality nir
        * `2**6` quality red
        * `2**7` quality blue
        * `2**8` coverage swir (0 no 1 yes)
        * `2**9` coverage nir
        * `2**10` coverage red
        * `2**11` coverage blue
        """
        return self._load_bands(
            f"{self.level_name}/QUALITY/SM", boundless=boundless, fill_value_default=0
        )

    def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band

        Args:
            boundless (bool, optional): boundless option to load the SM band. Defaults to True.

        Returns:
            geotensor.GeoTensor: mask with the same shape as the image
        """
        valids = self.load_sm(boundless=boundless)
        valids.values = ~mask_only_sm(valids.values)
        valids.fill_value_default = False
        return valids

    def load_sm_cloud_mask(
        self, mask_undefined: bool = False, boundless: bool = True
    ) -> geotensor.GeoTensor:
        sm = self.load_sm(boundless=boundless)
        cloud_mask = sm_cloud_mask(sm.values, mask_undefined=mask_undefined)
        return geotensor.GeoTensor(
            cloud_mask, transform=self.transform, crs=self.crs, fill_value_default=0
        )

    def is_recompressed_and_chunked(self) -> bool:
        original_bands = [
            f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}" for b in BAND_NAMES
        ]
        original_bands.append(f"{self.level_name}/QUALITY/SM")
        with h5py.File(self.hdf5_file, "r") as input_:
            for b in original_bands:
                if input_[b].compression == "szip":
                    return False
                if (input_[b].chunks is None) or (input_[b].chunks[0] == 1):
                    return False
        return True

    def assert_can_be_read(self):
        original_bands = [
            f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}" for b in BAND_NAMES
        ] + [f"{self.level_name}/QUALITY/SM"]
        # Read-only access is enough to check that the bands can be decoded
        with h5py.File(self.hdf5_file, "r") as input_:
            for name in original_bands:
                assert is_compression_available(
                    input_[name]
                ), f"Band {name} cannot be read. Compression: {input_[name].compression}"

    def recompress_bands(
        self,
        chunks: Tuple[int, int] = (512, 512),
        replace: bool = True,
        compression_dest: str = "gzip",
    ):
        original_bands = {
            b: f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}" for b in BAND_NAMES
        }
        original_bands.update({"SM": f"{self.level_name}/QUALITY/SM"})
        copy_bands = {k: v + "_NEW" for (k, v) in original_bands.items()}
        with h5py.File(self.hdf5_file, "a") as input_:
            for b in original_bands.keys():
                assert_compression_available(input_[original_bands[b]])
                data = input_[original_bands[b]][:]
                if copy_bands[b] in input_:
                    del input_[copy_bands[b]]

                ds = input_.create_dataset(
                    copy_bands[b],
                    data=data,
                    chunks=chunks,
                    compression=compression_dest,
                )

                attrs_copy = input_[original_bands[b]].attrs
                for k, v in attrs_copy.items():
                    ds.attrs[k] = v

                if replace:
                    del input_[original_bands[b]]
                    input_[original_bands[b]] = input_[copy_bands[b]]
                    del input_[copy_bands[b]]

    @property
    def transform(self) -> Affine:
        return rasterio.windows.transform(self.window_focus, self.real_transform)

    @property
    def res(self) -> Tuple[float, float]:
        return window_utils.res(self.transform)

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return window_utils.window_bounds(self.window_focus, self.real_transform)

    def set_window(
        self,
        window: rasterio.windows.Window,
        relative: bool = True,
        boundless: bool = True,
    ):
        if relative:
            self.window_focus = rasterio.windows.Window(
                col_off=window.col_off + self.window_focus.col_off,
                row_off=window.row_off + self.window_focus.row_off,
                height=window.height,
                width=window.width,
            )
        else:
            self.window_focus = window

        if not boundless:
            self.window_focus = rasterio.windows.intersection(
                self.window_data, self.window_focus
            )

    def __copy__(self) -> "__class__":
        return ProbaV(
            self.hdf5_file, window=self.window_focus, level_name=self.level_name
        )

    def read_from_window(
        self, window: Optional[rasterio.windows.Window] = None, boundless: bool = True
    ) -> "__class__":
        copy = self.__copy__()
        copy.set_window(window=window, boundless=boundless)

        return copy

    def __repr__(self) -> str:
        return f""" 
         File: {self.hdf5_file}
         Transform: {self.transform}
         Shape: {self.height}, {self.width}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         Level: {self.level_name}
         TOA/TOC: {self.toatoc}
         Resolution name : {self.res_name}
        """

load_mask(boundless=True)

Returns the valid mask (False where the pixel is out of swath or invalid). The mask is derived from the SM band.

Parameters:

  • boundless (bool, default True): whether to read the SM band in boundless mode.

Returns:

  • geotensor.GeoTensor: mask with the same shape as the image.

Source code in georeader/readers/probav_image_operational.py
def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
    """
    Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band

    Args:
        boundless (bool, optional): boundless option to load the SM band. Defaults to True.

    Returns:
        geotensor.GeoTensor: mask with the same shape as the image
    """
    valids = self.load_sm(boundless=boundless)
    valids.values = ~mask_only_sm(valids.values)
    valids.fill_value_default = False
    return valids

load_sm(boundless=True)

Reference of values in SM flags.

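From the user manual (http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf), page 67. The three least significant bits hold the pixel status:

  • Clear -> 000
  • Shadow -> 001
  • Undefined -> 010
  • Cloud -> 011
  • Ice -> 100

The remaining bits are individual flags:

  • 2**3 sea/land
  • 2**4 quality swir (0 bad 1 good)
  • 2**5 quality nir
  • 2**6 quality red
  • 2**7 quality blue
  • 2**8 coverage swir (0 no 1 yes)
  • 2**9 coverage nir
  • 2**10 coverage red
  • 2**11 coverage blue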
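
The flag layout above can be decoded with bit masks. A minimal sketch for a single SM value, assuming the status occupies the three lowest bits as listed; `decode_sm` and its key names are hypothetical and not part of the reader:

```python
from typing import Dict

# Status values stored in the three least significant bits
STATUS_NAMES = {
    0b000: "clear",
    0b001: "shadow",
    0b010: "undefined",
    0b011: "cloud",
    0b100: "ice",
}


def decode_sm(value: int) -> Dict[str, object]:
    """Split one SM value into its status and some of its flag bits."""
    return {
        "status": STATUS_NAMES.get(value & 0b111, "unknown"),
        "sea_land_bit": bool(value & 2**3),
        "good_swir": bool(value & 2**4),  # quality flags: 1 means good
        "good_nir": bool(value & 2**5),
        "good_red": bool(value & 2**6),
        "good_blue": bool(value & 2**7),
    }


# Status 011 (cloud) with the sea/land bit and the four quality bits set
print(decode_sm(0b11111011)["status"])  # cloud
```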

Source code in georeader/readers/probav_image_operational.py
def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
    """
    Reference of values in `SM` flags.

    From the [user manual](http://www.vito-eodata.be/PDF/image/PROBAV-Products_User_Manual.pdf), page 67:
    * Clear  ->    000
    * Shadow ->    001
    * Undefined -> 010
    * Cloud  ->    011
    * Ice    ->    100
    * `2**3` sea/land
    * `2**4` quality swir (0 bad 1 good)
    * `2**5` quality nir
    * `2**6` quality red
    * `2**7` quality blue
    * `2**8` coverage swir (0 no 1 yes)
    * `2**9` coverage nir
    * `2**10` coverage red
    * `2**11` coverage blue
    """
    return self._load_bands(
        f"{self.level_name}/QUALITY/SM", boundless=boundless, fill_value_default=0
    )

save_bands(img)

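Writes the four radiometry bands back to the HDF5 file. Values are clipped to [0, 2], converted to int16 digital numbers with each dataset's SCALE and OFFSET attributes, and masked pixels are written as -1.

Parameters:

  • img (ndarray, required): array of shape (4, height, width) whose spatial shape equals self.real_shape.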

Source code in georeader/readers/probav_image_operational.py
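def save_bands(self, img: np.ndarray):
    """
    Write the four radiometry bands back to the HDF5 file.

    Values are clipped to [0, 2], converted to int16 digital numbers with the
    SCALE and OFFSET attributes of each dataset, and masked pixels are set to -1.

    Args:
        img: (4, height, width) array whose spatial shape equals self.real_shape.
    """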
    assert (
        img.shape[0] == 4
    ), "Unexpected number of channels expected 4 found {}".format(img.shape)
    assert (
        img.shape[1:] == self.real_shape
    ), f"Unexpected shape expected {self.real_shape} found {img.shape[1:]}"

    # TODO save only window_focus?

    with h5py.File(self.hdf5_file, "r+") as input_f:
        for i, b in enumerate(BAND_NAMES):
            band_to_save = img[i]
            mask_band_2_save = np.ma.getmaskarray(img[i])
            band_to_save = np.clip(np.ma.filled(band_to_save, 0), 0, 2)
            band_name = f"{self.level_name}/RADIOMETRY/{b}/{self.toatoc}"
            attrs = input_f[band_name].attrs
            band_to_save *= attrs["SCALE"]
            band_to_save += attrs["OFFSET"]
            band_to_save = np.round(band_to_save).astype(np.int16)
            band_to_save[mask_band_2_save] = -1
            input_f[band_name][...] = band_to_save
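
save_bands stores each value as a digital number, DN = round(value * SCALE + OFFSET), and writes -1 for masked pixels. A sketch of that encoding for a single value, assuming SCALE = 2000 and OFFSET = 0 (illustrative values; the reader takes them from each dataset's attributes). The `decode` function is an assumed inverse, shown only to make the round trip visible:

```python
SCALE = 2000.0  # assumed; read from the dataset attributes in practice
OFFSET = 0.0    # assumed
NODATA_DN = -1  # value written for masked pixels


def encode(value: float, masked: bool = False) -> int:
    """Reflectance -> stored digital number, clipping to [0, 2] first."""
    if masked:
        return NODATA_DN
    clipped = min(max(value, 0.0), 2.0)
    return int(round(clipped * SCALE + OFFSET))


def decode(dn: int) -> float:
    """Stored digital number -> reflectance (assumed inverse of encode)."""
    return (dn - OFFSET) / SCALE


print(encode(0.25))       # 500
print(encode(3.0))        # 4000, clipped to 2 before scaling
print(encode(0.1, True))  # -1
print(decode(500))        # 0.25
```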

ProbaVRadiometry

Bases: ProbaV

A specialized ProbaV reader class focused on radiometry data.

This class extends the base ProbaV class to provide a simplified interface for working with radiometry bands from Proba-V products.

Parameters:

  • hdf5_file (str, required): Path to the HDF5 file containing the Proba-V product.
  • window (Optional[Window], default None): Optional window to focus on a specific region of the image. None selects the entire image.
  • level_name (str, default 'LEVEL2A'): Processing level of the product.
  • indexes (Optional[List[int]], default None): Optional list of band indices to load. If None, all four bands (0=BLUE, 1=RED, 2=NIR, 3=SWIR) are loaded.

Attributes:

  • dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
  • indexes (List[int]): List of band indices to load.
  • dtype: Data type of the radiometry data.
  • count (int): Number of bands to be loaded.
  • shape (Tuple[int, int, int]): Shape of the data (bands, height, width).
  • values (ndarray): The radiometry data values.

Examples:

>>> # Initialize the ProbaVRadiometry reader with a data path
>>> probav_rad = ProbaVRadiometry('/path/to/probav_product.HDF5')
>>> # Load only RED and NIR bands
>>> probav_rad_rn = ProbaVRadiometry('/path/to/probav_product.HDF5', indexes=[1, 2])
>>> # Get the data as a GeoTensor
>>> geotensor_data = probav_rad.load()
Source code in georeader/readers/probav_image_operational.py
class ProbaVRadiometry(ProbaV):
    """
    A specialized ProbaV reader class focused on radiometry data.

    This class extends the base ProbaV class to provide a simplified interface
    for working with radiometry bands from Proba-V products.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the Proba-V product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific
            region of the image. Defaults to None (entire image).
        level_name (str): Processing level of the product. Defaults to "LEVEL2A".
        indexes (Optional[List[int]]): Optional list of band indices to load. If None,
            all four bands (0=BLUE, 1=RED, 2=NIR, 3=SWIR) will be loaded. Defaults to None.

    Attributes:
        dims (Tuple[str]): Names of the dimensions ("band", "y", "x").
        indexes (List[int]): List of band indices to load.
        dtype: Data type of the radiometry data.
        count (int): Number of bands to be loaded.
        shape (Tuple[int, int, int]): Shape of the data (bands, height, width).
        values (np.ndarray): The radiometry data values.

    Examples:
        >>> # Initialize the ProbaVRadiometry reader with a data path
        >>> probav_rad = ProbaVRadiometry('/path/to/probav_product.HDF5')
        >>> # Load only RED and NIR bands
        >>> probav_rad_rn = ProbaVRadiometry('/path/to/probav_product.HDF5', indexes=[1, 2])
        >>> # Get the data as a GeoTensor
        >>> geotensor_data = probav_rad.load()
    """

    def __init__(
        self,
        hdf5_file: str,
        window: Optional[rasterio.windows.Window] = None,
        level_name: str = "LEVEL2A",
        indexes: Optional[List[int]] = None,
    ):
        super().__init__(hdf5_file=hdf5_file, window=window, level_name=level_name)
        self.dims = ("band", "y", "x")

        # let read only some bands?
        if indexes is None:
            self.indexes = [0, 1, 2, 3]
        else:
            self.indexes = indexes

        self.dtype = self.dtype_radiometry

    @property
    def count(self):
        return len(self.indexes)

    def load(self, boundless: bool = True) -> geotensor.GeoTensor:
        return self.load_radiometry(boundless=boundless, indexes=self.indexes)

    @property
    def shape(self) -> Tuple:
        return self.count, self.window_focus.height, self.window_focus.width

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def values(self) -> np.ndarray:
        return self.load_radiometry(boundless=True, indexes=self.indexes).values

    def __copy__(self) -> "__class__":
        return ProbaVRadiometry(
            self.hdf5_file,
            window=self.window_focus,
            level_name=self.level_name,
            indexes=self.indexes,
        )
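
Band indices select HDF5 datasets by name. A sketch of how the indices map to dataset paths, assuming BAND_NAMES is ordered BLUE, RED, NIR, SWIR as the docstring above states; `radiometry_paths` is a hypothetical helper:

```python
from typing import List, Sequence

# Assumed order, following the 0=BLUE, 1=RED, 2=NIR, 3=SWIR convention
BAND_NAMES = ["BLUE", "RED", "NIR", "SWIR"]


def radiometry_paths(
    indexes: Sequence[int], level_name: str = "LEVEL2A", toatoc: str = "TOA"
) -> List[str]:
    """Build the HDF5 dataset path for each requested band index."""
    return [f"{level_name}/RADIOMETRY/{BAND_NAMES[i]}/{toatoc}" for i in indexes]


print(radiometry_paths([1, 2]))
# ['LEVEL2A/RADIOMETRY/RED/TOA', 'LEVEL2A/RADIOMETRY/NIR/TOA']
```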

ProbaVSM

Bases: ProbaV

A specialized ProbaV reader class focused on Status Map (SM) data.

This class extends the base ProbaV class to provide a simplified interface for working with the status map band from Proba-V products. The SM band contains information about the pixel quality, cloud status, etc.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| hdf5_file | str | Path to the HDF5 file containing the Proba-V product. | required |
| window | Optional[Window] | Optional window to focus on a specific region of the image. Defaults to None (entire image). | None |
| level_name | str | Processing level of the product. Defaults to "LEVEL2A". | 'LEVEL2A' |

Attributes:

| Name | Type | Description |
|------|------|-------------|
| dims | Tuple[str] | Names of the dimensions ("y", "x"). |
| dtype | | Data type of the SM data. |
| shape | Tuple[int, int] | Shape of the SM data (height, width). |
| values | ndarray | The SM data values. |

Examples:

>>> # Initialize the ProbaVSM reader with a data path
>>> probav_sm = ProbaVSM('/path/to/probav_product.HDF5')
>>> # Get the SM data as a GeoTensor
>>> sm_data = probav_sm.load()
>>> # Extract cloud information
>>> cloud_mask = sm_cloud_mask(sm_data.values)
Source code in georeader/readers/probav_image_operational.py
class ProbaVSM(ProbaV):
    """
    A specialized ProbaV reader class focused on Status Map (SM) data.

    This class extends the base ProbaV class to provide a simplified interface
    for working with the status map band from Proba-V products. The SM band
    contains information about the pixel quality, cloud status, etc.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the Proba-V product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific
            region of the image. Defaults to None (entire image).
        level_name (str): Processing level of the product. Defaults to "LEVEL2A".

    Attributes:
        dims (Tuple[str]): Names of the dimensions ("y", "x").
        dtype: Data type of the SM data.
        shape (Tuple[int, int]): Shape of the SM data (height, width).
        values (np.ndarray): The SM data values.

    Examples:
        >>> # Initialize the ProbaVSM reader with a data path
        >>> probav_sm = ProbaVSM('/path/to/probav_product.HDF5')
        >>> # Get the SM data as a GeoTensor
        >>> sm_data = probav_sm.load()
        >>> # Extract cloud information
        >>> cloud_mask = sm_cloud_mask(sm_data.values)
    """

    def __init__(
        self,
        hdf5_file: str,
        window: Optional[rasterio.windows.Window] = None,
        level_name: str = "LEVEL2A",
    ):
        super().__init__(hdf5_file=hdf5_file, window=window, level_name=level_name)
        self.dims = ("y", "x")
        self.dtype = self.dtype_sm

    def load(self, boundless: bool = True) -> geotensor.GeoTensor:
        return self.load_sm(boundless=boundless)

    @property
    def shape(self) -> Tuple:
        return self.window_focus.height, self.window_focus.width

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def values(self) -> np.ndarray:
        return self.load_sm(boundless=True).values

    def __copy__(self) -> "__class__":
        return ProbaVSM(
            self.hdf5_file, window=self.window_focus, level_name=self.level_name
        )

SPOT-VGT Reader

The SPOT-VGT reader provides functionality for reading SPOT-VGT products. Features include:

  • HDF4 file format support
  • Handling of radiometry and quality layers
  • Cloud and shadow mask extraction

Note: See the Proba-V tutorial for similar processing workflows as both sensors share similar data structures.

API Reference

SPOT VGT reader

Unofficial reader for SPOT VGT products. The reader is based on the user manual: https://docs.terrascope.be/DataProducts/SPOT-VGT/references/SPOT_VGT_PUM_v1.3.pdf

Authors: Dan Lopez-Puigdollers, Gonzalo Mateo-García

SpotVGT

SPOT-VGT reader for handling SPOT Vegetation satellite products.

This class provides functionality to read and manipulate SPOT-VGT satellite imagery products. It handles the specific format and metadata of SPOT-VGT HDF4 files, supporting operations like loading radiometry data, masks, and cloud information.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| hdf4_file | str | Path to the HDF4 file or directory containing the SPOT-VGT product. | required |
| window | Optional[Window] | Optional window to focus on a specific region of the image. Defaults to None (entire image). | None |

Attributes:

| Name | Type | Description |
|------|------|-------------|
| hdf4_file | str | Path to the HDF4 file. |
| name | str | Basename of the HDF4 file. |
| satelliteID | str | Satellite ID extracted from the filename. |
| station | str | Station code extracted from the filename. |
| productID | str | Product ID extracted from the filename. |
| year, month, day | str | Date components extracted from the filename. |
| segment | str | Segment identifier extracted from the filename. |
| version | str | Product version extracted from the filename. |
| files | List[str] | List of files in the SPOT-VGT product. |
| files_dict | Dict[str, str] | Dictionary mapping band names to file paths. |
| metadata | Dict[str, str] | Metadata extracted from the LOG file. |
| real_shape | Tuple[int, int] | Shape of the full image (height, width). |
| real_transform | Affine | Affine transform for the full image. |
| dtype_radiometry | | Data type for radiometry data (typically np.float32). |
| window_focus | Window | Current window focus. |
| window_data | Window | Window representing the full data extent. |
| start_date | datetime | Start acquisition date and time. |
| end_date | datetime | End acquisition date and time. |
| crs | | Coordinate reference system. |
| toatoc | str | Indicator of whether data is TOA (top of atmosphere). |
| res_name | str | Resolution name identifier (e.g., '1KM'). |
| level_name | str | Processing level identifier. |
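The filename-derived attributes listed above can be illustrated with a short standalone sketch. It reuses the regular expression that appears in the constructor source on this page; the helper name `parse_spotvgt_name` and the `FIELDS` tuple are hypothetical, and the sketch raises `ValueError` where the reader itself raises `FileNotFoundError`.

```python
import re

# Pattern copied from the SpotVGT constructor shown in the source listing.
PATTERN = re.compile(
    r'V(\d{1})(\w{3})(\w{1})____(\d{4})(\d{2})(\d{2})F(\w{3})_V(\d{3})'
)

# Attribute names in the order the constructor unpacks the regex groups.
FIELDS = ("satelliteID", "station", "productID",
          "year", "month", "day", "segment", "version")


def parse_spotvgt_name(name):
    """Hypothetical helper: split a SPOT-VGT product name into its fields."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"SPOT-VGT product not recognized: {name}")
    return dict(zip(FIELDS, match.groups()))


info = parse_spotvgt_name("V2KRNP____20140321F146_V003")
print(info)
```

All fields are kept as strings, matching the `str` types in the table.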

Examples:

>>> import rasterio.windows
>>> # Initialize the SpotVGT reader with a data path
>>> spot_reader = SpotVGT('/path/to/V2KRNP____20140321F146_V003')
>>> # Load radiometry data
>>> bands = spot_reader.load_radiometry()
>>> # Get cloud mask
>>> cloud_mask = spot_reader.load_sm_cloud_mask()
>>> # Focus on a specific window
>>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
>>> spot_reader.set_window(window)
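How `set_window` combines windows can be sketched without rasterio. In the sketch below, `Win` is a hypothetical stand-in for `rasterio.windows.Window`, and `compose_relative` and `intersect` are illustrative helpers that mirror the `relative=True` and `boundless=False` branches of the method as shown in the source listing; they are not part of the library.

```python
from typing import NamedTuple


class Win(NamedTuple):
    """Hypothetical stand-in for rasterio.windows.Window (offsets and size only)."""
    col_off: int
    row_off: int
    width: int
    height: int


def compose_relative(focus: Win, window: Win) -> Win:
    """Interpret `window` relative to `focus`: offsets add, size comes from `window`."""
    return Win(window.col_off + focus.col_off,
               window.row_off + focus.row_off,
               window.width, window.height)


def intersect(a: Win, b: Win) -> Win:
    """Clip `a` to `b`. Unlike rasterio, this returns an empty window if they are disjoint."""
    col0, row0 = max(a.col_off, b.col_off), max(a.row_off, b.row_off)
    col1 = min(a.col_off + a.width, b.col_off + b.width)
    row1 = min(a.row_off + a.height, b.row_off + b.height)
    return Win(col0, row0, max(col1 - col0, 0), max(row1 - row0, 0))


full = Win(0, 0, 1000, 800)                              # whole image
focus = compose_relative(full, Win(100, 100, 200, 200))  # first set_window
nested = compose_relative(focus, Win(50, 60, 500, 900))  # second, relative call
clipped = intersect(nested, full)                        # boundless=False
print(focus, nested, clipped)
```

With `boundless=True` the focus window may extend past the image, and the out-of-image area is padded with the fill value when data is loaded.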
Source code in georeader/readers/spotvgt_image_operational.py
class SpotVGT:
    """
    SPOT-VGT reader for handling SPOT Vegetation satellite products.

    This class provides functionality to read and manipulate SPOT-VGT satellite imagery products.
    It handles the specific format and metadata of SPOT-VGT HDF4 files, supporting operations
    like loading radiometry data, masks, and cloud information.

    Args:
        hdf4_file (str): Path to the HDF4 file or directory containing the SPOT-VGT product.
        window (Optional[rasterio.windows.Window]): Optional window to focus on a specific 
            region of the image. Defaults to None (entire image).

    Attributes:
        hdf4_file (str): Path to the HDF4 file.
        name (str): Basename of the HDF4 file.
        satelliteID (str): Satellite ID extracted from the filename.
        station (str): Station code extracted from the filename.
        productID (str): Product ID extracted from the filename.
        year, month, day (str): Date components extracted from the filename.
        segment (str): Segment identifier extracted from the filename.
        version (str): Product version extracted from the filename.
        files (List[str]): List of files in the SPOT-VGT product.
        files_dict (Dict[str, str]): Dictionary mapping band names to file paths.
        metadata (Dict[str, str]): Metadata extracted from the LOG file.
        real_shape (Tuple[int, int]): Shape of the full image (height, width).
        real_transform (rasterio.Affine): Affine transform for the full image.
        dtype_radiometry: Data type for radiometry data (typically np.float32).
        window_focus (rasterio.windows.Window): Current window focus.
        window_data (rasterio.windows.Window): Window representing the full data extent.
        start_date (dt.datetime): Start acquisition date and time.
        end_date (dt.datetime): End acquisition date and time.
        crs: Coordinate reference system.
        toatoc (str): Indicator of whether data is TOA (top of atmosphere).
        res_name (str): Resolution name identifier (e.g., '1KM').
        level_name (str): Processing level identifier.

    Examples:
        >>> import rasterio.windows
        >>> # Initialize the SpotVGT reader with a data path
        >>> spot_reader = SpotVGT('/path/to/V2KRNP____20140321F146_V003')
        >>> # Load radiometry data
        >>> bands = spot_reader.load_radiometry()
        >>> # Get cloud mask
        >>> cloud_mask = spot_reader.load_sm_cloud_mask()
        >>> # Focus on a specific window
        >>> window = rasterio.windows.Window(col_off=100, row_off=100, width=200, height=200)
        >>> spot_reader.set_window(window)
    """
    def __init__(self, hdf4_file: str, window: Optional[rasterio.windows.Window] = None):
        self.hdf4_file = hdf4_file
        self.name = os.path.basename(self.hdf4_file)
        matches = re.match(r'V(\d{1})(\w{3})(\w{1})____(\d{4})(\d{2})(\d{2})F(\w{3})_V(\d{3})', self.name)
        if matches is not None:
            (self.satelliteID, self.station, self.productID, self.year,
             self.month, self.day, self.segment, self.version) = matches.groups()
        else:
            raise FileNotFoundError("SPOT-VGT product not recognized %s" % self.hdf4_file)

        try:
            self.files = sorted([f for f in glob(os.path.join(self.hdf4_file, '*'))])
            self.files_dict = {re.match(r'V\d{12}_(\w+)',
                                        os.path.basename(self.files[i])).groups()[0]: self.files[i]
                               for i in range(len(self.files))}

            with open(self.files_dict['LOG'], "r") as f:
                self.metadata = {re.split(r'\s+', y)[0]: re.split(r'\s+', y)[1] for y in [x for x in f]}

            self.real_shape = (
                int(self.metadata["IMAGE_LOWER_RIGHT_ROW"]) - int(self.metadata["IMAGE_UPPER_LEFT_ROW"]) - 1,
                int(self.metadata["IMAGE_LOWER_RIGHT_COL"]) - int(self.metadata["IMAGE_UPPER_LEFT_COL"]) - 1)

            bbox = [
                float(self.metadata['CARTO_LOWER_LEFT_X']),
                float(self.metadata['CARTO_LOWER_LEFT_Y']),
                float(self.metadata['CARTO_UPPER_RIGHT_X']),
                float(self.metadata['CARTO_UPPER_RIGHT_Y'])
            ]
            self.real_transform = rasterio.transform.from_bounds(*bbox, width=self.real_shape[1],
                                                                 height=self.real_shape[0])

            self.dtype_radiometry = np.float32

        except OSError as e:
            raise FileNotFoundError("Error reading product %s" % self.hdf4_file) from e

        if window is None:
            self.window_focus = rasterio.windows.Window(row_off=0, col_off=0,
                                                        width=self.real_shape[1],
                                                        height=self.real_shape[0])
        else:
            # Use the window requested by the caller instead of the full image
            self.window_focus = window

        self.window_data = rasterio.windows.Window(row_off=0, col_off=0,
                                                   width=self.real_shape[1],
                                                   height=self.real_shape[0])

        year, month, day = re.match(r'(\d{4})(\d{2})(\d{2})', self.metadata['SEGM_FIRST_DATE']).groups()
        hh, mm, ss = re.match(r'(\d{2})(\d{2})(\d{2})', self.metadata['SEGM_FIRST_TIME']).groups()

        self.start_date = dt.datetime(day=int(day), month=int(month), year=int(year),
                                      hour=int(hh), minute=int(mm), second=int(ss), tzinfo=dt.timezone.utc)

        year, month, day = re.match(r'(\d{4})(\d{2})(\d{2})', self.metadata['SEGM_LAST_DATE']).groups()
        hh, mm, ss = re.match(r'(\d{2})(\d{2})(\d{2})', self.metadata['SEGM_LAST_TIME']).groups()

        self.end_date = dt.datetime(day=int(day), month=int(month), year=int(year),
                                    hour=int(hh), minute=int(mm), second=int(ss), tzinfo=dt.timezone.utc)

        # self.map_projection_wkt

        self.toatoc = "TOA"

        self.res_name = '1KM'

        # SPOT-VGT images are lat/long
        self.crs = rasterio.crs.CRS({'init': 'epsg:4326'})

        # SPOT-VGT images have four bands
        self.level_name = "LEVEL2A"

    def _get_window_pad(self, boundless: bool = True) -> Tuple[rasterio.windows.Window, Optional[List]]:
        window_read = rasterio.windows.intersection(self.window_focus, self.window_data)

        if boundless:
            _, pad_width = window_utils.get_slice_pad(self.window_data, self.window_focus)
            need_pad = any(p != 0 for p in pad_width["x"] + pad_width["y"])
            if need_pad:
                pad_list_np = []
                for k in ["y", "x"]:
                    if k in pad_width:
                        pad_list_np.append(pad_width[k])
                    else:
                        pad_list_np.append((0, 0))
            else:
                pad_list_np = None
        else:
            pad_list_np = None

        return window_read, pad_list_np

    def footprint(self, crs:Optional[str]=None) -> Polygon:
        # TODO load footprint from metadata?
        pol = window_utils.window_polygon(self.window_focus, self.transform)
        if (crs is None) or window_utils.compare_crs(self.crs, crs):
            return pol

        return window_utils.polygon_to_crs(pol, self.crs, crs)

    def valid_footprint(self, crs:Optional[str]=None) -> Polygon:
        valids = self.load_mask()
        return valids.valid_footprint(crs=crs)        

    def _load_bands(self, bands_names: Union[List[str], str], boundless: bool = True,
                    fill_value_default: Number = 0) -> geotensor.GeoTensor:
        window_read, pad_list_np = self._get_window_pad(boundless=boundless)
        slice_ = window_read.toslices()
        if isinstance(bands_names, str):
            bands_names = [bands_names]
            flatten = True
        else:
            flatten = False

        hdf_objs = {b: SD(self.files_dict[b], SDC.READ) for b in bands_names}
        # Read dataset
        # shapes = [hdf_objs[b].datasets()["PIXEL_DATA"][1] for b in bands_names]
        # data = [hdf_objs[b].select("PIXEL_DATA")[slice_] for b in bands_names]

        bands_arrs = []
        # Original slice int32 gives an error. Cast to int
        for band in bands_names:
            data = read_band_toa(hdf_objs, band, (slice(int(slice_[0].start), int(slice_[0].stop), None),
                                                  slice(int(slice_[1].start), int(slice_[1].stop), None)))
            if pad_list_np:
                data = np.pad(data, tuple(pad_list_np), mode="constant", constant_values=fill_value_default)

            bands_arrs.append(data)

        if boundless:
            transform = self.transform
        else:
            transform = rasterio.windows.transform(window_read, self.real_transform)

        if flatten:
            img = bands_arrs[0]
        else:
            img = np.stack(bands_arrs, axis=0)

        return geotensor.GeoTensor(img, transform=transform, crs=self.crs,
                                   fill_value_default=fill_value_default)

    def load_radiometry(self, indexes: Optional[List[int]] = None, boundless: bool = True) -> geotensor.GeoTensor:
        if indexes is None:
            indexes = (0, 1, 2, 3)
        # bands_names = [f"{self.level_name}/RADIOMETRY/{BAND_NAMES[i]}/{self.toatoc}" for i in indexes]
        bands_names = [BANDS_DICT[i] for i in indexes]
        return self._load_bands(bands_names, boundless=boundless, fill_value_default=0)

    def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Reference of values in `SM` flags.

        From the [user manual](https://docs.terrascope.be/DataProducts/SPOT-VGT/references/SPOT_VGT_PUM_v1.3.pdf), page 46:
        * Clear  ->    000
        * Shadow ->    001
        * Undefined -> 010
        * Cloud  ->    011
        * Ice    ->    100
        * `2**3` sea/land
        * `2**4` quality swir (0 bad 1 good)
        * `2**5` quality nir
        * `2**6` quality red
        * `2**7` quality blue
        """
        return self._load_bands('SM', boundless=boundless, fill_value_default=0)

    def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
        """
        Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band

        Args:
            boundless (bool, optional): boundless option to load the SM band. Defaults to True.

        Returns:
            geotensor.GeoTensor: mask with the same shape as the image
        """

        sm = self.load_sm(boundless=boundless)
        valids = sm.copy()
        invalids = mask_only_sm(sm.values)
        valids.values = ~invalids
        valids.fill_value_default = False

        return valids

    def load_sm_cloud_mask(self, mask_undefined:bool=False, boundless:bool=True) -> geotensor.GeoTensor:
        sm = self.load_sm(boundless=boundless)
        cloud_mask = sm_cloud_mask(sm.values, mask_undefined=mask_undefined)
        cloud_mask+=1
        invalids = mask_only_sm(sm.values)

        cloud_mask[invalids] = 0
        return geotensor.GeoTensor(cloud_mask, transform=self.transform, crs=self.crs, fill_value_default=0)

    @property
    def transform(self) -> Affine:
        return rasterio.windows.transform(self.window_focus, self.real_transform)

    @property
    def res(self) -> Tuple[float, float]:
        return window_utils.res(self.transform)

    @property
    def height(self) -> int:
        return self.window_focus.height

    @property
    def width(self) -> int:
        return self.window_focus.width

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return window_utils.window_bounds(self.window_focus, self.real_transform)

    def set_window(self, window:rasterio.windows.Window, relative: bool = True, boundless: bool = True):
        if relative:
            self.window_focus = rasterio.windows.Window(col_off=window.col_off + self.window_focus.col_off,
                                                        row_off=window.row_off + self.window_focus.row_off,
                                                        height=window.height, width=window.width)
        else:
            self.window_focus = window

        if not boundless:
            self.window_focus = rasterio.windows.intersection(self.window_data, self.window_focus)

    def __copy__(self) -> '__class__':
        return SpotVGT(self.hdf4_file, window=self.window_focus)

    def read_from_window(self, window: Optional[rasterio.windows.Window] = None, boundless: bool = True) -> '__class__':
        copy = self.__copy__()
        copy.set_window(window=window, boundless=boundless)

        return copy

    def __repr__(self) -> str:
        return f""" 
         File: {self.hdf4_file}
         Transform: {self.transform}
         Shape: {self.height}, {self.width}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         Level: {self.level_name}
         TOA/TOC: {self.toatoc}
         Resolution name : {self.res_name}
        """
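The constructor above builds `metadata` from whitespace-separated key/value lines in the LOG file and then turns the `SEGM_FIRST_DATE` / `SEGM_FIRST_TIME` entries into a UTC datetime. The following is a minimal standalone sketch of those two steps. The helper names and the sample LOG text are hypothetical, and real LOG files may contain lines this simplified parser skips.

```python
import datetime as dt


def parse_log(text):
    """Hypothetical helper: map the first token of each line to the second."""
    meta = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # skip blank or single-token lines
            meta[parts[0]] = parts[1]
    return meta


def acquisition_time(date_str, time_str):
    """Combine YYYYMMDD and HHMMSS strings into a timezone-aware UTC datetime."""
    stamp = dt.datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=dt.timezone.utc)


# Made-up LOG content for illustration only.
sample = "SEGM_FIRST_DATE 20140321\nSEGM_FIRST_TIME 101530\n"
meta = parse_log(sample)
start = acquisition_time(meta["SEGM_FIRST_DATE"], meta["SEGM_FIRST_TIME"])
print(start.isoformat())
```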

load_mask(boundless=True)

Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| boundless | bool | boundless option to load the SM band. Defaults to True. | True |

Returns:

| Type | Description |
|------|-------------|
| GeoTensor | mask with the same shape as the image |
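A valid mask of this kind is typically used to blank out unusable pixels in the radiometry. The sketch below shows one way to do that with plain NumPy arrays; `apply_valid_mask` is a hypothetical helper, and the arrays stand in for what `load_radiometry().values` and `load_mask().values` would return, assuming shapes `(bands, H, W)` and `(H, W)`.

```python
import numpy as np


def apply_valid_mask(bands, valid, fill_value=0.0):
    """Hypothetical helper: set pixels where `valid` is False to `fill_value`.

    bands: array of shape (C, H, W); valid: boolean array of shape (H, W).
    """
    bands = np.asarray(bands, dtype=np.float32)
    valid = np.asarray(valid, dtype=bool)
    # Broadcast the 2-D mask over the band axis.
    return np.where(valid[np.newaxis, ...], bands, np.float32(fill_value))


bands = np.arange(8, dtype=np.float32).reshape(2, 2, 2)
valid = np.array([[True, False], [True, True]])
masked = apply_valid_mask(bands, valid, fill_value=-1.0)
valid_fraction = float(valid.mean())
print(masked, valid_fraction)
```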

Source code in georeader/readers/spotvgt_image_operational.py
def load_mask(self, boundless: bool = True) -> geotensor.GeoTensor:
    """
    Returns the valid mask (False if the pixel is out of swath or is invalid). This function loads the SM band

    Args:
        boundless (bool, optional): boundless option to load the SM band. Defaults to True.

    Returns:
        geotensor.GeoTensor: mask with the same shape as the image
    """

    sm = self.load_sm(boundless=boundless)
    valids = sm.copy()
    invalids = mask_only_sm(sm.values)
    valids.values = ~invalids
    valids.fill_value_default = False

    return valids

load_sm(boundless=True)

Reference of values in SM flags.

From the user manual, page 46:

  • Clear -> 000
  • Shadow -> 001
  • Undefined -> 010
  • Cloud -> 011
  • Ice -> 100
  • 2**3 sea/land
  • 2**4 quality swir (0 bad 1 good)
  • 2**5 quality nir
  • 2**6 quality red
  • 2**7 quality blue
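The bit layout described above can be unpacked with a few bitwise operations. This is a minimal sketch, not the library's own decoding: `decode_sm` and `STATUS_NAMES` are hypothetical names, the status code is assumed to occupy the three least-significant bits, and the "0 bad, 1 good" convention stated for SWIR is assumed to hold for the other quality bits as well.

```python
import numpy as np

# Status codes stored in the three least-significant bits.
STATUS_NAMES = {0: "clear", 1: "shadow", 2: "undefined", 3: "cloud", 4: "ice"}


def decode_sm(sm):
    """Hypothetical helper: split an SM array into status code and per-bit flags."""
    sm = np.asarray(sm).astype(np.uint8)
    return {
        "status": sm & 0b111,                # 0..4, see STATUS_NAMES
        "sea_land_bit": (sm >> 3) & 1,       # bit 2**3 (meaning of 0/1 per the manual)
        "swir_good": ((sm >> 4) & 1) == 1,   # bit 2**4
        "nir_good": ((sm >> 5) & 1) == 1,    # bit 2**5 (assumed 1 = good)
        "red_good": ((sm >> 6) & 1) == 1,    # bit 2**6 (assumed 1 = good)
        "blue_good": ((sm >> 7) & 1) == 1,   # bit 2**7 (assumed 1 = good)
    }


sm = np.array([0b11111000, 0b00000011, 0b00010001], dtype=np.uint8)
flags = decode_sm(sm)
print([STATUS_NAMES[int(s)] for s in flags["status"]])
```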

Source code in georeader/readers/spotvgt_image_operational.py
def load_sm(self, boundless: bool = True) -> geotensor.GeoTensor:
    """
    Reference of values in `SM` flags.

    From the [user manual](https://docs.terrascope.be/DataProducts/SPOT-VGT/references/SPOT_VGT_PUM_v1.3.pdf), page 46:
    * Clear  ->    000
    * Shadow ->    001
    * Undefined -> 010
    * Cloud  ->    011
    * Ice    ->    100
    * `2**3` sea/land
    * `2**4` quality swir (0 bad 1 good)
    * `2**5` quality nir
    * `2**6` quality red
    * `2**7` quality blue
    """
    return self._load_bands('SM', boundless=boundless, fill_value_default=0)

PRISMA Reader

The PRISMA reader handles data from the Italian Space Agency's hyperspectral mission, specifically working with Level 1B radiance data (not atmospherically corrected). PRISMA provides hyperspectral imaging in the 400-2500 nm spectral range, with a spectral resolution of ~12 nm.

Key features:

  • Reading L1B hyperspectral radiance data from HDF5 format files
  • Handling separate VNIR (400-1010 nm) and SWIR (920-2500 nm) spectral ranges, which overlap between 920 and 1010 nm
  • Georeferencing functionality for non-orthorectified data using provided latitude/longitude coordinates
  • On-demand conversion from radiance (mW/m²/sr/nm) to top-of-atmosphere reflectance
  • Spectral response function integration for accurate band simulation
  • Extraction of RGB previews from specific wavelengths
  • Access to satellite and solar geometry information for radiometric calculations

Tutorial examples:

API Reference

Module to read PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral images.

PRISMA is an Italian Space Agency (ASI) Earth observation satellite launched in 2019, carrying a hyperspectral imaging spectrometer that captures data in 239 spectral bands from 400 to 2500 nm with a 30m spatial resolution.

Data Format Overview

PRISMA data is distributed in HDF5 format (HE5 extension) with a specific structure:

PRISMA HDF5 File Structure:
┌──────────────────────────────────────────────────────────┐
│  /HDFEOS/SWATHS/PRS_L1_HCO/                              │
│  ├── Data Fields/                                        │
│  │   ├── VNIR_Cube: (bands, crosstrack, downtrack)       │
│  │   │   └── 400-1010 nm, ~66 bands                      │
│  │   └── SWIR_Cube: (bands, crosstrack, downtrack)       │
│  │       └── 920-2500 nm, ~173 bands                     │
│  ├── Geolocation Fields/                                 │
│  │   ├── Latitude_SWIR, Longitude_SWIR                   │
│  │   └── Latitude_VNIR, Longitude_VNIR                   │
│  └── Attributes (solar/view angles, timing, etc.)        │
│                                                          │
│  /KDP_AUX/                                               │
│  ├── Cw_Vnir_Matrix, Cw_Swir_Matrix (wavelengths)        │
│  └── Fwhm_Vnir_Matrix, Fwhm_Swir_Matrix                  │
└──────────────────────────────────────────────────────────┘

Unlike EMIT, PRISMA data is NOT orthorectified. The geolocation arrays provide lat/lon coordinates for each pixel, requiring gridding for visualization.

Dual-Sensor Configuration

PRISMA uses two separate sensors for VNIR and SWIR:

VNIR Sensor                          SWIR Sensor
┌────────────────────┐               ┌────────────────────┐
│ 400 - 1010 nm      │               │ 920 - 2500 nm      │
│ ~66 bands          │               │ ~173 bands         │
│ ~10 nm sampling    │               │ ~10 nm sampling    │
│                    │               │                    │
│ Shared 30m GSD     │               │ Shared 30m GSD     │
└────────────────────┘               └────────────────────┘
          │                                    │
          └──────────── Overlap ───────────────┘
                     920-1010 nm

The VNIR and SWIR sensors have overlapping wavelength coverage in the 920-1010 nm region, which can be used for cross-calibration.

Radiometric Units

  • L1 Radiance: mW/(m²·sr·nm) - milliwatts per square meter per steradian per nanometer (equivalent to W/(m²·sr·μm))
  • Scale factors and offsets are applied during loading to convert from DN to radiance
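For context, radiance in these units is commonly converted to top-of-atmosphere reflectance with the standard relation ρ = π · L · d² / (E₀ · cos θₛ). The sketch below implements that textbook formula; it is not necessarily how the reader performs the conversion. The function name and arguments are hypothetical, and the solar irradiance E₀ is assumed to be given per nanometre in the matching unit, mW/(m²·nm).

```python
import numpy as np


def radiance_to_reflectance(radiance, solar_irradiance, sza_deg,
                            earth_sun_distance_au=1.0):
    """Textbook TOA reflectance: pi * L * d^2 / (E0 * cos(sza)).

    radiance:         L in mW/(m^2 sr nm)
    solar_irradiance: E0 in mW/(m^2 nm), same spectral band as L (assumed input)
    sza_deg:          solar zenith angle in degrees
    """
    radiance = np.asarray(radiance, dtype=np.float64)
    cos_sza = np.cos(np.deg2rad(sza_deg))
    return np.pi * radiance * earth_sun_distance_au ** 2 / (solar_irradiance * cos_sza)


# Illustrative numbers only: L = 100, E0 = 1000, sun 60 degrees from zenith.
rho = radiance_to_reflectance(100.0, 1000.0, 60.0)
print(float(rho))
```

Because 1 mW/nm equals 1 W/μm, the same function works unchanged if both radiance and irradiance are expressed per micrometre in watts.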

Spectral Characteristics

  • Total bands: ~239 (66 VNIR + 173 SWIR, minus flagged bands)
  • Spectral sampling: ~10 nm (varies slightly)
  • FWHM: ~10-12 nm
  • SNR: >200 for VNIR, >100 for SWIR
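Band centre and FWHM values like these are often turned into a Gaussian spectral response function when simulating a broader band from hyperspectral data. The sketch below assumes a Gaussian shape, with σ = FWHM / (2·√(2·ln 2)); the function names are hypothetical and the library's own spectral-response handling may differ.

```python
import numpy as np


def gaussian_srf(wavelengths, center, fwhm):
    """Normalized Gaussian response sampled at `wavelengths` (weights sum to 1)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    wavelengths = np.asarray(wavelengths, dtype=np.float64)
    weights = np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)
    return weights / weights.sum()


def simulate_band(spectrum, wavelengths, center, fwhm):
    """Weighted average of `spectrum` under the Gaussian response."""
    weights = gaussian_srf(wavelengths, center, fwhm)
    return float(np.sum(weights * np.asarray(spectrum, dtype=np.float64)))


wl = np.arange(800.0, 901.0, 1.0)        # 1 nm grid, illustrative only
srf = gaussian_srf(wl, center=850.0, fwhm=10.0)
flat = simulate_band(np.full(wl.shape, 0.3), wl, 850.0, 10.0)
print(wl[np.argmax(srf)], flat)
```

Half the FWHM away from the centre the weight drops to half of its peak value, which is a quick sanity check on the σ conversion.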

Examples

Basic usage::

from georeader.readers.prisma import PRISMA

# Load PRISMA image
prisma = PRISMA('/path/to/PRS_L1_STD_*.he5')

# Load specific wavelengths as reflectance
bands = prisma.load_wavelengths([850, 1600, 2200], as_reflectance=True)

# Load RGB composite
rgb = prisma.load_rgb(as_reflectance=True)

# Get georeferenced output (reprojected to UTM)
rgb_geo = prisma.load_rgb(as_reflectance=True, raw=False)

See Also

georeader.readers.emit : EMIT hyperspectral reader
georeader.readers.enmap : EnMAP hyperspectral reader
georeader.griddata : Utilities for gridding non-orthorectified data

References

  • ASI PRISMA Mission: https://www.asi.it/en/earth-science/prisma/
  • PRISMA User Guide: https://prisma.asi.it/

PRISMA

Reader for PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral images.

This class provides comprehensive functionality to read and manipulate PRISMA satellite imagery products from the Italian Space Agency (ASI). It handles the dual-sensor (VNIR + SWIR) data format, supporting operations like:

  • Loading radiance or reflectance data at specific wavelengths
  • Automatic handling of VNIR/SWIR sensor selection based on wavelength
  • Converting radiance to reflectance using solar irradiance
  • Georeferencing raw data to projected coordinate systems

PRISMA Data Model

PRISMA stores data in sensor coordinates with separate lat/lon arrays for geolocation. Unlike EMIT's GLT approach, PRISMA requires gridding/interpolation for orthorectification:

Sensor Grid (raw)                  Geographic Grid (output)
┌─────────────────────┐            ┌─────────────────────┐
│ pushbroom scan      │            │ regular grid        │
│ ┌───┬───┬───┬───┐   │  gridding  │ ┌───┬───┬───┬───┐   │
│ │ a │ b │ c │ d │   │  ───────→  │ │ a'│ b'│ c'│ d'│   │
│ ├───┼───┼───┼───┤   │            │ ├───┼───┼───┼───┤   │
│ │ e │ f │ g │ h │   │            │ │ e'│ f'│ g'│ h'│   │
│ └───┴───┴───┴───┘   │            │ └───┴───┴───┴───┘   │
│ + lat/lon per pixel │            │ + affine transform  │
└─────────────────────┘            └─────────────────────┘

Raw methods (raw=True) return sensor coordinates; georeferenced methods (raw=False) apply gridding to regular geographic coordinates.
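To make the gridding step concrete, here is a deliberately simple nearest-cell binning sketch in pure NumPy. It is an illustration only: `grid_nearest` is a hypothetical function, it works directly in lon/lat degrees rather than a projected CRS, and `georeader.griddata` may use a different interpolation scheme.

```python
import numpy as np


def grid_nearest(values, lons, lats, res):
    """Bin sensor-grid samples into a regular lon/lat grid.

    Each sample is written to the cell that contains it; if several samples
    fall in one cell the last one wins, and empty cells stay NaN.
    Returns the grid and the (lon, lat) of its upper-left corner.
    """
    values, lons, lats = (np.asarray(a, dtype=np.float64) for a in (values, lons, lats))
    lon_min, lat_max = lons.min(), lats.max()
    ncols = int(np.floor((lons.max() - lon_min) / res)) + 1
    nrows = int(np.floor((lat_max - lats.min()) / res)) + 1
    out = np.full((nrows, ncols), np.nan)
    cols = np.floor((lons - lon_min) / res).astype(int)
    rows = np.floor((lat_max - lats) / res).astype(int)   # row 0 is the northern edge
    out[rows.ravel(), cols.ravel()] = values.ravel()
    return out, (lon_min, lat_max)


lons = np.array([[0.0, 1.0], [0.0, 1.0]])
lats = np.array([[1.0, 1.0], [0.0, 0.0]])
vals = np.array([[1.0, 2.0], [3.0, 4.0]])
grid, origin = grid_nearest(vals, lons, lats, res=1.0)
fine, _ = grid_nearest(vals, lons, lats, res=0.5)         # finer grid leaves gaps
print(grid, origin)
```

The upper-left corner and `res` are the ingredients of the affine transform that the georeferenced output carries.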

Dual Sensor Architecture

PRISMA has separate VNIR and SWIR sensors with overlapping coverage:

Wavelength Range:
├──────────────────────────────────────────────────────────────┤
400nm              1000nm                                 2500nm
├───────── VNIR ──────────┤
                  ├────────────────── SWIR ────────────────────┤
                  └─ overlap ─┘
                  920-1010nm

The class automatically selects the appropriate sensor based on requested wavelengths.
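The selection logic can be pictured with a small sketch. The ranges come from the diagrams on this page; everything else is an assumption for illustration: `pick_sensor` and `nearest_band` are hypothetical helpers, and the rule used inside the overlap region (prefer VNIR unless told otherwise) is a guess, not the documented behaviour of the class.

```python
import numpy as np

VNIR_RANGE = (400.0, 1010.0)   # nm, from the sensor description above
SWIR_RANGE = (920.0, 2500.0)   # nm


def pick_sensor(wavelength_nm, prefer_in_overlap="vnir"):
    """Return 'vnir' or 'swir' for a wavelength; the overlap rule is an assumption."""
    in_vnir = VNIR_RANGE[0] <= wavelength_nm <= VNIR_RANGE[1]
    in_swir = SWIR_RANGE[0] <= wavelength_nm <= SWIR_RANGE[1]
    if in_vnir and in_swir:
        return prefer_in_overlap
    if in_vnir:
        return "vnir"
    if in_swir:
        return "swir"
    raise ValueError(f"{wavelength_nm} nm is outside both sensor ranges")


def nearest_band(centers_nm, wavelength_nm):
    """Index of the band whose centre wavelength is closest to the request."""
    centers = np.asarray(centers_nm, dtype=np.float64)
    return int(np.argmin(np.abs(centers - wavelength_nm)))


choices = [pick_sensor(w) for w in (665.0, 950.0, 1600.0)]
idx = nearest_band([400.0, 410.0, 420.0], 412.0)
print(choices, idx)
```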

Attributes

filename : str
    Path to the PRISMA HE5 file.
lats : np.ndarray
    Latitude values (H, W) for each pixel in sensor coordinates.
lons : np.ndarray
    Longitude values (H, W) for each pixel in sensor coordinates.
attributes_prisma : Dict
    Dictionary of PRISMA metadata attributes from HDF5 root.
nbands_vnir : int
    Number of valid VNIR bands (excluding flagged bands).
vnir_range : Tuple[float, float]
    Wavelength range (min, max) of VNIR sensor in nm.
nbands_swir : int
    Number of valid SWIR bands (excluding flagged bands).
swir_range : Tuple[float, float]
    Wavelength range (min, max) of SWIR sensor in nm.
time_coverage_start : datetime
    UTC datetime of acquisition start.
time_coverage_end : datetime
    UTC datetime of acquisition end.
units : str
    Radiance units: 'mW/m2/sr/nm'.
sza_swir : float
    Solar zenith angle (degrees) for SWIR sensor.
sza_vnir : float
    Solar zenith angle (degrees) for VNIR sensor.
vza_swir : float
    View zenith angle (degrees) for SWIR sensor.
vza_vnir : float
    View zenith angle (degrees) for VNIR sensor.

Lazy-Loaded Attributes

ltoa_swir : np.ndarray
    SWIR radiance data (H, W, B), loaded by load_raw(swir_flag=True).
ltoa_vnir : np.ndarray
    VNIR radiance data (H, W, B), loaded by load_raw(swir_flag=False).
wavelength_swir : np.ndarray
    SWIR wavelengths (H, B) - varies slightly across track.
wavelength_vnir : np.ndarray
    VNIR wavelengths (H, B) - varies slightly across track.
fwhm_swir : np.ndarray
    SWIR FWHM values (H, B) - varies slightly across track.
fwhm_vnir : np.ndarray
    VNIR FWHM values (H, B) - varies slightly across track.

Examples

Basic loading::

>>> from georeader.readers.prisma import PRISMA
>>> 
>>> prisma = PRISMA('/path/to/PRS_L1_STD_*.he5')
>>> print(prisma)  # View metadata summary
>>> print(f"VNIR: {prisma.vnir_range}, SWIR: {prisma.swir_range}")

Loading specific wavelengths::

>>> # Load NDVI bands (Red at 665nm, NIR at 865nm)
>>> bands = prisma.load_wavelengths([665, 865], as_reflectance=True)
>>> print(bands.shape)  # (2, H, W) in sensor coordinates
>>> 
>>> # Load and georeference to UTM
>>> bands_geo = prisma.load_wavelengths([665, 865], as_reflectance=True, 
...                                       raw=False, resolution_dst=30)
>>> print(type(bands_geo))  # GeoTensor with transform and CRS

Loading RGB composite::

>>> # Raw sensor coordinates
>>> rgb_raw = prisma.load_rgb(as_reflectance=True, raw=True)
>>> 
>>> # Georeferenced output  
>>> rgb_geo = prisma.load_rgb(as_reflectance=True, raw=False)
>>> plt.imshow(np.clip(rgb_geo.values.transpose(1,2,0), 0, 0.3) / 0.3)

Working with raw data::

>>> # Load all SWIR bands
>>> prisma.load_raw(swir_flag=True)
>>> print(prisma.ltoa_swir.shape)  # (H, W, ~173)
>>> print(prisma.wavelength_swir.shape)  # (H, ~173) - wavelengths vary across track
>>> 
>>> # Load all VNIR bands
>>> prisma.load_raw(swir_flag=False)
>>> print(prisma.ltoa_vnir.shape)  # (H, W, ~66)

See Also

  • georeader.readers.emit.EMITImage: EMIT hyperspectral reader
  • georeader.readers.enmap.EnMAP: EnMAP hyperspectral reader
  • georeader.griddata: Gridding utilities for non-orthorectified data
  • georeader.reflectance: Radiometric conversion utilities

References

  • ASI PRISMA Mission: https://www.asi.it/en/earth-science/prisma/
  • PRISMA Data Products: https://prisma.asi.it/
Source code in georeader/readers/prisma.py
class PRISMA:
    """
    Reader for PRISMA (PRecursore IperSpettrale della Missione Applicativa) hyperspectral images.

    This class provides comprehensive functionality to read and manipulate PRISMA satellite 
    imagery products from the Italian Space Agency (ASI). It handles the dual-sensor
    (VNIR + SWIR) data format, supporting operations like:

    - Loading radiance or reflectance data at specific wavelengths
    - Automatic handling of VNIR/SWIR sensor selection based on wavelength
    - Converting radiance to reflectance using solar irradiance
    - Georeferencing raw data to projected coordinate systems

    PRISMA Data Model
    -----------------
    PRISMA stores data in sensor coordinates with separate lat/lon arrays for geolocation.
    Unlike EMIT's GLT approach, PRISMA requires gridding/interpolation for orthorectification:

        Sensor Grid (raw)                  Geographic Grid (output)
        ┌─────────────────────┐            ┌─────────────────────┐
        │ pushbroom scan      │            │ regular grid        │
        │ ┌───┬───┬───┬───┐   │  gridding  │ ┌───┬───┬───┬───┐   │
        │ │ a │ b │ c │ d │   │  ───────→  │ │ a'│ b'│ c'│ d'│   │
        │ ├───┼───┼───┼───┤   │            │ ├───┼───┼───┼───┤   │
        │ │ e │ f │ g │ h │   │            │ │ e'│ f'│ g'│ h'│   │
        │ └───┴───┴───┴───┘   │            │ └───┴───┴───┴───┘   │
        │ + lat/lon per pixel │            │ + affine transform  │
        └─────────────────────┘            └─────────────────────┘

    Raw methods (raw=True) return sensor coordinates; georeferenced methods
    (raw=False) apply gridding to regular geographic coordinates.

    Dual Sensor Architecture
    ------------------------
    PRISMA has separate VNIR and SWIR sensors with overlapping coverage:

        Wavelength Range:
        ├──────────────────────────────────────────────────────────────┤
        400nm              1000nm                                 2500nm
        ├───────── VNIR ───────────
                          ├────────────────── SWIR ────────────────────
                          └─ overlap ─┘
                          920-1010nm

    The class automatically selects the appropriate sensor based on requested wavelengths.

    Attributes
    ----------
    filename : str
        Path to the PRISMA HE5 file.
    lats : np.ndarray
        Latitude values (H, W) for each pixel in sensor coordinates.
    lons : np.ndarray
        Longitude values (H, W) for each pixel in sensor coordinates.
    attributes_prisma : Dict
        Dictionary of PRISMA metadata attributes from HDF5 root.
    nbands_vnir : int
        Number of valid VNIR bands (excluding flagged bands).
    vnir_range : Tuple[float, float]
        Wavelength range (min, max) of VNIR sensor in nm.
    nbands_swir : int
        Number of valid SWIR bands (excluding flagged bands).
    swir_range : Tuple[float, float]
        Wavelength range (min, max) of SWIR sensor in nm.
    time_coverage_start : datetime
        UTC datetime of acquisition start.
    time_coverage_end : datetime
        UTC datetime of acquisition end.
    units : str
        Radiance units: 'mW/m2/sr/nm'.
    sza_swir : float
        Solar zenith angle (degrees) for SWIR sensor.
    sza_vnir : float
        Solar zenith angle (degrees) for VNIR sensor.
    vza_swir : float
        View zenith angle (degrees) for SWIR sensor.
    vza_vnir : float
        View zenith angle (degrees) for VNIR sensor.

    Lazy-Loaded Attributes
    ----------------------
    ltoa_swir : np.ndarray
        SWIR radiance data (H, W, B), loaded by `load_raw(swir_flag=True)`.
    ltoa_vnir : np.ndarray
        VNIR radiance data (H, W, B), loaded by `load_raw(swir_flag=False)`.
    wavelength_swir : np.ndarray
        SWIR wavelengths (H, B) - varies slightly across track.
    wavelength_vnir : np.ndarray
        VNIR wavelengths (H, B) - varies slightly across track.
    fwhm_swir : np.ndarray
        SWIR FWHM values (H, B) - varies slightly across track.
    fwhm_vnir : np.ndarray
        VNIR FWHM values (H, B) - varies slightly across track.

    Examples
    --------
    Basic loading::

        >>> from georeader.readers.prisma import PRISMA
        >>> 
        >>> prisma = PRISMA('/path/to/PRS_L1_STD_*.he5')
        >>> print(prisma)  # View metadata summary
        >>> print(f"VNIR: {prisma.vnir_range}, SWIR: {prisma.swir_range}")

    Loading specific wavelengths::

        >>> # Load NDVI bands (Red at 665nm, NIR at 865nm)
        >>> bands = prisma.load_wavelengths([665, 865], as_reflectance=True)
        >>> print(bands.shape)  # (2, H, W) in sensor coordinates
        >>> 
        >>> # Load and georeference to UTM
        >>> bands_geo = prisma.load_wavelengths([665, 865], as_reflectance=True, 
        ...                                       raw=False, resolution_dst=30)
        >>> print(type(bands_geo))  # GeoTensor with transform and CRS

    Loading RGB composite::

        >>> # Raw sensor coordinates
        >>> rgb_raw = prisma.load_rgb(as_reflectance=True, raw=True)
        >>> 
        >>> # Georeferenced output  
        >>> rgb_geo = prisma.load_rgb(as_reflectance=True, raw=False)
        >>> plt.imshow(np.clip(rgb_geo.values.transpose(1,2,0), 0, 0.3) / 0.3)

    Working with raw data::

        >>> # Load all SWIR bands
        >>> prisma.load_raw(swir_flag=True)
        >>> print(prisma.ltoa_swir.shape)  # (H, W, ~173)
        >>> print(prisma.wavelength_swir.shape)  # (H, ~173) - wavelengths vary across track
        >>> 
        >>> # Load all VNIR bands
        >>> prisma.load_raw(swir_flag=False)
        >>> print(prisma.ltoa_vnir.shape)  # (H, W, ~66)

    See Also
    --------
    georeader.readers.emit.EMITImage : EMIT hyperspectral reader
    georeader.readers.enmap.EnMAP : EnMAP hyperspectral reader
    georeader.griddata : Gridding utilities for non-orthorectified data
    georeader.reflectance : Radiometric conversion utilities

    References
    ----------
    - ASI PRISMA Mission: https://www.asi.it/en/earth-science/prisma/
    - PRISMA Data Products: https://prisma.asi.it/
    """

    def __init__(self, filename: str) -> None:
        if not os.path.exists(filename):
            raise FileNotFoundError(f"File {filename} not found")
        self.filename = filename
        self.swir_cube_dat = SWIR_FLAG["swir_cube_dat"][True]
        self.vni_cube_dat = SWIR_FLAG["swir_cube_dat"][False]

        with h5py.File(filename, mode="r") as f:
            dset = f[HE5_COORDS["swir_lat"]]
            self.lats = np.flip(dset[:, :], axis=0)
            dset = f[HE5_COORDS["swir_lon"]]
            self.lons = np.flip(dset[:, :], axis=0)
            self.attributes_prisma = dict(f.attrs)
            sza = f.attrs["Sun_zenith_angle"]

        arr = self.attributes_prisma["List_Cw_Vnir"][
            self.attributes_prisma["List_Cw_Vnir"] > 0
        ]
        self.nbands_vnir = len(arr)
        self.vnir_range = arr.min(), arr.max()
        arr = self.attributes_prisma["List_Cw_Swir"][
            self.attributes_prisma["List_Cw_Swir"] > 0
        ]
        self.swir_range = arr.min(), arr.max()
        self.nbands_swir = len(arr)

        self.ltoa_swir: Optional[NDArray] = None
        self.ltoa_vnir: Optional[NDArray] = None
        self.wavelength_swir: Optional[NDArray] = None
        self.fwhm_swir: Optional[NDArray] = None
        self.wavelength_vnir: Optional[NDArray] = None
        self.fwhm_vnir: Optional[NDArray] = None
        self.vza_swir: float = 0
        self.vza_vnir: float = 0
        self.sza_swir: float = sza
        self.sza_vnir: float = sza

        # self.time_coverage_start = self.attributes_prisma['Product_StartTime']
        self.time_coverage_start = datetime.fromisoformat(
            self.attributes_prisma["Product_StartTime"].decode("utf-8")
        ).replace(tzinfo=timezone.utc)
        self.time_coverage_end = datetime.fromisoformat(
            self.attributes_prisma["Product_StopTime"].decode("utf-8")
        ).replace(tzinfo=timezone.utc)
        self.units = "mW/m2/sr/nm"  # same as W/m^2/SR/um

        self._footprint = griddata.footprint(self.lons, self.lats)
        self._observation_date_correction_factor: Optional[float] = None

    def footprint(self, crs: Optional[str] = None) -> GeoTensor:
        if (crs is None) or compare_crs("EPSG:4326", crs):
            return self._footprint

        return window_utils.polygon_to_crs(
            self._footprint, crs_polygon="EPSG:4326", crs_dst=crs
        )

    @property
    def observation_date_correction_factor(self) -> float:
        if self._observation_date_correction_factor is None:
            self._observation_date_correction_factor = (
                reflectance.observation_date_correction_factor(
                    date_of_acquisition=self.time_coverage_start,
                    center_coords=self.footprint("EPSG:4326").centroid.coords[0],
                )
            )
        return self._observation_date_correction_factor

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return self._footprint.bounds

    def load_raw(self, swir_flag: bool) -> NDArray:
        """
        Load all the data from all the wavelengths for the VNIR or SWIR range.
        This function caches the data, wavelengths and FWHM in the attributes of the class:
            * `ltoa_swir`, `wavelength_swir`, `fwhm_swir`, `vza_swir`, `sza_swir` if `swir_flag` is True
            * `ltoa_vnir`, `wavelength_vnir`, `fwhm_vnir`, `vza_vnir`, `sza_vnir` if `swir_flag` is False

        Args:
            swir_flag (bool): if True it will load the SWIR range, otherwise it will load the VNIR range

        Returns:
            NDArray: 3D array with the radiance values (H, W, B)
                where H and W are the dimensions of the image and B is the number of bands.
        """

        if swir_flag:
            if all(
                x is not None
                for x in [
                    self.ltoa_swir,
                    self.wavelength_swir,
                    self.fwhm_swir,
                    self.vza_swir,
                    self.sza_swir,
                ]
            ):
                return self.ltoa_swir
        else:
            if all(
                x is not None
                for x in [
                    self.ltoa_vnir,
                    self.wavelength_vnir,
                    self.fwhm_vnir,
                    self.vza_vnir,
                    self.sza_vnir,
                ]
            ):
                return self.ltoa_vnir

        swir_cube_dat = SWIR_FLAG["swir_cube_dat"][swir_flag]
        swir_lab = SWIR_FLAG["swir_lab"][swir_flag]  # True: "Swir", False: "Vnir"

        with h5py.File(self.filename, "r") as f:
            dset = f[swir_cube_dat]

            ltoa_img = np.flip(np.transpose(dset[:, :, :], axes=[0, 2, 1]), axis=0)

            dset = f["/KDP_AUX/Cw_" + swir_lab + "_Matrix"]
            wvl_mat_ini = dset[:, :]

            dset = f["/KDP_AUX/Fwhm_" + swir_lab + "_Matrix"]
            fwhm_mat_ini = dset[:, :]

            wvl_cntr = f.attrs["List_Cw_" + swir_lab]
            wvl_flag = f.attrs["List_Cw_" + swir_lab + "_Flags"]

            sc_fac = f.attrs["ScaleFactor_" + swir_lab]

            of_fac = f.attrs["Offset_" + swir_lab]

            vza = 0.0
            sza = f.attrs["Sun_zenith_angle"]

            ltoa_img = ltoa_img / sc_fac - of_fac

        # Lambda
        wvl_mat_ini = np.flip(wvl_mat_ini, axis=1)
        li_no0 = np.where(wvl_mat_ini[100, :] > 0)[0]
        wvl_mat = np.copy(wvl_mat_ini[:, li_no0])
        wl_center_ini = np.mean(wvl_mat, axis=0)

        # FWHM
        fwhm_mat_ini = np.flip(fwhm_mat_ini, axis=1)
        fwhm_mat = np.copy(fwhm_mat_ini[:, li_no0])

        M, N, B_tot = ltoa_img.shape

        if swir_flag:
            if B_tot == len(wl_center_ini):
                ltoa_img = np.flip(ltoa_img, axis=2)
            else:
                # ltoa_img = np.flip(ltoa_img[:, :, :-2], axis=2)
                non0_bands = np.where(wvl_flag == 1)[0]
                ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

        else:
            if B_tot == len(wl_center_ini):
                ltoa_img = np.flip(ltoa_img, axis=2)
            else:
                # ltoa_img = np.flip(ltoa_img[:, :, 3:], axis=2)  # Review this (not sure)
                non0_bands = np.where(wvl_flag == 1)[0]
                ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

        ltoa_img = np.transpose(ltoa_img, (1, 0, 2))
        if swir_flag:
            self.ltoa_swir = ltoa_img
            self.wavelength_swir = wvl_mat
            self.fwhm_swir = fwhm_mat
            self.vza_swir = vza
            self.sza_swir = sza
        else:
            self.ltoa_vnir = ltoa_img
            self.wavelength_vnir = wvl_mat
            self.fwhm_vnir = fwhm_mat
            self.vza_vnir = vza
            self.sza_vnir = sza

        return ltoa_img

    def load_wavelengths(
        self,
        wavelengths: Union[float, List[float], NDArray],
        as_reflectance: bool = True,
        raw: bool = True,
        resolution_dst=30,
        dst_crs: Optional[Any] = None,
        fill_value_default: float = -1,
    ) -> Union[GeoTensor, NDArray]:
        """
        Load the bands closest to the given wavelengths, as reflectance or radiance.

        Args:
            wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load
            as_reflectance (bool, optional): return the values as reflectance rather than radiance. Defaults to True.
                If False values will have units of W/m^2/SR/um (`self.units`)
            raw (bool, optional): if True it will return the raw values,
                if False it will return the values reprojected to the specified CRS and resolution. Defaults to True.
            resolution_dst (int, optional): if raw is False, it will reproject the values to this resolution. Defaults to 30.
            dst_crs (Optional[Any], optional): if None it will use the corresponding UTM zone.
            fill_value_default (float, optional): fill value. Defaults to -1.

        Returns:
            Union[GeoTensor, NDArray]: if raw is True it will return a NDArray with the values, otherwise it will return a GeoTensor
                with the reprojected values in its `.values` attribute.
        """

        if isinstance(wavelengths, Number):
            wavelengths = np.array([wavelengths])
        else:
            wavelengths = np.array(wavelengths)

        load_swir = any(
            [
                wvl >= self.swir_range[0] and wvl < self.swir_range[1]
                for wvl in wavelengths
            ]
        )
        load_vnir = any(
            [
                wvl >= self.vnir_range[0] and wvl < self.vnir_range[1]
                for wvl in wavelengths
            ]
        )
        if load_swir:
            self.load_raw(swir_flag=True)
            wavelength_swir_mean = np.mean(self.wavelength_swir, axis=0)
            fwhm_swir_mean = np.mean(self.fwhm_swir, axis=0)
        if load_vnir:
            self.load_raw(swir_flag=False)
            wavelength_vnir_mean = np.mean(self.wavelength_vnir, axis=0)
            fwhm_vnir_mean = np.mean(self.fwhm_vnir, axis=0)

        ltoa_img = []
        fwhm = []
        for b in range(len(wavelengths)):
            if (
                wavelengths[b] >= self.swir_range[0]
                and wavelengths[b] < self.swir_range[1]
            ):
                index_band = np.argmin(np.abs(wavelengths[b] - wavelength_swir_mean))
                fwhm.append(fwhm_swir_mean[index_band])
                img = self.ltoa_swir[..., index_band]
            else:
                index_band = np.argmin(np.abs(wavelengths[b] - wavelength_vnir_mean))
                fwhm.append(fwhm_vnir_mean[index_band])
                img = self.ltoa_vnir[..., index_band]

            ltoa_img.append(img)

        # Transpose to row major
        ltoa_img = np.transpose(np.stack(ltoa_img, axis=0), (0, 2, 1))

        if as_reflectance:
            thuiller = reflectance.load_thuillier_irradiance()
            response = reflectance.srf(wavelengths, fwhm, thuiller["Nanometer"].values)

            solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(
                response
            )  # mW/m$^2$/nm
            solar_irradiance_norm /= 1_000  # W/m$^2$/nm

            ltoa_img = reflectance.radiance_to_reflectance(
                ltoa_img,
                solar_irradiance_norm,
                units=self.units,
                observation_date_corr_factor=self.observation_date_correction_factor,
            )

        if raw:
            return ltoa_img

        return griddata.read_to_crs(
            np.transpose(ltoa_img, (1, 2, 0)),
            lons=self.lons,
            lats=self.lats,
            resolution_dst=resolution_dst,
            dst_crs=dst_crs,
            fill_value_default=fill_value_default,
        )

    def load_rgb(
        self, as_reflectance: bool = True, raw: bool = True
    ) -> Union[GeoTensor, NDArray]:
        return self.load_wavelengths(
            wavelengths=WAVELENGTHS_RGB, as_reflectance=as_reflectance, raw=raw
        )

    def __repr__(self) -> str:
        return f"""
        File: {self.filename}
        Bounds: {self.bounds}
        Time: {self.time_coverage_start}
        VNIR Range: {self.vnir_range} {self.nbands_vnir} bands
        SWIR Range: {self.swir_range} {self.nbands_swir} bands
        """

load_raw(swir_flag)

Load all the data from all the wavelengths for the VNIR or SWIR range. This function caches the data, wavelengths and FWHM in the attributes of the class:

  • ltoa_swir, wavelength_swir, fwhm_swir, vza_swir, sza_swir if swir_flag is True
  • ltoa_vnir, wavelength_vnir, fwhm_vnir, vza_vnir, sza_vnir if swir_flag is False

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| swir_flag | bool | If True, load the SWIR range; otherwise load the VNIR range. | required |

Returns:

| Type | Description |
|---|---|
| NDArray | 3D array with the radiance values (H, W, B), where H and W are the dimensions of the image and B is the number of bands. |
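The scaling step that `load_raw` applies to the stored digital numbers, radiance = DN / ScaleFactor - Offset, can be sketched on a synthetic array. The helper name `dn_to_radiance` and the numeric values below are made up for illustration; real scale factors and offsets come from the `ScaleFactor_*` and `Offset_*` attributes of the HE5 file.

```python
import numpy as np


def dn_to_radiance(dn: np.ndarray, scale_factor: float, offset: float) -> np.ndarray:
    """Apply radiance = DN / scale_factor - offset (hypothetical helper)."""
    # Cast first so the division is done in floating point, not integer arithmetic.
    return dn.astype(np.float32) / scale_factor - offset


# Synthetic digital numbers; scale_factor and offset are illustrative values only.
dn = np.array([[0, 1000], [2000, 65535]], dtype=np.uint16)
radiance = dn_to_radiance(dn, scale_factor=100.0, offset=0.0)
print(radiance)
```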

Source code in georeader/readers/prisma.py
def load_raw(self, swir_flag: bool) -> NDArray:
    """
    Load all the data from all the wavelengths for the VNIR or SWIR range.
    This function caches the data, wavelengths and FWHM in the attributes of the class:
        * `ltoa_swir`, `wavelength_swir`, `fwhm_swir`, `vza_swir`, `sza_swir` if `swir_flag` is True
        * `ltoa_vnir`, `wavelength_vnir`, `fwhm_vnir`, `vza_vnir`, `sza_vnir` if `swir_flag` is False

    Args:
        swir_flag (bool): if True it will load the SWIR range, otherwise it will load the VNIR range

    Returns:
        NDArray: 3D array with the radiance values (H, W, B)
            where H and W are the dimensions of the image and B is the number of bands.
    """

    if swir_flag:
        if all(
            x is not None
            for x in [
                self.ltoa_swir,
                self.wavelength_swir,
                self.fwhm_swir,
                self.vza_swir,
                self.sza_swir,
            ]
        ):
            return self.ltoa_swir
    else:
        if all(
            x is not None
            for x in [
                self.ltoa_vnir,
                self.wavelength_vnir,
                self.fwhm_vnir,
                self.vza_vnir,
                self.sza_vnir,
            ]
        ):
            return self.ltoa_vnir

    swir_cube_dat = SWIR_FLAG["swir_cube_dat"][swir_flag]
    swir_lab = SWIR_FLAG["swir_lab"][swir_flag]  # True: "Swir", False: "Vnir"

    with h5py.File(self.filename, "r") as f:
        dset = f[swir_cube_dat]

        ltoa_img = np.flip(np.transpose(dset[:, :, :], axes=[0, 2, 1]), axis=0)

        dset = f["/KDP_AUX/Cw_" + swir_lab + "_Matrix"]
        wvl_mat_ini = dset[:, :]

        dset = f["/KDP_AUX/Fwhm_" + swir_lab + "_Matrix"]
        fwhm_mat_ini = dset[:, :]

        wvl_cntr = f.attrs["List_Cw_" + swir_lab]
        wvl_flag = f.attrs["List_Cw_" + swir_lab + "_Flags"]

        sc_fac = f.attrs["ScaleFactor_" + swir_lab]

        of_fac = f.attrs["Offset_" + swir_lab]

        vza = 0.0
        sza = f.attrs["Sun_zenith_angle"]

        ltoa_img = ltoa_img / sc_fac - of_fac

    # Lambda
    wvl_mat_ini = np.flip(wvl_mat_ini, axis=1)
    li_no0 = np.where(wvl_mat_ini[100, :] > 0)[0]
    wvl_mat = np.copy(wvl_mat_ini[:, li_no0])
    wl_center_ini = np.mean(wvl_mat, axis=0)

    # FWHM
    fwhm_mat_ini = np.flip(fwhm_mat_ini, axis=1)
    fwhm_mat = np.copy(fwhm_mat_ini[:, li_no0])

    M, N, B_tot = ltoa_img.shape

    if swir_flag:
        if B_tot == len(wl_center_ini):
            ltoa_img = np.flip(ltoa_img, axis=2)
        else:
            # ltoa_img = np.flip(ltoa_img[:, :, :-2], axis=2)
            non0_bands = np.where(wvl_flag == 1)[0]
            ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

    else:
        if B_tot == len(wl_center_ini):
            ltoa_img = np.flip(ltoa_img, axis=2)
        else:
            # ltoa_img = np.flip(ltoa_img[:, :, 3:], axis=2)  # Review this (not sure)
            non0_bands = np.where(wvl_flag == 1)[0]
            ltoa_img = np.flip(ltoa_img[:, :, non0_bands], axis=2)

    ltoa_img = np.transpose(ltoa_img, (1, 0, 2))
    if swir_flag:
        self.ltoa_swir = ltoa_img
        self.wavelength_swir = wvl_mat
        self.fwhm_swir = fwhm_mat
        self.vza_swir = vza
        self.sza_swir = sza
    else:
        self.ltoa_vnir = ltoa_img
        self.wavelength_vnir = wvl_mat
        self.fwhm_vnir = fwhm_mat
        self.vza_vnir = vza
        self.sza_vnir = sza

    return ltoa_img

load_wavelengths(wavelengths, as_reflectance=True, raw=True, resolution_dst=30, dst_crs=None, fill_value_default=-1)

Load the bands closest to the given wavelengths, as reflectance or radiance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| wavelengths | Union[float, List[float], NDArray] | List of wavelengths to load. | required |
| as_reflectance | bool | Return the values as reflectance rather than radiance. If False, values will have units of W/m^2/SR/um (self.units). | True |
| raw | bool | If True, return the raw values; if False, return the values reprojected to the specified CRS and resolution. | True |
| resolution_dst | int | If raw is False, reproject the values to this resolution. | 30 |
| dst_crs | Optional[Any] | If None, use the corresponding UTM zone. | None |
| fill_value_default | float | Fill value. | -1 |

Returns:

| Type | Description |
|---|---|
| Union[GeoTensor, NDArray] | If raw is True, an NDArray with the values; otherwise a GeoTensor with the reprojected values in its .values attribute. |
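For readers unfamiliar with the radiance-to-reflectance step, the sketch below shows the standard top-of-atmosphere reflectance formula, rho = pi * L * d^2 / (E0 * cos(SZA)). It is a textbook formulation under stated assumptions, not the body of `reflectance.radiance_to_reflectance`, whose internals are not shown on this page. The function name `toa_reflectance` and its parameters are hypothetical.

```python
import math


def toa_reflectance(
    radiance: float,
    solar_irradiance: float,
    sza_deg: float,
    earth_sun_distance_au: float = 1.0,
) -> float:
    """Top-of-atmosphere reflectance from radiance (hypothetical helper).

    radiance and solar_irradiance must share the same power and wavelength
    units, e.g. W/m^2/sr/um and W/m^2/um.
    """
    cos_sza = math.cos(math.radians(sza_deg))
    return (
        math.pi * radiance * earth_sun_distance_au ** 2
        / (solar_irradiance * cos_sza)
    )


# With the sun at 60 degrees from zenith, cos(SZA) is 0.5.
e0 = 1000.0
radiance_for_unit_reflectance = e0 * 0.5 / math.pi
print(toa_reflectance(radiance_for_unit_reflectance, e0, sza_deg=60.0))
```

The Earth-Sun distance term plays the role of a date-dependent correction; how georeader computes its own `observation_date_correction_factor` should be checked in the `georeader.reflectance` module.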

Source code in georeader/readers/prisma.py
def load_wavelengths(
    self,
    wavelengths: Union[float, List[float], NDArray],
    as_reflectance: bool = True,
    raw: bool = True,
    resolution_dst=30,
    dst_crs: Optional[Any] = None,
    fill_value_default: float = -1,
) -> Union[GeoTensor, NDArray]:
    """
    Load the bands closest to the given wavelengths, as reflectance or radiance.

    Args:
        wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load
        as_reflectance (bool, optional): return the values as reflectance rather than radiance. Defaults to True.
            If False values will have units of W/m^2/SR/um (`self.units`)
        raw (bool, optional): if True it will return the raw values,
            if False it will return the values reprojected to the specified CRS and resolution. Defaults to True.
        resolution_dst (int, optional): if raw is False, it will reproject the values to this resolution. Defaults to 30.
        dst_crs (Optional[Any], optional): if None it will use the corresponding UTM zone.
        fill_value_default (float, optional): fill value. Defaults to -1.

    Returns:
        Union[GeoTensor, NDArray]: if raw is True it will return a NDArray with the values, otherwise it will return a GeoTensor
            with the reprojected values in its `.values` attribute.
    """

    if isinstance(wavelengths, Number):
        wavelengths = np.array([wavelengths])
    else:
        wavelengths = np.array(wavelengths)

    load_swir = any(
        [
            wvl >= self.swir_range[0] and wvl < self.swir_range[1]
            for wvl in wavelengths
        ]
    )
    load_vnir = any(
        [
            wvl >= self.vnir_range[0] and wvl < self.vnir_range[1]
            for wvl in wavelengths
        ]
    )
    if load_swir:
        self.load_raw(swir_flag=True)
        wavelength_swir_mean = np.mean(self.wavelength_swir, axis=0)
        fwhm_swir_mean = np.mean(self.fwhm_swir, axis=0)
    if load_vnir:
        self.load_raw(swir_flag=False)
        wavelength_vnir_mean = np.mean(self.wavelength_vnir, axis=0)
        fwhm_vnir_mean = np.mean(self.fwhm_vnir, axis=0)

    ltoa_img = []
    fwhm = []
    for b in range(len(wavelengths)):
        if (
            wavelengths[b] >= self.swir_range[0]
            and wavelengths[b] < self.swir_range[1]
        ):
            index_band = np.argmin(np.abs(wavelengths[b] - wavelength_swir_mean))
            fwhm.append(fwhm_swir_mean[index_band])
            img = self.ltoa_swir[..., index_band]
        else:
            index_band = np.argmin(np.abs(wavelengths[b] - wavelength_vnir_mean))
            fwhm.append(fwhm_vnir_mean[index_band])
            img = self.ltoa_vnir[..., index_band]

        ltoa_img.append(img)

    # Transpose to row major
    ltoa_img = np.transpose(np.stack(ltoa_img, axis=0), (0, 2, 1))

    if as_reflectance:
        thuiller = reflectance.load_thuillier_irradiance()
        response = reflectance.srf(wavelengths, fwhm, thuiller["Nanometer"].values)

        solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(
            response
        )  # mW/m$^2$/nm
        solar_irradiance_norm /= 1_000  # W/m$^2$/nm

        ltoa_img = reflectance.radiance_to_reflectance(
            ltoa_img,
            solar_irradiance_norm,
            units=self.units,
            observation_date_corr_factor=self.observation_date_correction_factor,
        )

    if raw:
        return ltoa_img

    return griddata.read_to_crs(
        np.transpose(ltoa_img, (1, 2, 0)),
        lons=self.lons,
        lats=self.lats,
        resolution_dst=resolution_dst,
        dst_crs=dst_crs,
        fill_value_default=fill_value_default,
    )

EMIT Reader

The EMIT (Earth Surface Mineral Dust Source Investigation) reader provides access to NASA's imaging spectrometer data from the International Space Station. This reader works with Level 1B calibrated radiance data (not atmospherically corrected).

Key features:

  • Reading L1B hyperspectral radiance data from NetCDF4 format files
  • Working with the 380-2500 nm spectral range with 7.4 nm sampling
  • Irregular grid georeferencing through GLT (Geographic Lookup Table)
  • Support for the observation geometry information (solar and viewing angles)
  • Integration with L2A mask products for cloud and shadow detection
  • Quality-aware analysis with cloud, cirrus, and spacecraft flag masks
  • Conversion from radiance (μW/cm²/sr/nm) to top-of-atmosphere reflectance
  • Support for downloading data from NASA DAAC portals
  • Automatic detection and use of appropriate UTM projection

Tutorial example:

API Reference

Module to read EMIT (Earth Surface Mineral Dust Source Investigation) hyperspectral images.

EMIT is a NASA imaging spectrometer aboard the International Space Station that measures reflected solar radiation from Earth's surface in 285 spectral bands from 380 to 2500 nm. This module provides tools to read, georeference, and process EMIT L1B radiance data.

Data Format Overview

EMIT data is distributed in NetCDF format with a unique storage layout:

Raw Data Structure (NetCDF file):
┌────────────────────────────────────────────┐
│  radiance: (downtrack, crosstrack, bands)  │
│  └── Shape: (~1280, ~1242, 285)            │
│                                            │
│  location/glt_x: (rows, cols)              │
│  location/glt_y: (rows, cols)              │
│  └── Geographic Lookup Table (GLT)         │
└────────────────────────────────────────────┘

The raw data is stored in sensor coordinates (pushbroom scan lines), NOT in geographic coordinates. The GLT provides a mapping from geographic (orthorectified) coordinates back to raw sensor coordinates.

GLT Orthorectification Process

The GLT (Geographic Lookup Table) is key to understanding EMIT data:

Geographic Grid (Output)          Sensor Grid (Raw Data)
┌─────────────────────┐           ┌─────────────────────┐
│ (0,0)               │           │ radiance array      │
│   ┌───┬───┬───┐     │   GLT     │ ┌───────────────┐   │
│   │ a │ b │ c │     │ ──────→   │ │ (5,2) (5,3)   │   │
│   ├───┼───┼───┤     │ lookup    │ │ (6,1) (6,2)   │   │
│   │ d │ e │ f │     │           │ │ ...           │   │
│   └───┴───┴───┘     │           │ └───────────────┘   │
│               (H,W) │           │                     │
└─────────────────────┘           └─────────────────────┘

For pixel (row=1, col=2) in the geographic grid:
    glt_x[1,2] = 5  →  raw_col = 5  (1-based)
    glt_y[1,2] = 2  →  raw_row = 2  (1-based)
    value = radiance[raw_row - 1, raw_col - 1, :] = radiance[1, 4, :]  (all bands)

GLT values of 0 indicate invalid/no-data pixels.

This approach allows:

  1. Efficient storage (no wasted pixels from orthorectification padding)
  2. Preservation of original radiometric values (no resampling)
  3. Flexible reprojection to any target CRS
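
The GLT lookup described above can be sketched with plain NumPy. This is a minimal illustration, not the reader's implementation: the function name `apply_glt` and the tiny synthetic arrays are hypothetical, and the sketch assumes 1-based GLT indices with 0 marking no-data pixels, as stated in this page.

```python
import numpy as np


def apply_glt(raw, glt_x, glt_y, fill_value=-9999.0):
    """Hypothetical helper: orthorectify raw sensor data with a GLT.

    raw:    (downtrack, crosstrack, bands) array in sensor coordinates.
    glt_x:  (H, W) 1-based crosstrack (column) indices, 0 = no data.
    glt_y:  (H, W) 1-based downtrack (row) indices, 0 = no data.
    """
    valid = (glt_x != 0) & (glt_y != 0)
    out = np.full(glt_x.shape + (raw.shape[-1],), fill_value, dtype=raw.dtype)
    # Subtract 1 to convert the 1-based GLT indices to 0-based NumPy indices.
    out[valid] = raw[glt_y[valid] - 1, glt_x[valid] - 1, :]
    return out


# Synthetic 2 x 3 sensor grid with 2 bands and a 2 x 2 geographic grid.
raw = np.arange(2 * 3 * 2, dtype=np.float32).reshape(2, 3, 2)
glt_x = np.array([[0, 3], [1, 2]])
glt_y = np.array([[0, 1], [2, 2]])
ortho = apply_glt(raw, glt_x, glt_y)  # shape (H, W, bands)
```

Each output pixel is a copy of one raw spectrum, so no resampling of radiometric values takes place; pixels without a GLT entry keep the fill value.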

Radiometric Units

  • L1B Radiance: μW/(cm²·sr·nm) - microwatts per square centimeter per steradian per nanometer
  • FWHM: Full Width at Half Maximum of spectral response in nm
  • Wavelengths: Center wavelengths in nm (380-2500 nm range)
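
As a rough guide to how these units enter the radiance-to-reflectance conversion, the sketch below applies the standard top-of-atmosphere formula, reflectance = π · L · d² / (E0 · cos(SZA)). The function name, its parameters, and the sample numbers are assumptions made for illustration; the library's own `reflectance.radiance_to_reflectance` may handle units and the date correction differently.

```python
import numpy as np

# 1 uW/cm^2 = 1e-6 W / 1e-4 m^2 = 1e-2 W/m^2
UW_CM2_TO_W_M2 = 1e-2


def toa_reflectance(radiance_uw_cm2, solar_irradiance_w_m2, sza_deg,
                    earth_sun_distance_au=1.0):
    """Hypothetical helper: TOA reflectance from radiance in uW/(cm^2 sr nm).

    solar_irradiance_w_m2 is the band-integrated solar irradiance in W/(m^2 nm).
    """
    radiance = np.asarray(radiance_uw_cm2, dtype=float) * UW_CM2_TO_W_M2
    cos_sza = np.cos(np.deg2rad(sza_deg))
    return np.pi * radiance * earth_sun_distance_au ** 2 / (solar_irradiance_w_m2 * cos_sza)


# Illustrative numbers only: 10 uW/(cm^2 sr nm), 1 W/(m^2 nm), 60 degrees SZA.
example = toa_reflectance(10.0, 1.0, 60.0)
```

The explicit unit factor makes it harder to mix μW/cm² radiance with W/m² irradiance by accident, which would otherwise scale the result by a factor of 100.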

Key Classes and Functions

  • EMITImage: Main class for reading and processing EMIT data
  • download_product: Download EMIT products from NASA Earthdata
  • get_radiance_link, get_obs_link: Generate download URLs

Requirements

Requires xarray: pip install xarray

Authentication for downloads requires NASA Earthdata credentials stored in: ~/.georeader/auth_emit.json with format: {"user": "...", "password": "..."}
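
A minimal sketch of creating that credentials file from Python is shown below. The helper `write_emit_credentials` and the placeholder user and password are hypothetical; only the file name and the JSON keys come from the description above. The demonstration writes to a temporary directory so that no real credentials are touched.

```python
import json
import tempfile
from pathlib import Path


def write_emit_credentials(user, password, config_dir=None):
    """Hypothetical helper: write auth_emit.json in the expected format."""
    # Default to ~/.georeader, the location described in this page.
    config_dir = Path(config_dir) if config_dir is not None else Path.home() / ".georeader"
    config_dir.mkdir(parents=True, exist_ok=True)
    auth_file = config_dir / "auth_emit.json"
    auth_file.write_text(json.dumps({"user": user, "password": password}))
    # Restrict access to the current user where the platform supports it.
    auth_file.chmod(0o600)
    return auth_file


with tempfile.TemporaryDirectory() as tmp:
    path = write_emit_credentials("my_user", "my_password", config_dir=tmp)
    stored = json.loads(path.read_text())
```

Calling the helper without `config_dir` would write to `~/.georeader/auth_emit.json`.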

Examples

Basic usage::

from georeader.readers.emit import EMITImage, download_product

# Download and open EMIT image
link = 'https://data.lpdaac.earthdatacloud.nasa.gov/...'
filepath = download_product(link)
emit = EMITImage(filepath)

# Reproject to UTM (recommended for analysis)
emit_utm = emit.to_crs("UTM")

# Load as reflectance (applies solar irradiance correction)
reflectance = emit_utm.load(as_reflectance=True)

# Load RGB composite
rgb = emit_utm.load_rgb(as_reflectance=True)

# Get valid (cloud-free) mask
valid_mask = emit.validmask()

References

  • NASA EMIT Mission: https://earth.jpl.nasa.gov/emit/
  • EMIT Data Resources: https://github.com/nasa/EMIT-Data-Resources
  • EMIT Utils: https://github.com/emit-sds/emit-utils/
  • LP DAAC Data Access: https://lpdaac.usgs.gov/products/emitl1bradv001/

EMITImage

Reader for EMIT L1B (Earth Surface Mineral Dust Source Investigation) hyperspectral images.

This class provides comprehensive functionality to read and manipulate EMIT satellite imagery products from NASA's imaging spectrometer aboard the ISS. It handles the unique GLT-based (Geographic Lookup Table) storage format, supporting operations like:

  • Loading radiometry data with automatic orthorectification
  • Converting radiance to reflectance using solar irradiance
  • Accessing cloud and quality masks
  • Extracting viewing and solar geometry angles
  • Reprojecting to different coordinate reference systems

EMIT Data Model

EMIT stores data in sensor coordinates, not geographic coordinates. The GLT provides a lookup table mapping geographic pixels to sensor pixels:

GLT Orthorectification:
┌────────────────────────────┐      ┌──────────────────────────┐
│    Geographic Grid         │      │   Sensor Grid (raw)      │
│  (orthorectified space)    │      │  (pushbroom scan)        │
│  ┌───┬───┬───┬───┐         │      │  ┌───┬───┬───┬───┐       │
│  │ · │ a │ b │ · │         │  GLT │  │ e │ a │ b │ · │       │
│  ├───┼───┼───┼───┤         │  ──→ │  ├───┼───┼───┼───┤       │
│  │ c │ d │ e │ f │         │      │  │ f │ c │ d │ · │       │
│  └───┴───┴───┴───┘         │      │  └───┴───┴───┴───┘       │
│  (pixels with data)        │      │  (original acquisition)  │
└────────────────────────────┘      └──────────────────────────┘

· = no data (GLT value = 0)

For geographic pixel (row, col):
    raw_x = glt_x[row, col]  (1-based column index)
    raw_y = glt_y[row, col]  (1-based row index)
    value = radiance[raw_y - 1, raw_x - 1, :]

This approach preserves original radiometric values without interpolation artifacts.

Spectral Characteristics

  • Wavelength range: 380-2500 nm (VNIR + SWIR)
  • Number of bands: 285
  • Spectral sampling: ~7.4 nm
  • Spatial resolution: 60m at nadir

Attributes

filename : str
    Path to the EMIT NetCDF file.
nc_ds : xr.Dataset
    xarray Dataset handle for the main radiance file.
glt : GeoTensor
    Geographic Lookup Table as a GeoTensor with shape (2, H, W).
    - glt.values[0]: x-indices into raw radiance (1-based)
    - glt.values[1]: y-indices into raw radiance (1-based)
valid_glt : np.ndarray
    Boolean mask (H, W) indicating valid GLT entries (data coverage).
glt_relative : GeoTensor
    GLT with indices relative to the data window (0-based).
window_raw : rasterio.windows.Window
    Window defining the subset of raw data to read (optimizes I/O).
real_transform : rasterio.Affine
    Affine transform for the orthorectified (geographic) grid.
time_coverage_start : datetime
    UTC datetime of acquisition start.
time_coverage_end : datetime
    UTC datetime of acquisition end.
wavelengths : np.ndarray
    Center wavelengths (nm) for selected bands.
fwhm : np.ndarray
    Full Width at Half Maximum (nm) for selected bands.
band_selection : Union[int, Tuple[int, ...], slice]
    Current band subset selection.
units : str
    Radiance units from file metadata (typically 'uW/(cm^2 sr nm)').
fill_value_default : float
    No-data value for radiance data.
dims : Tuple[str]
    Dimension names ("band", "y", "x").
dtype : np.dtype
    Data type of radiance values.
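
The relationship between `glt`, `valid_glt`, `glt_relative` and `window_raw` listed above can be sketched as follows. This is an illustrative reconstruction under the stated conventions (1-based GLT, 0 = no data), not the class's code; the helper name and the tiny synthetic GLT are hypothetical, and a plain dict stands in for `rasterio.windows.Window` so the sketch only needs NumPy.

```python
import numpy as np


def raw_window_and_relative_glt(glt):
    """Hypothetical helper. glt: int array (2, H, W), 1-based, 0 = no data."""
    valid = np.all(glt != 0, axis=0)
    xmin, xmax = glt[0, valid].min(), glt[0, valid].max()
    ymin, ymax = glt[1, valid].min(), glt[1, valid].max()

    # Shift valid entries so they index into the raw block read through the window.
    glt_rel = glt.copy()
    glt_rel[0, valid] -= xmin
    glt_rel[1, valid] -= ymin

    # Offsets are 0-based, so the 1-based minimum is reduced by one.
    window = {
        "col_off": int(xmin - 1),
        "row_off": int(ymin - 1),
        "width": int(xmax - xmin + 1),
        "height": int(ymax - ymin + 1),
    }
    return window, glt_rel, valid


# Synthetic GLT: one no-data pixel, raw columns 3..5 and raw rows 7..8 in use.
glt = np.array([[[0, 3], [4, 5]],
                [[0, 7], [8, 7]]], dtype=np.int32)
window, glt_rel, valid = raw_window_and_relative_glt(glt)
```

After the shift, a relative index of 0 is a legitimate value, so the boolean mask is needed to tell no-data pixels apart from pixels that map to the first row or column of the window.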

Lazy-Loaded Properties

nc_ds_obs : xr.Dataset
    Observation data (viewing/solar angles, path length, elevation).
    Auto-downloaded from NASA Earthdata if not present locally.
nc_ds_l2amask : xr.Dataset
    L2A quality mask data (clouds, cirrus, water, aggregate flags).
    Auto-downloaded from NASA Earthdata if not present locally.
mean_sza : float
    Mean solar zenith angle (degrees) across the scene.
mean_vza : float
    Mean view zenith angle (degrees) across the scene.
observation_date_correction_factor : float
    Earth-Sun distance correction factor for the acquisition date.

Examples

Basic loading and reprojection::

>>> from georeader.readers.emit import EMITImage, download_product
>>> 
>>> # Download from NASA Earthdata
>>> link = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/...'
>>> filepath = download_product(link)
>>> 
>>> # Open and reproject to UTM
>>> emit = EMITImage(filepath)
>>> emit_utm = emit.to_crs("UTM", resolution_dst_crs=60)
>>> 
>>> # Load as reflectance
>>> refl = emit_utm.load(as_reflectance=True)
>>> print(refl.shape)  # (285, H, W)

Working with specific wavelengths::

>>> # Select RGB-like bands (640, 550, 460 nm)
>>> emit.set_band_selection([35, 23, 11])
>>> print(emit.wavelengths)  # [641.2, 553.1, 462.3]
>>> rgb = emit.load(as_reflectance=True)
>>> 
>>> # Or use the convenience method
>>> rgb = emit.load_rgb(as_reflectance=True)
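
The band indices in the example above are hard-coded. A minimal sketch of how such indices could be looked up from target wavelengths is given below. The helper `nearest_band_indices` is hypothetical, and the evenly spaced wavelength grid is a synthetic stand-in for `emit.wavelengths`, not the sensor's actual band centers.

```python
import numpy as np


def nearest_band_indices(wavelengths, targets_nm):
    """Hypothetical helper: index of the closest band for each target wavelength."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    targets = np.asarray(targets_nm, dtype=float)
    # Distance matrix of shape (n_targets, n_bands); argmin picks the closest band.
    distances = np.abs(wavelengths[np.newaxis, :] - targets[:, np.newaxis])
    return distances.argmin(axis=1).tolist()


# Synthetic grid: 285 evenly spaced bands between 380 and 2500 nm.
synthetic_wavelengths = np.linspace(380.0, 2500.0, 285)
rgb_bands = nearest_band_indices(synthetic_wavelengths, [640.0, 550.0, 460.0])
```

With a real image, the lookup should be done while all bands are selected, because `set_band_selection` interprets indices as absolute positions in the radiance array while `emit.wavelengths` only lists the currently selected bands.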

Accessing masks and quality data::

>>> # Get valid (cloud-free) mask
>>> valid_mask = emit.validmask()
>>> print(f"Clear pixels: {emit.percentage_clear:.1f}%")
>>> 
>>> # Get specific mask layers
>>> cloud_mask = emit.mask("Cloud flag")
>>> water_mask = emit.water_mask()

Working with viewing geometry::

>>> # Get solar zenith angle
>>> sza = emit.sza()  # GeoTensor with SZA values
>>> 
>>> # Get mean angles for quick reference
>>> print(f"Mean SZA: {emit.mean_sza:.1f}°")
>>> print(f"Mean VZA: {emit.mean_vza:.1f}°")

Spatial subsetting::

>>> import rasterio.windows
>>> 
>>> # Read a spatial window
>>> window = rasterio.windows.Window(col_off=100, row_off=200, width=500, height=500)
>>> emit_subset = emit.read_from_window(window)
>>> data = emit_subset.load()

See Also

georeader.readers.prisma.PRISMA : PRISMA hyperspectral reader
georeader.readers.enmap.EnMAP : EnMAP hyperspectral reader
georeader.reflectance : Radiometric conversion utilities

References

  • EMIT L1B Product Guide: https://lpdaac.usgs.gov/products/emitl1bradv001/
  • EMIT Data Resources: https://github.com/nasa/EMIT-Data-Resources
  • EMIT Algorithms: Green et al. (2020) doi:10.1029/2020JD033451
Source code in georeader/readers/emit.py
class EMITImage:
    """
    Reader for EMIT L1B (Earth Surface Mineral Dust Source Investigation) hyperspectral images.

    This class provides comprehensive functionality to read and manipulate EMIT satellite 
    imagery products from NASA's imaging spectrometer aboard the ISS. It handles the 
    unique GLT-based (Geographic Lookup Table) storage format, supporting operations like:

    - Loading radiometry data with automatic orthorectification
    - Converting radiance to reflectance using solar irradiance
    - Accessing cloud and quality masks
    - Extracting viewing and solar geometry angles
    - Reprojecting to different coordinate reference systems

    EMIT Data Model
    ---------------
    EMIT stores data in sensor coordinates, not geographic coordinates. The GLT provides
    a lookup table mapping geographic pixels to sensor pixels:

        GLT Orthorectification:
        ┌────────────────────────────┐      ┌──────────────────────────┐
        │    Geographic Grid         │      │   Sensor Grid (raw)      │
        │  (orthorectified space)    │      │  (pushbroom scan)        │
        │  ┌───┬───┬───┬───┐         │      │  ┌───┬───┬───┬───┐       │
        │  │ · │ a │ b │ · │         │  GLT │  │ e │ a │ b │ · │       │
        │  ├───┼───┼───┼───┤         │  ──→ │  ├───┼───┼───┼───┤       │
        │  │ c │ d │ e │ f │         │      │  │ f │ c │ d │ · │       │
        │  └───┴───┴───┴───┘         │      │  └───┴───┴───┴───┘       │
        │  (pixels with data)        │      │  (original acquisition)  │
        └────────────────────────────┘      └──────────────────────────┘

        · = no data (GLT value = 0)

        For geographic pixel (row, col):
            raw_x = glt_x[row, col]  (1-based column index)
            raw_y = glt_y[row, col]  (1-based row index)
            value = radiance[raw_y - 1, raw_x - 1, :]

    This approach preserves original radiometric values without interpolation artifacts.

    Spectral Characteristics
    ------------------------
    - Wavelength range: 380-2500 nm (VNIR + SWIR)
    - Number of bands: 285
    - Spectral sampling: ~7.4 nm
    - Spatial resolution: 60m at nadir

    Attributes
    ----------
    filename : str
        Path to the EMIT NetCDF file.
    nc_ds : xr.Dataset
        xarray Dataset handle for the main radiance file.
    glt : GeoTensor
        Geographic Lookup Table as a GeoTensor with shape (2, H, W).
        - glt.values[0]: x-indices into raw radiance (1-based)
        - glt.values[1]: y-indices into raw radiance (1-based)
    valid_glt : np.ndarray
        Boolean mask (H, W) indicating valid GLT entries (data coverage).
    glt_relative : GeoTensor
        GLT with indices relative to the data window (0-based).
    window_raw : rasterio.windows.Window
        Window defining the subset of raw data to read (optimizes I/O).
    real_transform : rasterio.Affine
        Affine transform for the orthorectified (geographic) grid.
    time_coverage_start : datetime
        UTC datetime of acquisition start.
    time_coverage_end : datetime
        UTC datetime of acquisition end.
    wavelengths : np.ndarray
        Center wavelengths (nm) for selected bands.
    fwhm : np.ndarray
        Full Width at Half Maximum (nm) for selected bands.
    band_selection : Union[int, Tuple[int, ...], slice]
        Current band subset selection.
    units : str
        Radiance units from file metadata (typically 'uW/(cm^2 sr nm)').
    fill_value_default : float
        No-data value for radiance data.
    dims : Tuple[str]
        Dimension names ("band", "y", "x").
    dtype : np.dtype
        Data type of radiance values.

    Lazy-Loaded Properties
    ----------------------
    nc_ds_obs : xr.Dataset
        Observation data (viewing/solar angles, path length, elevation).
        Auto-downloaded from NASA Earthdata if not present locally.
    nc_ds_l2amask : xr.Dataset  
        L2A quality mask data (clouds, cirrus, water, aggregate flags).
        Auto-downloaded from NASA Earthdata if not present locally.
    mean_sza : float
        Mean solar zenith angle (degrees) across the scene.
    mean_vza : float
        Mean view zenith angle (degrees) across the scene.
    observation_date_correction_factor : float
        Earth-Sun distance correction factor for the acquisition date.

    Examples
    --------
    Basic loading and reprojection::

        >>> from georeader.readers.emit import EMITImage, download_product
        >>> 
        >>> # Download from NASA Earthdata
        >>> link = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/...'
        >>> filepath = download_product(link)
        >>> 
        >>> # Open and reproject to UTM
        >>> emit = EMITImage(filepath)
        >>> emit_utm = emit.to_crs("UTM", resolution_dst_crs=60)
        >>> 
        >>> # Load as reflectance
        >>> refl = emit_utm.load(as_reflectance=True)
        >>> print(refl.shape)  # (285, H, W)

    Working with specific wavelengths::

        >>> # Select RGB-like bands (640, 550, 460 nm)
        >>> emit.set_band_selection([35, 23, 11])
        >>> print(emit.wavelengths)  # [641.2, 553.1, 462.3]
        >>> rgb = emit.load(as_reflectance=True)
        >>> 
        >>> # Or use the convenience method
        >>> rgb = emit.load_rgb(as_reflectance=True)

    Accessing masks and quality data::

        >>> # Get valid (cloud-free) mask
        >>> valid_mask = emit.validmask()
        >>> print(f"Clear pixels: {emit.percentage_clear:.1f}%")
        >>> 
        >>> # Get specific mask layers
        >>> cloud_mask = emit.mask("Cloud flag")
        >>> water_mask = emit.water_mask()

    Working with viewing geometry::

        >>> # Get solar zenith angle
        >>> sza = emit.sza()  # GeoTensor with SZA values
        >>> 
        >>> # Get mean angles for quick reference
        >>> print(f"Mean SZA: {emit.mean_sza:.1f}°")
        >>> print(f"Mean VZA: {emit.mean_vza:.1f}°")

    Spatial subsetting::

        >>> import rasterio.windows
        >>> 
        >>> # Read a spatial window
        >>> window = rasterio.windows.Window(col_off=100, row_off=200, width=500, height=500)
        >>> emit_subset = emit.read_from_window(window)
        >>> data = emit_subset.load()

    See Also
    --------
    georeader.readers.prisma.PRISMA : PRISMA hyperspectral reader
    georeader.readers.enmap.EnMAP : EnMAP hyperspectral reader
    georeader.reflectance : Radiometric conversion utilities

    References
    ----------
    - EMIT L1B Product Guide: https://lpdaac.usgs.gov/products/emitl1bradv001/
    - EMIT Data Resources: https://github.com/nasa/EMIT-Data-Resources
    - EMIT Algorithms: Green et al. (2020) doi:10.1029/2020JD033451
    """
    attributes_set_if_exists = ["_nc_ds_obs", "_mean_sza", "_mean_vza",
                                "_observation_bands", "_nc_ds_l2amask", "_mask_bands",
                                "obs_file", "l2amaskfile",
                                # Option B: opt-in radiance cache. ``_cache`` is a
                                # mutable dict shared by reference across all clones
                                # built from the same parent; that's what makes the
                                # cache visible end-to-end. ``cache_radiance`` is the
                                # opt-in flag (rebind-on-clone is fine; we don't toggle
                                # per-clone).
                                "_cache", "cache_radiance"]

    # Key under which the full-spectrum windowed radiance is stored in ``_cache``.
    _CACHE_KEY_RADIANCE = "radiance_window"

    def __init__(self, filename:str, glt:Optional[GeoTensor]=None,
                 band_selection:Optional[Union[int, Tuple[int, ...],slice]]=slice(None),
                 cache_radiance:bool=False,
                 reuse_handles_from:Optional['EMITImage']=None):
        if not HAS_XARRAY:
            raise ImportError("xarray is required to read EMIT images. Please install it with: pip install xarray")

        self.filename = filename
        if reuse_handles_from is not None:
            if reuse_handles_from.filename != self.filename:
                raise ValueError("reuse_handles_from must reference the same EMIT file")
            # Clone constructor path: reuse parent handles to avoid opening
            # throwaway datasets that would immediately be overwritten.
            self.nc_ds = reuse_handles_from.nc_ds
        else:
            self.nc_ds = safe_open_netcdf(self.filename, cache=False, load=False)
        self._nc_ds_obs = None
        self._nc_ds_l2amask = None
        self._observation_bands = None
        self._mask_bands = None
        self._sensor_band_params = None
        # Opt-in radiance cache. Default off; the dict is created either way so the
        # ``_cache is parent._cache`` invariant holds for clones even when caching
        # is disabled.
        self.cache_radiance:bool = cache_radiance
        self._cache:Dict[str, Any] = {}
        # self.real_shape = (self.nc_ds['radiance'].shape[-1],) + self.nc_ds['radiance'].shape[:-1]

        self._mean_sza = None
        self._mean_vza = None
        self.obs_file:Optional[str] = None
        self.l2amaskfile:Optional[str] = None

        geotransform = self.nc_ds.attrs['geotransform']
        self.real_transform = rasterio.Affine(geotransform[1], geotransform[2], geotransform[0],
                                              geotransform[4], geotransform[5], geotransform[3])

        self.time_coverage_start = datetime.strptime(self.nc_ds.attrs['time_coverage_start'], "%Y-%m-%dT%H:%M:%S%z")
        self.time_coverage_end = datetime.strptime(self.nc_ds.attrs['time_coverage_end'], "%Y-%m-%dT%H:%M:%S%z")

        self.dtype = self.nc_ds['radiance'].dtype
        self.dims = ("band", "y", "x")
        self.fill_value_default = self.nc_ds['radiance'].attrs.get('_FillValue', -9999)
        self.nodata = self.fill_value_default
        self.units = self.nc_ds["radiance"].attrs.get('units', '')

        if glt is None:
            # Open the location group to access glt_x and glt_y
            location_ds = safe_open_netcdf(self.filename, cache=False, load=False, group='location')
            glt_x = np.nan_to_num(location_ds['glt_x'].values, nan=0).astype(np.int32)
            glt_y = np.nan_to_num(location_ds['glt_y'].values, nan=0).astype(np.int32)
            location_ds.close()

            glt_arr = np.zeros((2,) + glt_x.shape, dtype=np.int32)
            glt_arr[0] = glt_x
            glt_arr[1] = glt_y
            # glt_arr -= 1 # account for 1-based indexing

            # https://rasterio.readthedocs.io/en/stable/api/rasterio.crs.html
            self.glt = GeoTensor(glt_arr, transform=self.real_transform, 
                                 crs=rasterio.crs.CRS.from_wkt(self.nc_ds.attrs['spatial_ref']),
                                 fill_value_default=0)
        else:
            self.glt = glt

        self.valid_glt = np.all(self.glt.values != self.glt.fill_value_default, axis=0)
        xmin, ymin, xmax, ymax = self._bounds_indexes_raw() # values are 1-based!

        # glt has the absolute indexes of the netCDF object
        # glt_relative has the relative indexes
        self.glt_relative = self.glt.copy()
        self.glt_relative.values[0, self.valid_glt] -= xmin
        self.glt_relative.values[1, self.valid_glt] -= ymin

        self.window_raw = rasterio.windows.Window(col_off=xmin-1, row_off=ymin-1, 
                                                  width=xmax-xmin+1, height=ymax-ymin+1)

        # Load sensor_band_parameters from its group, unless we're cloning from
        # an existing instance and can reuse the already-open handle.
        if reuse_handles_from is not None:
            self._sensor_band_params = reuse_handles_from._sensor_band_params
            self.bandname_dimension = reuse_handles_from.bandname_dimension
        else:
            self._sensor_band_params = safe_open_netcdf(self.filename, cache=False, load=False, group='sensor_band_parameters')
            if "wavelengths" in self._sensor_band_params:
                self.bandname_dimension = "wavelengths"
            elif "radiance_wl" in self._sensor_band_params:
                self.bandname_dimension = "radiance_wl"
            else:
                raise ValueError("wavelengths or radiance_wl not found in sensor_band_parameters")

        self.band_selection = band_selection
        self.wavelengths = self._sensor_band_params[self.bandname_dimension].values[self.band_selection]
        self.fwhm = self._sensor_band_params['fwhm'].values[self.band_selection]
        self._observation_date_correction_factor:Optional[float] = None

    @property
    def observation_date_correction_factor(self) -> float:
        if self._observation_date_correction_factor is None:
            self._observation_date_correction_factor = reflectance.observation_date_correction_factor(date_of_acquisition=self.time_coverage_start,
                                                                                                      center_coords=self.footprint("EPSG:4326").centroid.coords[0])
        return self._observation_date_correction_factor

    @property
    def crs(self) -> Any:
        return self.glt.crs

    @property
    def shape(self) -> Tuple:
        try:
            n_bands = len(self.wavelengths)
            return  (n_bands,) + self.glt.shape[1:]
        except Exception:
            return self.glt.shape

    @property
    def width(self) -> int:
        return self.shape[-1]

    @property
    def height(self) -> int:
        return self.shape[-2]

    @property
    def transform(self) -> rasterio.Affine:
        return self.glt.transform

    @property
    def res(self) -> Tuple[float, float]:
        return self.glt.res

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        return self.glt.bounds

    def footprint(self, crs:Optional[str]=None) -> Polygon:
        """
        Get the footprint of the image in the given CRS. If no CRS is given, the footprint is returned in the native CRS.
        This function takes into account the valid_glt mask to compute the footprint.

        Args:
            crs (Optional[str], optional): The CRS to return the footprint in. Defaults to None. 
                If None, the footprint is returned in the native CRS.

        Returns:
            Polygon: The footprint of the image in the given CRS.
        """
        if not hasattr(self, '_pol'):
            from georeader.vectorize import get_polygons
            pols = get_polygons(self.valid_glt, transform=self.transform)
            self._pol = unary_union(pols)
        if crs is not None:
            pol_crs = window_utils.polygon_to_crs(self._pol, self.crs, crs)
        else:
            pol_crs = self._pol

        pol_glt = self.glt.footprint(crs=crs)

        return pol_crs.intersection(pol_glt)

    def set_band_selection(self, band_selection:Optional[Union[int, Tuple[int, ...],slice]]=None):
        """
        Set the band selection. Band selection is absolute w.r.t self.nc_ds['radiance']

        Args:
            band_selection (Optional[Union[int, Tuple[int, ...],slice]], optional): slicing or selection of the bands. Defaults to None.

        Example:
            >>> emit_image.set_band_selection(slice(0, 3)) # will only load the three first bands
            >>> emit_image.wavelengths # will only return the wavelengths of the three first bands
            >>> emit_image.load() # will only load the three first bands
        """
        if band_selection is None:
            band_selection = slice(None)
        self.band_selection = band_selection
        self.wavelengths = self._sensor_band_params[self.bandname_dimension].values[self.band_selection]
        self.fwhm = self._sensor_band_params['fwhm'].values[self.band_selection]

    @property
    def nc_ds_obs(self, obs_file:Optional[str]=None):
        """
        Loads the observation file. This file contains the solar and viewing angles,
        the elevation, and the illumination based on elevation and path length.

        If the observation file does not exist locally, it is downloaded from the JPL portal.

        It caches the observation file in the object. (self.nc_ds_obs)

        Args:
            obs_file (Optional[str], optional): Path to the observation file. 
                Defaults to None. If none it will download the observation file 
                from the EMIT server.
        """
        if self._nc_ds_obs is not None:
            return self._nc_ds_obs

        if obs_file is None:
            link_obs_file = get_obs_link(self.filename)
            obs_file = os.path.join(os.path.dirname(self.filename), os.path.basename(link_obs_file))
            if not os.path.exists(obs_file):
                download_product(link_obs_file, obs_file)

        self.obs_file = obs_file
        self._nc_ds_obs = safe_open_netcdf(obs_file, cache=False, load=False)
        # Load observation_bands from sensor_band_parameters group
        sensor_params = safe_open_netcdf(obs_file, cache=False, load=False, group='sensor_band_parameters')
        self._observation_bands = sensor_params['observation_bands'].values
        sensor_params.close()
        return self._nc_ds_obs

    @property
    def nc_ds_l2amask(self, l2amaskfile:Optional[str]=None) -> xr.Dataset:
        """
        Loads the L2A mask file. In this file we have information about the cloud mask.

        This function downloads the L2A mask file if it does not exist from the JPL portal.

        It caches the L2A mask file in the object. (self.nc_ds_l2amask)

        See https://lpdaac.usgs.gov/products/emitl2arflv001/ for info about the L2A mask file.

        Args:
            l2amaskfile (Optional[str], optional): Path to the L2A mask file. 
                Defaults to None. If none it will download the L2A mask file 
                from the EMIT server.
        """
        if self._nc_ds_l2amask is not None:
            return self._nc_ds_l2amask

        if l2amaskfile is None:
            link_l2amaskfile = get_l2amask_link(self.filename)
            l2amaskfile = os.path.join(os.path.dirname(self.filename), os.path.basename(link_l2amaskfile))
            if not os.path.exists(l2amaskfile):
                download_product(link_l2amaskfile, l2amaskfile)

        self.l2amaskfile = l2amaskfile
        self._nc_ds_l2amask = safe_open_netcdf(l2amaskfile, cache=False, load=False)
        # Load mask_bands from sensor_band_parameters group
        sensor_params = safe_open_netcdf(l2amaskfile, cache=False, load=False, 
                                         group='sensor_band_parameters')
        self._mask_bands = sensor_params["mask_bands"].values
        sensor_params.close()
        return self._nc_ds_l2amask

    @property
    def mask_bands(self) -> np.array:
        """ Returns the mask bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag',
       'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag'] """
        self.nc_ds_l2amask
        return self._mask_bands

    def validmask(self, with_buffer:bool=True) -> GeoTensor:
        """
        Return the mask of valid pixels.


        Returns:
            GeoTensor: bool mask. True means that the pixel is valid.
        """

        validmask = ~self.invalid_mask_raw(with_buffer=with_buffer)

        return self.georreference(validmask,
                                  fill_value_default=False)

    def invalid_mask_raw(self, with_buffer:bool=True) -> NDArray:
        """
        Returns the non-georeferenced quality mask. True means that the pixel is not valid.

        A pixel is flagged as invalid when any of the Cloud flag, Cirrus flag or Spacecraft Flag
        is set. The Dilated Cloud Flag is also included when with_buffer is True.

        From: https://github.com/nasa/EMIT-Data-Resources/blob/main/python/how-tos/How_to_use_EMIT_Quality_data.ipynb
        and https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py#L277


        """
        band_index =  [0,1,3]
        if with_buffer:
            band_index.append(4)

        slice_y, slice_x = self.window_raw.toslices()
        mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
        mask_arr = np.sum(mask_arr, axis=-1)
        mask_arr = (mask_arr >= 1)
        return mask_arr

    @property
    def percentage_clear(self) -> float:
        """
        Return the percentage of clear pixels in the image

        Returns:
            float: percentage of clear pixels
        """

        invalids = self.invalid_mask_raw(with_buffer=False)
        return 100 * (1 - np.sum(invalids) / np.prod(invalids.shape))


    def mask(self, mask_name:str="cloud_mask") -> GeoTensor:
        """
        Return the mask layer with the given name.
        Mask shall be one of self.mask_bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag',
       'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

        Args:
            mask_name (str, optional): Name of the mask. Defaults to "cloud_mask", which does
                not appear in the list above, so it is safer to pass one of those names explicitly.

        Returns:
            GeoTensor: mask
        """
        band_index = self.mask_bands.tolist().index(mask_name)
        slice_y, slice_x = self.window_raw.toslices()
        mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
        return self.georreference(mask_arr,
                                  fill_value_default=self.nc_ds_l2amask['mask'].attrs.get('_FillValue', -9999))

    def water_mask(self) -> GeoTensor:
        """ Returns the water mask """
        return self.mask("Water flag")

    @property
    def observation_bands(self) -> np.array:
        """ Returns the observation bands """
        self.nc_ds_obs
        return self._observation_bands

    def observation(self, name:str) -> GeoTensor:
        """ Returns the observation with the given name """
        band_index = self.observation_bands.tolist().index(name)
        slice_y, slice_x = self.window_raw.toslices()
        # The obs file stores obs data in root group, not in a subgroup
        obs_arr = self.nc_ds_obs['obs'].values[slice_y, slice_x, band_index]
        return self.georreference(obs_arr, 
                                  fill_value_default=self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999))

    def sza(self) -> GeoTensor:
        """ Return the solar zenith angle as a GeoTensor """
        return self.observation('To-sun zenith (0 to 90 degrees from zenith)')

    def vza(self) -> GeoTensor:
        """ Return the view zenith angle as a GeoTensor """
        return self.observation('To-sensor zenith (0 to 90 degrees from zenith)')

    def elevation(self) -> GeoTensor:
        """ Return the elevation (``elev`` variable of the ``location`` group) as a GeoTensor """
        location_ds = safe_open_netcdf(self.filename, cache=False, load=False, group='location')
        obs_arr = location_ds["elev"]
        slice_y, slice_x = self.window_raw.toslices()
        elev_data = obs_arr.values[slice_y, slice_x]
        fill_val = obs_arr.attrs.get('_FillValue', -9999)
        location_ds.close()
        return self.georreference(elev_data, fill_value_default=fill_val)

    @property
    def mean_sza(self) -> float:
        """ Return the mean solar zenith angle """
        if self._mean_sza is not None:
            return self._mean_sza

        band_index = self.observation_bands.tolist().index('To-sun zenith (0 to 90 degrees from zenith)')
        sza_arr = self.nc_ds_obs['obs'].values[..., band_index]
        fill_val = self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999)
        self._mean_sza = float(np.mean(sza_arr[sza_arr != fill_val]))
        return self._mean_sza

    @property
    def mean_vza(self) -> float:
        """ Return the mean view zenith angle """
        if self._mean_vza is not None:
            return self._mean_vza
        band_index = self.observation_bands.tolist().index('To-sensor zenith (0 to 90 degrees from zenith)')
        vza_arr = self.nc_ds_obs['obs'].values[..., band_index]
        fill_val = self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999)
        self._mean_vza = float(np.mean(vza_arr[vza_arr != fill_val]))
        return self._mean_vza

    def __copy__(self) -> '__class__':
        out = EMITImage(
            self.filename,
            glt=self.glt.copy(),
            band_selection=self.band_selection,
            reuse_handles_from=self,
        )

        # copy nc_ds_obs if it exists
        for attrname in self.attributes_set_if_exists:
            if hasattr(self, attrname):
                setattr(out, attrname, getattr(self, attrname))

        return out

    def copy(self) -> '__class__':
        return self.__copy__()

    def to_crs(self, crs:Any="UTM", 
               resolution_dst_crs:Optional[Union[float, Tuple[float, float]]]=60) -> '__class__':
        """
        Reproject the image to a new crs

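        Args:
            crs (Any): Destination CRS. Defaults to "UTM", in which case the UTM zone
                is derived from the footprint of the GLT.
            resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
                Resolution in the destination CRS. Defaults to 60.

        Returns:
            EMITImage: EMIT image in the new CRS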

        Example:
            >>> emit_image = EMITImage("path/to/emit_image.nc")
            >>> emit_image_utm = emit_image.to_crs(crs="UTM")
        """
        if crs == "UTM":
            footprint = self.glt.footprint("EPSG:4326")
            crs = get_utm_epsg(footprint)

        glt = read.read_to_crs(self.glt, crs, resampling=rasterio.warp.Resampling.nearest, 
                               resolution_dst_crs=resolution_dst_crs)

        out = EMITImage(
            self.filename,
            glt=glt,
            band_selection=self.band_selection,
            reuse_handles_from=self,
        )

        # Propagate eagerly-set and lazily-loaded attributes from the parent so
        # the new instance shares the parent's NetCDF handles, sensor params,
        # observation bands, mean angles, etc. without re-opening anything.
        for attrname in self.attributes_set_if_exists:
            if hasattr(self, attrname):
                setattr(out, attrname, getattr(self, attrname))

        # _pol is not in attributes_set_if_exists because it's CRS-dependent:
        # it must be reprojected to the new CRS.
        if hasattr(self, '_pol'):
            setattr(out, '_pol', window_utils.polygon_to_crs(self._pol, self.crs, crs))

        return out


    def read_from_window(self, window:Optional[rasterio.windows.Window]=None, boundless:bool=True) -> '__class__':
        glt_window = self.glt.read_from_window(window, boundless=boundless)
        out = EMITImage(
            self.filename,
            glt=glt_window,
            band_selection=self.band_selection,
            reuse_handles_from=self,
        )

        # Propagate eagerly-set and lazily-loaded attributes from the parent.
        for attrname in self.attributes_set_if_exists:
            if hasattr(self, attrname):
                setattr(out, attrname, getattr(self, attrname))

        return out

    def read_from_bands(self, bands:Union[int, Tuple[int, ...], slice]) -> '__class__':
        copy = self.__copy__()
        copy.set_band_selection(bands)
        return copy

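    def load(self, boundless:bool=True, as_reflectance:bool=False)-> GeoTensor:
        """
        Load the selected bands and georeference them with the GLT.

        Args:
            boundless (bool, optional): Currently unused in this method. Defaults to True.
            as_reflectance (bool, optional): If True, convert the radiance to reflectance
                using the Thuillier solar irradiance spectrum. Defaults to False.

        Returns:
            GeoTensor: georeferenced data (C, H', W') or (H', W')
        """
        data = self.load_raw() # (C, H, W) or (H, W)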
        if as_reflectance:
            invalids = np.isnan(data) | (data == self.fill_value_default)
            thuiller = reflectance.load_thuillier_irradiance()
            response = reflectance.srf(self.wavelengths, self.fwhm, thuiller["Nanometer"].values)
            solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(response) / 1_000
            data = reflectance.radiance_to_reflectance(data, solar_irradiance_norm,
                                                       units=self.units,
                                                       observation_date_corr_factor=self.observation_date_correction_factor)
            data[invalids] = self.fill_value_default
        return self.georreference(data, fill_value_default=self.fill_value_default)

    def load_rgb(self, as_reflectance:bool=True) -> GeoTensor:
        bands_read = np.argmin(np.abs(WAVELENGTHS_RGB[:, np.newaxis] - self.wavelengths), axis=1).tolist()
        ei_rgb = self.read_from_bands(bands_read)
        return ei_rgb.load(boundless=True, as_reflectance=as_reflectance)

    @property
    def shape_raw(self) -> Tuple[int, int, int]:
        """ Return the shape of the raw data in (C, H, W) format """
        return (len(self.wavelengths),) + rasterio.windows.shape(self.window_raw)

    def _bounds_indexes_raw(self) -> Tuple[int, int, int, int]:
        """ Return the bounds of the raw data: (min_x, min_y, max_x, max_y) """
        return _bounds_indexes_raw(self.glt.values, self.valid_glt)


    def load_raw(self, transpose:bool=True) -> np.array:
        """
        Load the raw data, without orthorectification

        Args:
            transpose (bool, optional): Transpose the data to (C, H, W) if it has 3 dimensions.
                Defaults to True. If False, return (H, W, C).

        Returns:
            np.array: raw data (C, H, W) or (H, W)
        """

        slice_y, slice_x = self.window_raw.toslices()

        if self.cache_radiance:
            # Option B (opt-in): cache the full-spectrum windowed radiance so that
            # subsequent loads of band subsets become pure in-memory slices.
            # ``self._cache`` is a mutable dict shared with all clones built from
            # this instance (via ``attributes_set_if_exists``), so a single
            # decompression services every algorithm downstream.
            cached = self._cache.get(self._CACHE_KEY_RADIANCE)
            if cached is None:
                radiance = self.nc_ds['radiance']
                dims = radiance.dims
                cached = radiance.isel({dims[0]: slice_y, dims[1]: slice_x}).values
                self._cache[self._CACHE_KEY_RADIANCE] = cached
            data = cached[..., self.band_selection]
        else:
            # Default path: push the spatial (and, when possible, spectral) slice
            # into the NetCDF read via xarray .isel(). Avoids materialising the
            # full radiance variable in RAM, but re-reads from disk each call.
            radiance = self.nc_ds['radiance']
            dims = radiance.dims  # typically ('downtrack', 'crosstrack', 'bands')
            radiance = radiance.isel({dims[0]: slice_y, dims[1]: slice_x})

            if isinstance(self.band_selection, slice):
                radiance = radiance.isel({dims[2]: self.band_selection})
                data = radiance.values
            else:
                # Fancy indexing (list / array of indices): push as far as we can
                # into the read (spatial), then numpy-slice the band axis.
                data = radiance.values[..., self.band_selection]

        # transpose to (C, H, W)
        if transpose and (len(data.shape) == 3):
            data = np.transpose(data, axes=(2, 0, 1))

        return data

    def clear_radiance_cache(self) -> None:
        """Drop the cached radiance window if present.

        After this call, the next ``load_raw()`` will re-read from disk. The
        ``_cache`` dict object itself is not replaced: clones built via
        ``__copy__`` / ``read_from_bands`` / ``to_crs`` / ``read_from_window``
        share the same dict by reference, so clearing through any clone is
        visible to all of them. Intended to be called from ``EmitProcessor.process``
        after all per-scene products are computed, to release the ~1.5 GB
        radiance array before the next scene is processed.
        """
        self._cache.pop(self._CACHE_KEY_RADIANCE, None)


    def georreference(self, data:np.array, 
                      fill_value_default:Optional[Union[int,float]]=None) -> GeoTensor:
        """
        Georeference an image in sensor coordinates to the coordinates of the current 
        georeferenced object. If you do some processing with the raw data, you can 
        georeference the raw output with this function.

        Args:
            data (np.array): raw data (C, H, W) or (H, W). 
            fill_value_default (Optional[Union[int,float]], optional): fill value for the
                georeferenced output. Defaults to None.

        Returns:
            GeoTensor: georeferenced version of data (C, H', W') or (H', W')

        Example:
            >>> emit_image = EMITImage("path/to/emit_image.nc")
            >>> emit_image_rgb = emit_image.read_from_bands([35, 23, 11])
            >>> data_rgb = emit_image_rgb.load_raw() # (3, H, W)
            >>> data_rgb_ortho = emit_image.georreference(data_rgb) # (3, H', W')
        """
        return georreference(self.glt_relative, data, self.valid_glt, 
                             fill_value_default=fill_value_default)


    @property
    def values(self) -> np.array:
        # return np.zeros(self.shape, dtype=self.dtype)
        return self.load(boundless=True).values

    def __repr__(self)->str:
        return f""" 
         File: {self.filename}
         Transform: {self.transform}
         Shape: {self.shape}
         Resolution: {self.res}
         Bounds: {self.bounds}
         CRS: {self.crs}
         units: {self.units}
        """

mask_bands property

Returns the mask bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag', 'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

mean_sza property

Return the mean solar zenith angle

mean_vza property

Return the mean view zenith angle
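
Both mean-angle properties drop fill values before averaging. The snippet below is a minimal sketch of that idea in plain Python. It assumes a fill value of -9999, which is the fallback the reader uses when the `_FillValue` attribute is absent, and `mean_ignoring_fill` is a hypothetical helper, not part of georeader.

```python
from statistics import fmean

def mean_ignoring_fill(values, fill_value=-9999):
    """Average the entries of `values` that differ from `fill_value`."""
    valid = [v for v in values if v != fill_value]
    if not valid:
        raise ValueError("no valid values to average")
    return fmean(valid)

# Zenith angles in degrees; the two fill entries must not bias the mean.
sza_samples = [30.0, 32.0, -9999, 34.0, -9999]
mean_sza = mean_ignoring_fill(sza_samples)
```

Without the filter, the two fill entries would pull the average far below zero.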

nc_ds_l2amask property

Loads the L2A mask file. In this file we have information about the cloud mask.

This function downloads the L2A mask file if it does not exist from the JPL portal.

It caches the L2A mask file in the object. (self.nc_ds_l2amask)

See https://lpdaac.usgs.gov/products/emitl2arflv001/ for info about the L2A mask file.

Parameters:

Name Type Description Default
l2amaskfile Optional[str]

Path to the L2A mask file. Defaults to None. If none it will download the L2A mask file from the EMIT server.

None

nc_ds_obs property

Loads the observation file. In this file we have information about angles (solar and viewing), elevation and illumination based on elevation and path length.

This function downloads the observation file if it does not exist from the JPL portal.

It caches the observation file in the object. (self.nc_ds_obs)

Parameters:

Name Type Description Default
obs_file Optional[str]

Path to the observation file. Defaults to None. If none it will download the observation file from the EMIT server.

None

observation_bands property

Returns the observation bands

percentage_clear property

Return the percentage of clear pixels in the image

Returns:

Name Type Description
float float

percentage of clear pixels
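
In the source listing this property counts invalid pixels with `with_buffer=False`, so the dilated cloud flag does not reduce the score. The arithmetic can be sketched with a small nested-list mask; the function below is an illustrative stand-in that works on plain lists rather than NumPy arrays.

```python
def percentage_clear(invalid_mask):
    """Percentage of False entries in a 2D boolean mask given as nested lists."""
    flat = [flag for row in invalid_mask for flag in row]
    return 100 * (1 - sum(flat) / len(flat))

# Two invalid pixels out of eight.
invalid = [
    [True, False, False, False],
    [False, False, True, False],
]
clear = percentage_clear(invalid)
```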

shape_raw property

Return the shape of the raw data in (C, H, W) format

clear_radiance_cache()

Drop the cached radiance window if present.

After this call, the next load_raw() will re-read from disk. The _cache dict object itself is not replaced: clones built via __copy__ / read_from_bands / to_crs / read_from_window share the same dict by reference, so clearing through any clone is visible to all of them. Intended to be called from EmitProcessor.process after all per-scene products are computed, to release the ~1.5 GB radiance array before the next scene is processed.
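
The sharing behaviour described here relies on clones holding a reference to the same dict object. The toy class below is a sketch of that pattern only: `Reader`, `clone` and `clear_cache` are hypothetical names, and the cached list stands in for the radiance array.

```python
class Reader:
    """Toy stand-in for a reader whose clones share one cache dict."""
    CACHE_KEY = "radiance"

    def __init__(self, cache=None):
        # Reuse the caller's dict when given, so clones alias the same object.
        self._cache = cache if cache is not None else {}

    def clone(self):
        return Reader(cache=self._cache)

    def load(self):
        if self.CACHE_KEY not in self._cache:
            self._cache[self.CACHE_KEY] = [0.0] * 4  # pretend this is expensive
        return self._cache[self.CACHE_KEY]

    def clear_cache(self):
        # pop() mutates the shared dict in place instead of rebinding it.
        self._cache.pop(self.CACHE_KEY, None)

parent = Reader()
child = parent.clone()
parent.load()
cached_before = Reader.CACHE_KEY in child._cache   # child sees the parent's load
child.clear_cache()
cached_after = Reader.CACHE_KEY in parent._cache   # parent sees the child's clear
```

Rebinding `self._cache = {}` inside `clear_cache` would break this: only the clone that called it would see an empty cache.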

Source code in georeader/readers/emit.py
def clear_radiance_cache(self) -> None:
    """Drop the cached radiance window if present.

    After this call, the next ``load_raw()`` will re-read from disk. The
    ``_cache`` dict object itself is not replaced: clones built via
    ``__copy__`` / ``read_from_bands`` / ``to_crs`` / ``read_from_window``
    share the same dict by reference, so clearing through any clone is
    visible to all of them. Intended to be called from ``EmitProcessor.process``
    after all per-scene products are computed, to release the ~1.5 GB
    radiance array before the next scene is processed.
    """
    self._cache.pop(self._CACHE_KEY_RADIANCE, None)

footprint(crs=None)

Get the footprint of the image in the given CRS. If no CRS is given, the footprint is returned in the native CRS. This function takes into account the valid_glt mask to compute the footprint.

Parameters:

Name Type Description Default
crs Optional[str]

The CRS to return the footprint in. Defaults to None. If None, the footprint is returned in the native CRS.

None

Returns:

Name Type Description
Polygon Polygon

The footprint of the image in the given CRS.

Source code in georeader/readers/emit.py
def footprint(self, crs:Optional[str]=None) -> Polygon:
    """
    Get the footprint of the image in the given CRS. If no CRS is given, the footprint is returned in the native CRS.
    This function takes into account the valid_glt mask to compute the footprint.

    Args:
        crs (Optional[str], optional): The CRS to return the footprint in. Defaults to None. 
            If None, the footprint is returned in the native CRS.

    Returns:
        Polygon: The footprint of the image in the given CRS.
    """
    if not hasattr(self, '_pol'):
        from georeader.vectorize import get_polygons
        pols = get_polygons(self.valid_glt, transform=self.transform)
        self._pol = unary_union(pols)
    if crs is not None:
        pol_crs = window_utils.polygon_to_crs(self._pol, self.crs, crs)
    else:
        pol_crs = self._pol

    pol_glt = self.glt.footprint(crs=crs)

    return pol_crs.intersection(pol_glt)

georreference(data, fill_value_default=None)

Georeference an image in sensor coordinates to the coordinates of the current georeferenced object. If you do some processing with the raw data, you can georeference the raw output with this function.

Parameters:

Name Type Description Default
data array

raw data (C, H, W) or (H, W).

required

Returns:

Name Type Description
GeoTensor GeoTensor

georeferenced version of data (C, H', W') or (H', W')

Example

emit_image = EMITImage("path/to/emit_image.nc")
emit_image_rgb = emit_image.read_from_bands([35, 23, 11])
data_rgb = emit_image_rgb.load_raw() # (3, H, W)
data_rgb_ortho = emit_image.georreference(data_rgb) # (3, H', W')
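
To illustrate what mapping sensor coordinates onto an output grid with a geometry lookup table (GLT) involves, here is a deliberately simplified sketch. It assumes each output cell stores a (row, col) index into the raw array, or None where no sensor pixel maps there. The library's `georreference` function operates on NumPy arrays and its internals are not shown in this section, so `apply_glt` is a hypothetical illustration rather than the actual implementation.

```python
def apply_glt(raw, glt, fill_value=-9999):
    """Place raw sensor pixels on the output grid described by `glt`.

    raw: 2D nested list indexed as raw[row][col] (sensor geometry).
    glt: 2D nested list; each cell is a (row, col) index into `raw`,
         or None where no sensor pixel maps to that output pixel.
    """
    return [
        [fill_value if idx is None else raw[idx[0]][idx[1]] for idx in glt_row]
        for glt_row in glt
    ]

raw = [[10, 11],
       [20, 21]]
glt = [[None, (0, 0), (0, 1)],
       [(1, 0), (1, 1), None]]
ortho = apply_glt(raw, glt)
```

The output takes the shape of the GLT, not of the raw array, which is why the documented return shape is (H', W') rather than (H, W).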

Source code in georeader/readers/emit.py
def georreference(self, data:np.array, 
                  fill_value_default:Optional[Union[int,float]]=None) -> GeoTensor:
    """
    Georeference an image in sensor coordinates to the coordinates of the current 
    georeferenced object. If you do some processing with the raw data, you can 
    georeference the raw output with this function.

    Args:
        data (np.array): raw data (C, H, W) or (H, W). 
        fill_value_default (Optional[Union[int,float]], optional): fill value for the
            georeferenced output. Defaults to None.

    Returns:
        GeoTensor: georeferenced version of data (C, H', W') or (H', W')

    Example:
        >>> emit_image = EMITImage("path/to/emit_image.nc")
        >>> emit_image_rgb = emit_image.read_from_bands([35, 23, 11])
        >>> data_rgb = emit_image_rgb.load_raw() # (3, H, W)
        >>> data_rgb_ortho = emit_image.georreference(data_rgb) # (3, H', W')
    """
    return georreference(self.glt_relative, data, self.valid_glt, 
                         fill_value_default=fill_value_default)

invalid_mask_raw(with_buffer=True)

Returns the non-georeferenced quality mask. True means that the pixel is not valid.

A pixel is flagged as invalid when any of the Cloud flag, Cirrus flag or Spacecraft Flag is set. The Dilated Cloud Flag is also included when with_buffer is True.

From: https://github.com/nasa/EMIT-Data-Resources/blob/main/python/how-tos/How_to_use_EMIT_Quality_data.ipynb and https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py#L277
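
The flag combination can be sketched for a single pixel as follows. The band positions follow the order of the mask band list documented for `mask_bands`; `invalid_pixel` is a hypothetical per-pixel helper, whereas the reader itself works on whole arrays.

```python
# Flag order taken from the documented mask band list.
CLOUD, CIRRUS, WATER, SPACECRAFT, DILATED_CLOUD = range(5)

def invalid_pixel(flags, with_buffer=True):
    """Return True if any of the selected flags is set for one pixel."""
    selected = [CLOUD, CIRRUS, SPACECRAFT]
    if with_buffer:
        selected.append(DILATED_CLOUD)
    # Summing the selected flags and testing >= 1 is a logical OR.
    return sum(flags[i] for i in selected) >= 1

water_only = [0, 0, 1, 0, 0]   # water is not part of the quality mask
near_cloud = [0, 0, 0, 0, 1]   # only the dilated cloud flag is set
```

A water pixel stays valid, and a pixel in the dilated cloud buffer is invalid only when `with_buffer` is True.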

Source code in georeader/readers/emit.py
def invalid_mask_raw(self, with_buffer:bool=True) -> NDArray:
    """
    Returns the non-georeferenced quality mask. True means that the pixel is not valid.

    A pixel is flagged as invalid when any of the Cloud flag, Cirrus flag or Spacecraft Flag
    is set. The Dilated Cloud Flag is also included when with_buffer is True.

    From: https://github.com/nasa/EMIT-Data-Resources/blob/main/python/how-tos/How_to_use_EMIT_Quality_data.ipynb
    and https://github.com/nasa/EMIT-Data-Resources/blob/main/python/modules/emit_tools.py#L277


    """
    band_index =  [0,1,3]
    if with_buffer:
        band_index.append(4)

    slice_y, slice_x = self.window_raw.toslices()
    mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
    mask_arr = np.sum(mask_arr, axis=-1)
    mask_arr = (mask_arr >= 1)
    return mask_arr

load_raw(transpose=True)

Load the raw data, without orthorectification

Parameters:

Name Type Description Default
transpose bool

Transpose the data to (C, H, W) if it has 3 dimensions. Defaults to True. If False, return (H, W, C).

True

Returns:

Type Description
array

np.array: raw data (C, H, W) or (H, W)
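
The axis reordering performed when `transpose=True` corresponds to `np.transpose(data, axes=(2, 0, 1))` in the source listing. The pure-Python sketch below shows the same reordering on nested lists; `hwc_to_chw` is a hypothetical helper used only for illustration.

```python
def hwc_to_chw(data):
    """Reorder a nested list from (H, W, C) to (C, H, W)."""
    height, width, channels = len(data), len(data[0]), len(data[0][0])
    return [
        [[data[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]

hwc = [[[1, 2, 3], [4, 5, 6]]]   # H=1, W=2, C=3
chw = hwc_to_chw(hwc)            # C=3, H=1, W=2
```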

Source code in georeader/readers/emit.py
def load_raw(self, transpose:bool=True) -> np.array:
    """
    Load the raw data, without orthorectification

    Args:
        transpose (bool, optional): Transpose the data to (C, H, W) if it has 3 dimensions.
            Defaults to True. If False, return (H, W, C).

    Returns:
        np.array: raw data (C, H, W) or (H, W)
    """

    slice_y, slice_x = self.window_raw.toslices()

    if self.cache_radiance:
        # Option B (opt-in): cache the full-spectrum windowed radiance so that
        # subsequent loads of band subsets become pure in-memory slices.
        # ``self._cache`` is a mutable dict shared with all clones built from
        # this instance (via ``attributes_set_if_exists``), so a single
        # decompression services every algorithm downstream.
        cached = self._cache.get(self._CACHE_KEY_RADIANCE)
        if cached is None:
            radiance = self.nc_ds['radiance']
            dims = radiance.dims
            cached = radiance.isel({dims[0]: slice_y, dims[1]: slice_x}).values
            self._cache[self._CACHE_KEY_RADIANCE] = cached
        data = cached[..., self.band_selection]
    else:
        # Default path: push the spatial (and, when possible, spectral) slice
        # into the NetCDF read via xarray .isel(). Avoids materialising the
        # full radiance variable in RAM, but re-reads from disk each call.
        radiance = self.nc_ds['radiance']
        dims = radiance.dims  # typically ('downtrack', 'crosstrack', 'bands')
        radiance = radiance.isel({dims[0]: slice_y, dims[1]: slice_x})

        if isinstance(self.band_selection, slice):
            radiance = radiance.isel({dims[2]: self.band_selection})
            data = radiance.values
        else:
            # Fancy indexing (list / array of indices): push as far as we can
            # into the read (spatial), then numpy-slice the band axis.
            data = radiance.values[..., self.band_selection]

    # transpose to (C, H, W)
    if transpose and (len(data.shape) == 3):
        data = np.transpose(data, axes=(2, 0, 1))

    return data

mask(mask_name='cloud_mask')

Return the mask layer with the given name. Mask shall be one of self.mask_bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag', 'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

Args: mask_name (str, optional): Name of the mask. Defaults to "cloud_mask", which does not appear in the list above, so it is safer to pass one of those names explicitly.

Returns: GeoTensor: mask
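
The name lookup is an exact match against the band list, so spelling and capitalization matter. The sketch below reproduces only that lookup step. The band names are copied from the docstring and may differ in a given file, and `band_index` is a hypothetical helper.

```python
MASK_BANDS = ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag',
              'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

def band_index(mask_name, mask_bands=MASK_BANDS):
    """Resolve a mask name to its position; list.index raises ValueError if absent."""
    return mask_bands.index(mask_name)

water_idx = band_index('Water flag')

# A name that is not in the list fails loudly instead of returning a default.
try:
    band_index('cloud_mask')
    default_found = True
except ValueError:
    default_found = False
```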

Source code in georeader/readers/emit.py
def mask(self, mask_name:str="cloud_mask") -> GeoTensor:
    """
    Return the mask layer with the given name.
    Mask shall be one of self.mask_bands -> ['Cloud flag', 'Cirrus flag', 'Water flag', 'Spacecraft Flag',
   'Dilated Cloud Flag', 'AOD550', 'H2O (g cm-2)', 'Aggregate Flag']

    Args:
        mask_name (str, optional): Name of the mask. Defaults to "cloud_mask", which does
            not appear in the list above, so it is safer to pass one of those names explicitly.

    Returns:
        GeoTensor: mask
    """
    band_index = self.mask_bands.tolist().index(mask_name)
    slice_y, slice_x = self.window_raw.toslices()
    mask_arr = self.nc_ds_l2amask['mask'].values[slice_y, slice_x, band_index]
    return self.georreference(mask_arr,
                              fill_value_default=self.nc_ds_l2amask['mask'].attrs.get('_FillValue', -9999))

observation(name)

Returns the observation with the given name

Source code in georeader/readers/emit.py
def observation(self, name:str) -> GeoTensor:
    """ Returns the observation with the given name """
    band_index = self.observation_bands.tolist().index(name)
    slice_y, slice_x = self.window_raw.toslices()
    # The obs file stores obs data in root group, not in a subgroup
    obs_arr = self.nc_ds_obs['obs'].values[slice_y, slice_x, band_index]
    return self.georreference(obs_arr, 
                              fill_value_default=self.nc_ds_obs['obs'].attrs.get('_FillValue', -9999))

set_band_selection(band_selection=None)

Set the band selection. Band selection is absolute w.r.t self.nc_ds['radiance']

Parameters:

Name Type Description Default
band_selection Optional[Union[int, Tuple[int, ...], slice]]

slicing or selection of the bands. Defaults to None.

None
Example

emit_image.set_band_selection(slice(0, 3)) # will only load the three first bands
emit_image.wavelengths # will only return the wavelengths of the three first bands
emit_image.load() # will only load the three first bands
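
The accepted selection types (None, an int, a tuple of ints, or a slice) can be illustrated on a plain list. This sketch mimics NumPy-style indexing with list operations; `select` is a hypothetical helper and the wavelength values are made up for the example.

```python
def select(values, band_selection=None):
    """Apply an int, a tuple/list of ints, or a slice to a list of per-band values."""
    if band_selection is None:
        band_selection = slice(None)   # None means "all bands"
    if isinstance(band_selection, (slice, int)):
        return values[band_selection]
    return [values[i] for i in band_selection]

wavelengths = [381.0, 388.4, 395.8, 403.2, 410.7]  # illustrative values only
first_three = select(wavelengths, slice(0, 3))
picked = select(wavelengths, (4, 0))
everything = select(wavelengths)
```

Because the selection is absolute with respect to the full band axis, applying a second selection replaces the first one instead of indexing into its result.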

Source code in georeader/readers/emit.py
def set_band_selection(self, band_selection:Optional[Union[int, Tuple[int, ...],slice]]=None):
    """
    Set the band selection. Band selection is absolute w.r.t self.nc_ds['radiance']

    Args:
        band_selection (Optional[Union[int, Tuple[int, ...],slice]], optional): slicing or selection of the bands. Defaults to None.

    Example:
        >>> emit_image.set_band_selection(slice(0, 3)) # will only load the three first bands
        >>> emit_image.wavelengths # will only return the wavelengths of the three first bands
        >>> emit_image.load() # will only load the three first bands
    """
    if band_selection is None:
        band_selection = slice(None)
    self.band_selection = band_selection
    self.wavelengths = self._sensor_band_params[self.bandname_dimension].values[self.band_selection]
    self.fwhm = self._sensor_band_params['fwhm'].values[self.band_selection]

sza()

Return the solar zenith angle as a GeoTensor

Source code in georeader/readers/emit.py
def sza(self) -> GeoTensor:
    """ Return the solar zenith angle as a GeoTensor """
    return self.observation('To-sun zenith (0 to 90 degrees from zenith)')

to_crs(crs='UTM', resolution_dst_crs=60)

Reproject the image to a new crs

Parameters:

Name Type Description Default
crs Any

CRS.

'UTM'

Returns:

Name Type Description
EMITImage __class__

EMIT image in the new CRS

Example

emit_image = EMITImage("path/to/emit_image.nc")
emit_image_utm = emit_image.to_crs(crs="UTM")
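
When `crs="UTM"` is passed, the reader derives the UTM zone from the GLT footprint through `get_utm_epsg`, whose internals are not shown in this section. As a rough sketch of how a UTM EPSG code follows from a location, the standard zone formula can be applied to a single longitude/latitude pair. `utm_epsg_from_lonlat` is a hypothetical helper, and it ignores the special zones around Norway and Svalbard.

```python
import math

def utm_epsg_from_lonlat(lon, lat):
    """Return the EPSG code string of the WGS84 UTM zone containing (lon, lat)."""
    zone = int(math.floor((lon + 180) / 6)) % 60 + 1   # zones are 6 degrees wide
    base = 32600 if lat >= 0 else 32700                # 326xx north, 327xx south
    return f"EPSG:{base + zone}"

madrid = utm_epsg_from_lonlat(-3.7, 40.4)
sydney = utm_epsg_from_lonlat(151.2, -33.9)
```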

Source code in georeader/readers/emit.py
def to_crs(self, crs:Any="UTM", 
           resolution_dst_crs:Optional[Union[float, Tuple[float, float]]]=60) -> '__class__':
    """
    Reproject the image to a new crs

    Args:
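        crs (Any): Destination CRS. Defaults to "UTM", in which case the UTM zone
            is derived from the footprint of the GLT.
        resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
            Resolution in the destination CRS. Defaults to 60.

    Returns:
        EMITImage: EMIT image in the new CRS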

    Example:
        >>> emit_image = EMITImage("path/to/emit_image.nc")
        >>> emit_image_utm = emit_image.to_crs(crs="UTM")
    """
    if crs == "UTM":
        footprint = self.glt.footprint("EPSG:4326")
        crs = get_utm_epsg(footprint)

    glt = read.read_to_crs(self.glt, crs, resampling=rasterio.warp.Resampling.nearest, 
                           resolution_dst_crs=resolution_dst_crs)

    out = EMITImage(
        self.filename,
        glt=glt,
        band_selection=self.band_selection,
        reuse_handles_from=self,
    )

    # Propagate eagerly-set and lazily-loaded attributes from the parent so
    # the new instance shares the parent's NetCDF handles, sensor params,
    # observation bands, mean angles, etc. without re-opening anything.
    for attrname in self.attributes_set_if_exists:
        if hasattr(self, attrname):
            setattr(out, attrname, getattr(self, attrname))

    # _pol is not in attributes_set_if_exists because it is CRS-dependent:
    # it must be reprojected to the new CRS.
    if hasattr(self, '_pol'):
        setattr(out, '_pol', window_utils.polygon_to_crs(self._pol, self.crs, crs))

    return out
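
When `crs="UTM"`, `to_crs` asks `get_utm_epsg` for the UTM zone of the GLT footprint. The internals of that helper are not shown on this page. As a rough illustration only, the standard zone arithmetic for a single lon/lat point can be sketched as below. `utm_epsg_from_lonlat` is a hypothetical name, not part of georeader, and the sketch ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg_from_lonlat(lon: float, lat: float) -> str:
    """Hypothetical helper: EPSG code of the WGS84 UTM zone containing (lon, lat)."""
    if not -180.0 <= lon <= 180.0 or not -90.0 <= lat <= 90.0:
        raise ValueError("lon/lat out of range")
    # UTM zones are 6 degrees wide and numbered 1..60 starting at 180W.
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    # EPSG:326xx covers the northern hemisphere, EPSG:327xx the southern.
    base = 32600 if lat >= 0 else 32700
    return f"EPSG:{base + zone}"


madrid = utm_epsg_from_lonlat(-3.7, 40.4)
sydney = utm_epsg_from_lonlat(151.2, -33.9)
```

A real footprint is a polygon, so a production helper would typically apply this to a representative point such as the centroid; that choice is an assumption here.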

validmask(with_buffer=True)

Return the georeferenced mask of valid pixels

Returns:

Name Type Description
GeoTensor GeoTensor

bool mask. True means that the pixel is valid.

Source code in georeader/readers/emit.py
def validmask(self, with_buffer:bool=True) -> GeoTensor:
    """
    Return the georeferenced mask of valid pixels.

    Args:
        with_buffer (bool): passed to `invalid_mask_raw`. Defaults to True.

    Returns:
        GeoTensor: bool mask. True means that the pixel is valid.
    """

    validmask = ~self.invalid_mask_raw(with_buffer=with_buffer)

    return self.georreference(validmask,
                              fill_value_default=False)
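
`validmask` computes the mask in raw sensor geometry and then georeferences it. The body of `georreference` is not shown on this page, so the following is only a minimal sketch of what a geometry-lookup-table (GLT) lookup generally does, assuming 0-based indices and a separate validity mask. `apply_glt` and the tiny arrays are hypothetical.

```python
import numpy as np


def apply_glt(raw: np.ndarray, glt_x: np.ndarray, glt_y: np.ndarray,
              valid_glt: np.ndarray, fill_value) -> np.ndarray:
    """Hypothetical sketch: place raw (rows, cols) samples on the GLT grid.

    glt_x / glt_y hold, for each output pixel, the 0-based column / row of
    the raw sample to copy; valid_glt marks output pixels that have one.
    """
    out = np.full(glt_x.shape, fill_value, dtype=raw.dtype)
    out[valid_glt] = raw[glt_y[valid_glt], glt_x[valid_glt]]
    return out


# Made-up 2x2 raw mask projected onto a 2x3 output grid.
raw_mask = np.array([[True, False],
                     [False, True]])
glt_x = np.array([[0, 1, 0],
                  [1, 0, 0]])
glt_y = np.array([[0, 0, 0],
                  [1, 1, 0]])
valid_glt = np.array([[True, True, False],
                      [True, True, False]])
ortho = apply_glt(raw_mask, glt_x, glt_y, valid_glt, fill_value=False)
```

Output pixels without a GLT entry keep the fill value, which is why the method passes `fill_value_default=False` for a boolean mask.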

vza()

Return the view zenith angle as a GeoTensor

Source code in georeader/readers/emit.py
def vza(self) -> GeoTensor:
    """ Return the view zenith angle as a GeoTensor """
    return self.observation('To-sensor zenith (0 to 90 degrees from zenith)')

water_mask()

Returns the water mask

Source code in georeader/readers/emit.py
def water_mask(self) -> GeoTensor:
    """ Returns the water mask """
    return self.mask("Water flag")

download_product(link_down, filename=None, display_progress_bar=True, auth=None)

Download a product from the EMIT website (https://search.earthdata.nasa.gov/search). It requires that you have an account in the NASA Earthdata portal.

This code is based on this example: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name Type Description Default
link_down str

link to the product

required
filename Optional[str]

filename to save the product

None
display_progress_bar bool

display tqdm progress bar

True
auth Optional[Tuple[str, str]]

tuple with user and password to download the product. If None, it will try to read the user and password from ~/.georeader/auth_emit.json

None
Example

link_down = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220828T051941_2224004_006/EMIT_L1B_RAD_001_20220828T051941_2224004_006.nc'
filename = download_product(link_down)

Source code in georeader/readers/emit.py
def download_product(link_down:str, filename:Optional[str]=None,
                     display_progress_bar:bool=True,
                     auth:Optional[Tuple[str, str]] = None) -> str:
    """
    Download a product from the EMIT website (https://search.earthdata.nasa.gov/search). 
    It requires that you have an account in the NASA Earthdata portal. 

    This code is based on this example: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        link_down: link to the product
        filename: filename to save the product
        display_progress_bar: display tqdm progress bar
        auth: tuple with user and password to download the product. If None, it will try to read the user and password from ~/.georeader/auth_emit.json 

    Example:
        >>> link_down = 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220828T051941_2224004_006/EMIT_L1B_RAD_001_20220828T051941_2224004_006.nc'
        >>> filename = download_product(link_down)
    """
    headers = None
    if auth is None:
        if AUTH_METHOD == "auth":
            auth = get_auth()
        elif AUTH_METHOD == "token":
            assert TOKEN is not None, "You need to set the TOKEN variable to download EMIT images"
            headers = get_headers()

    return download_product_base(link_down, filename=filename, auth=auth,
                                 headers=headers,
                                 display_progress_bar=display_progress_bar, 
                                 verify=False)

get_radiance_link(product_path)

Get the link to download the L1B radiance (RAD) product from the EMIT website. See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name Type Description Default
product_path str

path to the product or filename of the product or product name with or without extension. e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

required
Example

product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
link = get_radiance_link(product_path)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

Source code in georeader/readers/emit.py
def get_radiance_link(product_path:str) -> str:
    """
    Get the link to download a product from the EMIT website.
    See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        product_path: path to the product or filename of the product or product name with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Example:
        >>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_radiance_link(product_path)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
    """
    # Example input: "EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc"
    namefile = os.path.splitext(os.path.basename(product_path))[0]
    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L1B"
    content_id[2] = "RAD"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)
    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/{product_id}/{product_id}.nc"
    return link

get_obs_link(product_path)

Get the link to download the L1B observation (OBS) file of a product from the EMIT website. See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name Type Description Default
product_path str

path to the product or filename of the product with or without extension. e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

required
Example

product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
link = get_obs_link(product_path)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_OBS_001_20220827T060753_2223904_013.nc'

Source code in georeader/readers/emit.py
def get_obs_link(product_path:str) -> str:
    """
    Get the link to download the L1B observation (OBS) file of a product from the EMIT website.
    See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        product_path: path to the product or filename of the product with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Example:
        >>> product_path = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_obs_link(product_path)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/EMIT_L1B_RAD_001_20220827T060753_2223904_013/EMIT_L1B_OBS_001_20220827T060753_2223904_013.nc'
    """
    namefile = os.path.splitext(os.path.basename(product_path))[0]

    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L1B"
    content_id[2] = "RAD"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)

    content_id[2] = "OBS"
    namefile = "_".join(content_id)

    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL1BRAD.001/{product_id}/{namefile}.nc"
    return link

get_ch4enhancement_link(tile)

Get the link to download the L2B methane enhancement (CH4ENH) product from the EMIT website. See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

Parameters:

Name Type Description Default
tile str

path to the product or filename of the product with or without extension. e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

required
Example

tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
link = get_ch4enhancement_link(tile)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BCH4ENH.001/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013.tif'

Source code in georeader/readers/emit.py
def get_ch4enhancement_link(tile:str) -> str:
    """
    Get the link to download the L2B methane enhancement (CH4ENH) product from the EMIT website.
    See: https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse

    Args:
        tile (str): path to the product or filename of the product with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Example:
        >>> tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_ch4enhancement_link(tile)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BCH4ENH.001/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013/EMIT_L2B_CH4ENH_001_20220827T060753_2223904_013.tif'
    """
    namefile = os.path.splitext(os.path.basename(tile))[0]

    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L2B"
    content_id[2] = "CH4ENH"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)
    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2BCH4ENH.001/{product_id}/{product_id}.tif"
    return link

get_l2amask_link(tile)

Get the link to download the L2A mask product from the EMIT website (https://search.earthdata.nasa.gov/search)

Parameters:

Name Type Description Default
tile str

path to the product or filename of the L1B product with or without extension. e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

required

Returns:

Name Type Description
str str

link to the L2A mask product

Example

tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
link = get_l2amask_link(tile)
'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220827T060753_2223904_013/EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc'

Source code in georeader/readers/emit.py
def get_l2amask_link(tile: str) -> str:
    """
    Get the link to download a product from the EMIT website (https://search.earthdata.nasa.gov/search)

    Args:
        tile (str): path to the product or filename of the L1B product with or without extension.
            e.g. 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'

    Returns:
        str: link to the L2A mask product

    Example:
        >>> tile = 'EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc'
        >>> link = get_l2amask_link(tile)
        'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/EMIT_L2A_RFL_001_20220827T060753_2223904_013/EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc'
    """
    namefile = os.path.splitext(os.path.basename(tile))[0]
    namefile = namefile + ".nc"

    product_id = os.path.splitext(namefile)[0]
    content_id = product_id.split("_")
    content_id[1] = "L2A"
    content_id[2] = "RFL"
    content_id[3] = content_id[3].replace("V", "")
    product_id = "_".join(content_id)

    content_id[2] = "MASK"
    namefilenew = "_".join(content_id) + ".nc"
    link = f"https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/EMITL2ARFL.001/{product_id}/{namefilenew}"
    return link
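
The four link helpers above share one pattern: take the granule name, split it on underscores, overwrite the level and product fields, and rebuild the URL from the result. A condensed sketch of that field rewrite is shown below. `swap_fields` is a hypothetical name used only for illustration and is not part of georeader.

```python
import os


def swap_fields(product_path: str, level: str, product: str) -> str:
    """Hypothetical helper mirroring the field rewrite used by the link functions."""
    # Drop directories and the extension:
    # 'dir/EMIT_L1B_RAD_..._013.nc' -> 'EMIT_L1B_RAD_..._013'
    stem = os.path.splitext(os.path.basename(product_path))[0]
    fields = stem.split("_")
    fields[1] = level                        # processing level, e.g. 'L2A'
    fields[2] = product                      # product short name, e.g. 'MASK'
    fields[3] = fields[3].replace("V", "")   # version field, 'V001' -> '001'
    return "_".join(fields)


tile = "EMIT_L1B_RAD_001_20220827T060753_2223904_013.nc"
folder = swap_fields(tile, "L2A", "RFL")      # granule folder name
mask_name = swap_fields(tile, "L2A", "MASK")  # file stored inside that folder
```

This also shows why `get_obs_link` and `get_l2amask_link` build two names: the folder follows the main product of the collection (RAD or RFL), while the file name carries the companion product (OBS or MASK).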

valid_mask(filename, with_buffer=False, dst_crs='UTM', resolution_dst_crs=60)

Loads the valid mask from the EMIT L2AMASK file.

Parameters:

Name Type Description Default
filename str

path to the L2AMASK file. e.g. EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc

required
with_buffer bool

If True, the buffer band is used to compute the valid mask. Defaults to False.

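False
dst_crs Optional[Any]

CRS to reproject the mask to. If "UTM", the UTM zone is derived from the footprint of the GLT. If None, the GLT grid is not reprojected.

'UTM'
resolution_dst_crs Optional[Union[float, Tuple[float, float]]]

Resolution of the output in the destination CRS.

60

Returns:

Name Type Description
GeoTensor Tuple[GeoTensor, float]

Valid mask (True means that the pixel is valid) and percentage of valid pixels in the raw mask.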

Source code in georeader/readers/emit.py
def valid_mask(filename:str, with_buffer:bool=False, 
               dst_crs:Optional[Any]="UTM", 
               resolution_dst_crs:Optional[Union[float, Tuple[float, float]]]=60) -> Tuple[GeoTensor, float]:
    """
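    Loads the valid mask from the EMIT L2AMASK file.

    Args:
        filename (str): path to the L2AMASK file. e.g. EMIT_L2A_MASK_001_20220827T060753_2223904_013.nc
        with_buffer (bool, optional): If True, the buffer band is used to compute the valid mask. Defaults to False.
        dst_crs (Any, optional): CRS to reproject the mask to. If "UTM", the UTM zone is derived from
            the footprint of the GLT. If None, the GLT grid is not reprojected. Defaults to "UTM".
        resolution_dst_crs (float or Tuple[float, float], optional): Resolution of the output in the
            destination CRS. Defaults to 60.

    Returns:
        Tuple[GeoTensor, float]: valid mask (True means that the pixel is valid) and percentage
            of valid pixels in the raw mask.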
    """

    if not HAS_XARRAY:
        raise ImportError("xarray is required to read EMIT images. Please install it with: pip install xarray")

    nc_ds = safe_open_netcdf(filename, cache=False, load=False)

    geotransform = nc_ds.attrs['geotransform']
    real_transform = rasterio.Affine(geotransform[1], geotransform[2], geotransform[0],
                                     geotransform[4], geotransform[5], geotransform[3])

    # Open location group to access glt data
    location_ds = safe_open_netcdf(filename, cache=False, load=False, group='location')
    glt_x = location_ds['glt_x'].values
    glt_y = location_ds['glt_y'].values
    location_ds.close()

    glt_arr = np.zeros((2,) + glt_x.shape, dtype=np.int32)
    glt_arr[0] = glt_x
    glt_arr[1] = glt_y
    # glt_arr -= 1 # account for 1-based indexing

    # https://rasterio.readthedocs.io/en/stable/api/rasterio.crs.html
    glt = GeoTensor(glt_arr, transform=real_transform, 
                    crs=rasterio.crs.CRS.from_wkt(nc_ds.attrs['spatial_ref']),
                    fill_value_default=0)

    if dst_crs is not None:
        if dst_crs == "UTM":
            footprint = glt.footprint("EPSG:4326")
            dst_crs = get_utm_epsg(footprint)

        glt = read.read_to_crs(glt, dst_crs=dst_crs, 
                               resampling=rasterio.warp.Resampling.nearest, 
                               resolution_dst_crs=resolution_dst_crs)

    valid_glt = np.all(glt.values != glt.fill_value_default, axis=0)
    xmin = np.min(glt.values[0, valid_glt])
    ymin = np.min(glt.values[1, valid_glt])

    glt_relative = glt.copy()
    glt_relative.values[0, valid_glt] -= xmin
    glt_relative.values[1, valid_glt] -= ymin
    # mask_bands = nc_ds["sensor_band_parameters"]["mask_bands"][:]

    band_index =  [0,1,3]
    if with_buffer:
        band_index.append(4)

    mask_arr = nc_ds['mask'][:, :, band_index]
    invalidmask_raw = np.sum(mask_arr, axis=-1)
    invalidmask_raw = (invalidmask_raw >= 1)

    validmask = ~invalidmask_raw

    percentage_clear = 100 * (np.sum(validmask) / np.prod(validmask.shape))

    return georreference(glt_relative, validmask, valid_glt,
                         fill_value_default=False), percentage_clear
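
The flag combination inside `valid_mask` is easy to try on synthetic data. The sketch below reproduces only that step (band indices 0, 1 and 3, plus band 4 when `with_buffer` is set) on a made-up array; `combine_mask_bands` is a hypothetical name, and the meaning of each band is not stated on this page, so none is assumed here.

```python
import numpy as np


def combine_mask_bands(mask: np.ndarray, with_buffer: bool = False):
    """Sketch of the flag combination in valid_mask; mask is (rows, cols, bands)."""
    band_index = [0, 1, 3]       # same band indices as valid_mask
    if with_buffer:
        band_index.append(4)     # additionally use the buffer band
    # A pixel is invalid as soon as any selected flag is raised.
    invalid = mask[:, :, band_index].sum(axis=-1) >= 1
    valid = ~invalid
    percentage_valid = 100.0 * valid.sum() / valid.size
    return valid, percentage_valid


# Synthetic 2x2 scene with 5 mask bands (values are made up for illustration).
mask = np.zeros((2, 2, 5), dtype=np.float32)
mask[0, 0, 0] = 1   # flagged in band 0
mask[1, 1, 4] = 1   # flagged only in the buffer band
valid, pct = combine_mask_bands(mask)
valid_buf, pct_buf = combine_mask_bands(mask, with_buffer=True)
```

Note that the percentage is computed on the raw, ungeoreferenced mask, so it is not affected by fill pixels introduced by the GLT.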

EnMAP Reader

The EnMAP (Environmental Mapping and Analysis Program) reader processes data from the German hyperspectral satellite mission. This reader works with Level 1B radiometrically calibrated data (not atmospherically corrected) that contains radiance values in physical units.

Key features:

  • Reading L1B hyperspectral radiance data from GeoTIFF format with accompanying XML metadata
  • Working with separate VNIR (420-1000 nm) and SWIR (900-2450 nm) spectral ranges
  • Support for 228 spectral channels with 6.5 nm (VNIR) and 10 nm (SWIR) sampling
  • Integration with Rational Polynomial Coefficients (RPCs) for accurate geometric correction
  • Conversion from radiance (mW/m²/sr/nm) to top-of-atmosphere reflectance
  • Access to solar illumination and viewing geometry for radiometric calculations
  • Support for quality masks

Tutorial example:

API Reference

Module to read EnMAP (Environmental Mapping and Analysis Program) hyperspectral images.

EnMAP is a German hyperspectral satellite mission operated by DLR (German Aerospace Center), launched in 2022. It provides high-spectral-resolution data in 224 bands from 420 to 2450 nm with a 30m spatial resolution and 30km swath width.

Data Format Overview

EnMAP data is distributed as separate GeoTIFF files with an XML metadata file:

EnMAP Product Structure:
┌────────────────────────────────────────────────────────────────────┐
│  ENMAP01-____L1B-DT0000000000_20220501T101523Z_001_V010110_...     │
│  ├── *-METADATA.XML           ← Main metadata file (input)         │
│  ├── *-SPECTRAL_IMAGE_VNIR.TIF   420-1000 nm, ~88 bands            │
│  ├── *-SPECTRAL_IMAGE_SWIR.TIF   900-2450 nm, ~136 bands           │
│  ├── *-QL_QUALITY_CLOUD.TIF      Cloud mask                        │
│  ├── *-QL_QUALITY_CIRRUS.TIF     Cirrus mask                       │
│  ├── *-QL_QUALITY_SNOW.TIF       Snow mask                         │
│  ├── *-QL_QUALITY_HAZE.TIF       Haze mask                         │
│  └── *-QL_PIXELMASK_*.TIF        Per-sensor pixel masks            │
└────────────────────────────────────────────────────────────────────┘

Unlike EMIT and PRISMA, EnMAP L1B data is already orthorectified (map-projected) with Rational Polynomial Coefficients (RPCs) stored in the metadata for refined geolocation.

Dual-Sensor Architecture

EnMAP uses two pushbroom sensors with overlapping spectral coverage:

VNIR Detector                        SWIR Detector
┌────────────────────┐               ┌────────────────────┐
│ 420 - 1000 nm      │               │ 900 - 2450 nm      │
│ ~88 bands          │               │ ~136 bands         │
│ 6.5 nm sampling    │               │ 10 nm sampling     │
│ Si CCD             │               │ HgCdTe             │
└────────────────────┘               └────────────────────┘
          │                                    │
          └──────────── Overlap ───────────────┘
                     900-1000 nm

The spectral overlap enables cross-calibration between the two detectors.

Radiometric Processing

EnMAP L1B data requires conversion from Digital Numbers (DN) to radiance:

L_λ = DN × GAIN + OFFSET   [W/(m²·sr·nm)]

Note: DLR provides gains between 2000-10000 (multiplicative, not divisive)
The reader applies: L = (GAIN × DN + OFFSET) × 1000 to get mW/(m²·sr·nm)
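
As an illustration of the conversion stated above, the per-band formula can be written with NumPy broadcasting. This is a sketch under stated assumptions rather than the reader's own code: `dn_to_radiance` is a hypothetical name, and the gain and offset values are made up.

```python
import numpy as np


def dn_to_radiance(dn: np.ndarray, gain: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Sketch: per-band L = (GAIN * DN + OFFSET) * 1000, result in mW/(m^2 sr nm)."""
    # dn is (bands, H, W); gain and offset are (bands,), reshaped so that
    # each band's coefficients broadcast over its H x W image.
    gain = np.asarray(gain, dtype=np.float32)[:, None, None]
    offset = np.asarray(offset, dtype=np.float32)[:, None, None]
    return (gain * dn.astype(np.float32) + offset) * 1000.0


dn = np.array([[[100, 200]], [[300, 400]]], dtype=np.uint16)  # (2 bands, 1, 2)
gain = np.array([1e-5, 2e-5])     # made-up values
offset = np.array([0.0, 1e-3])    # made-up values
rad = dn_to_radiance(dn, gain, offset)
```

Casting the DN array to float before multiplying avoids integer overflow and truncation in the `uint16` input.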

Rational Polynomial Coefficients (RPCs)

EnMAP includes RPCs for precise geolocation refinement:

Pixel (col, row) ──→ RPC Transform ──→ Geographic (lon, lat)

RPCs model:
- Satellite orbit and attitude
- Sensor geometry  
- Terrain elevation effects (when height_off is set appropriately)

The reader can apply RPCs during loading for refined geolocation.

Product Levels

  • L1B: At-sensor radiance, sensor geometry
  • L2A: Surface reflectance, atmospheric correction applied

This reader is designed for L1B products.

Examples

Basic usage::

from georeader.readers.enmap import EnMAP

# Load from metadata XML file
enmap = EnMAP('/path/to/*-METADATA.XML')

# Load specific wavelengths as reflectance
bands = enmap.load_wavelengths([665, 865, 1600], as_reflectance=True)

# Load RGB with RPC-refined geolocation
rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=True)

# Load quality masks
cloud_mask = enmap.load_product('QL_QUALITY_CLOUD')

See Also

georeader.readers.emit : EMIT hyperspectral reader
georeader.readers.prisma : PRISMA hyperspectral reader
georeader.rasterio_reader : Base reader for GeoTIFF files

References

  • DLR EnMAP Mission: https://www.enmap.org/
  • EnMAP Product Specification: https://www.enmap.org/data_access/
  • GFZ enpt Package: https://github.com/GFZ/enpt (metadata parsing reference)

EnMAP

Reader for EnMAP (Environmental Mapping and Analysis Program) hyperspectral images.

This class provides comprehensive functionality to read and manipulate EnMAP satellite imagery products from DLR. It handles the multi-file product structure (separate VNIR/SWIR GeoTIFFs with XML metadata), supporting operations like:

  • Loading radiance or reflectance data at specific wavelengths
  • Automatic handling of VNIR/SWIR sensor selection based on wavelength
  • Converting DN to radiance using gain/offset from metadata
  • Converting radiance to reflectance using solar irradiance
  • Applying Rational Polynomial Coefficients (RPCs) for refined geolocation
  • Loading quality masks (cloud, cirrus, snow, haze)
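
For the radiance-to-reflectance step listed above, the textbook top-of-atmosphere relation is ρ = π · L · d² / (E_sun · cos(SZA)). Whether the reader uses exactly this form, and which solar irradiance spectrum it uses, is not shown in this section, so the sketch below is only an illustration with a hypothetical function name and made-up numbers.

```python
import numpy as np


def radiance_to_reflectance(radiance, solar_irradiance, sza_deg,
                            earth_sun_distance_au=1.0):
    """Standard TOA reflectance: rho = pi * L * d^2 / (E_sun * cos(SZA)).

    radiance: (bands, H, W); solar_irradiance: (bands,), in the same
    spectral units as the radiance but without the per-steradian term.
    """
    e_sun = np.asarray(solar_irradiance, dtype=np.float64)[:, None, None]
    cos_sza = np.cos(np.deg2rad(sza_deg))
    radiance = np.asarray(radiance, dtype=np.float64)
    return np.pi * radiance * earth_sun_distance_au ** 2 / (e_sun * cos_sza)


radiance = np.full((1, 1, 1), 100.0)   # made-up radiance
rho = radiance_to_reflectance(radiance, [1000.0 * np.pi], sza_deg=60.0)
```

The radiance and irradiance must share units (for example both per mW), otherwise the result is off by a constant factor.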

EnMAP Data Model

EnMAP L1B products are orthorectified (map-projected) GeoTIFFs with separate files for VNIR and SWIR bands:

File Structure:
┌────────────────────────────────────────────────────┐
│  METADATA.XML  ──→  wavelengths, FWHM, angles,     │
│                     gain/offset, RPCs              │
│                                                    │
│  SPECTRAL_IMAGE_VNIR.TIF  ──→  (88, H, W) bands    │
│  SPECTRAL_IMAGE_SWIR.TIF  ──→  (136, H, W) bands   │
│                                                    │
│  QL_QUALITY_*.TIF  ──→  quality masks              │
└────────────────────────────────────────────────────┘

Radiometric Conversion

DN to radiance conversion is automatic::

L_λ = (GAIN × DN + OFFSET) × 1000   [mW/(m²·sr·nm)]

Note: DLR gains are multiplicative (not divisive as in some sensors)

Spectral Configuration

EnMAP has two detectors with overlapping coverage:

Wavelength: 420nm ──── 1000nm ──── 2450nm
            ├── VNIR ─────
                      ├──── SWIR ───────────
                      └ overlap┘
                      900-1000nm
  • VNIR: Silicon CCD, ~88 bands, 6.5nm sampling, SNR >500:1
  • SWIR: HgCdTe, ~136 bands, 10nm sampling, SNR >150:1
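
Because the two detectors overlap between 900 and 1000 nm, a requested wavelength has to be mapped to one sensor and one band index. How the reader resolves the overlap is not shown here. One simple possibility, sketched with a hypothetical helper `nearest_band` and synthetic band centers, is to pick the closest band center across both sensors.

```python
import numpy as np


def nearest_band(wavelength_nm, wl_center):
    """Hypothetical helper: return (sensor, band_index) closest to wavelength_nm.

    wl_center follows the documented layout {'vnir': array, 'swir': array}.
    """
    best = None
    for sensor, centers in wl_center.items():
        centers = np.asarray(centers, dtype=float)
        idx = int(np.argmin(np.abs(centers - wavelength_nm)))
        dist = float(abs(centers[idx] - wavelength_nm))
        # Strict '<' keeps the first sensor when two bands are equally close.
        if best is None or dist < best[0]:
            best = (dist, sensor, idx)
    return best[1], best[2]


# Synthetic band centers (not the real EnMAP grid).
wl_center = {
    "vnir": np.arange(420.0, 1000.0, 6.5),
    "swir": np.arange(900.0, 2450.0, 10.0),
}
red = nearest_band(665, wl_center)
swir1 = nearest_band(1600, wl_center)
```

With real metadata the band centers come from `wl_center`, and a stricter variant could also reject requests that fall outside `vnir_range` and `swir_range`.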

Attributes

xml_file : str
    Path to the EnMAP XML metadata file.
by_folder : bool
    Whether files are organized by folder structure (alternative naming convention).
swir_file : str
    Path to the SWIR GeoTIFF file (derived from xml_file).
fs : fsspec.AbstractFileSystem
    Filesystem for file access (local or cloud storage).
vnir : RasterioReader
    Reader for VNIR spectral image.
swir : RasterioReader
    Reader for SWIR spectral image.
wl_center : Dict[str, np.ndarray]
    Center wavelengths per sensor: {'vnir': [...], 'swir': [...]}.
wl_fwhm : Dict[str, np.ndarray]
    FWHM per sensor: {'vnir': [...], 'swir': [...]}.
gain_arr : Dict[str, np.ndarray]
    Radiometric gains per sensor for DN→radiance conversion.
offs_arr : Dict[str, np.ndarray]
    Radiometric offsets per sensor for DN→radiance conversion.
vnir_range : Tuple[float, float]
    VNIR wavelength range (min, max) including FWHM margins.
swir_range : Tuple[float, float]
    SWIR wavelength range (min, max) including FWHM margins.
hsf : float
    Mean ground elevation (m) from scene metadata.
sza : float
    Solar zenith angle (degrees).
saa : float
    Solar azimuth angle (degrees).
vza : float
    View zenith angle (across-track off-nadir angle, degrees).
vaa : float
    View azimuth angle (scene azimuth, degrees).
rpcs_vnir : rasterio.rpc.RPC
    Rational Polynomial Coefficients for VNIR refined geolocation.
rpcs_swir : rasterio.rpc.RPC
    Rational Polynomial Coefficients for SWIR refined geolocation.
time_coverage_start : datetime
    UTC datetime of acquisition start.
time_coverage_end : datetime
    UTC datetime of acquisition end.
units : str
    Radiance units: 'mW/m2/sr/nm'.

Properties (from underlying readers)

shape : Tuple[int, int, int]
    Full shape (total_bands, height, width).
transform : rasterio.Affine
    Affine geotransform from SWIR file.
crs : rasterio.crs.CRS
    Coordinate reference system from SWIR file.
bounds : Tuple[float, float, float, float]
    Geographic bounds (xmin, ymin, xmax, ymax).
res : Tuple[float, float]
    Pixel resolution (x, y).

Examples

Basic loading::

>>> from georeader.readers.enmap import EnMAP
>>> 
>>> enmap = EnMAP('/data/ENMAP01-...-METADATA.XML')
>>> print(enmap)  # View metadata summary

Loading specific wavelengths::

>>> # Load NDVI bands as reflectance
>>> bands = enmap.load_wavelengths([665, 865], as_reflectance=True)
>>> print(bands.shape)  # (2, H, W)
>>> 
>>> # Compute NDVI
>>> red, nir = bands.values[0], bands.values[1]
>>> ndvi = (nir - red) / (nir + red + 1e-10)

Loading RGB with RPC refinement::

>>> # Apply RPCs for better geolocation (recommended)
>>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=True)
>>> 
>>> # Without RPCs (uses original map projection)
>>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=False)

Loading quality masks::

>>> # Load cloud mask
>>> cloud = enmap.load_product('QL_QUALITY_CLOUD')
>>> 
>>> # Available products: 
>>> # 'QL_QUALITY_CLOUD', 'QL_QUALITY_CIRRUS', 'QL_QUALITY_SNOW',
>>> # 'QL_QUALITY_HAZE', 'QL_PIXELMASK_VNIR', 'QL_PIXELMASK_SWIR'

Spatial subsetting with window_focus::

>>> from rasterio.windows import Window
>>> 
>>> # Focus on a specific region
>>> window = Window(col_off=100, row_off=200, width=500, height=500)
>>> enmap_subset = EnMAP('/path/to/METADATA.XML', window_focus=window)

Cloud storage access::

>>> import gcsfs
>>> 
>>> fs = gcsfs.GCSFileSystem()
>>> enmap = EnMAP('gs://bucket/ENMAP-METADATA.XML', fs=fs)

See Also

georeader.readers.emit.EMITImage : EMIT hyperspectral reader
georeader.readers.prisma.PRISMA : PRISMA hyperspectral reader
georeader.rasterio_reader.RasterioReader : Base reader for GeoTIFF
georeader.read.read_rpcs : Apply RPC transformations

References

  • DLR EnMAP Mission: https://www.enmap.org/
  • GFZ enpt Package: https://github.com/GFZ/enpt (metadata parser reference)
  • EnMAP Product Specification Document
Source code in georeader/readers/enmap.py
class EnMAP:
    """
    Reader for EnMAP (Environmental Mapping and Analysis Program) hyperspectral images.

    This class provides comprehensive functionality to read and manipulate EnMAP satellite
    imagery products from DLR. It handles the multi-file product structure (separate VNIR/SWIR
    GeoTIFFs with XML metadata), supporting operations like:

    - Loading radiance or reflectance data at specific wavelengths
    - Automatic handling of VNIR/SWIR sensor selection based on wavelength
    - Converting DN to radiance using gain/offset from metadata
    - Converting radiance to reflectance using solar irradiance
    - Applying Rational Polynomial Coefficients (RPCs) for refined geolocation
    - Loading quality masks (cloud, cirrus, snow, haze)

    EnMAP Data Model
    ----------------
    EnMAP L1B products are orthorectified (map-projected) GeoTIFFs with separate files
    for VNIR and SWIR bands:

        File Structure:
        ┌────────────────────────────────────────────────────┐
        │  METADATA.XML  ──→  wavelengths, FWHM, angles,     │
        │                     gain/offset, RPCs              │
        │                                                    │
        │  SPECTRAL_IMAGE_VNIR.TIF  ──→  (88, H, W) bands    │
        │  SPECTRAL_IMAGE_SWIR.TIF  ──→  (136, H, W) bands   │
        │                                                    │
        │  QL_QUALITY_*.TIF  ──→  quality masks              │
        └────────────────────────────────────────────────────┘

    Radiometric Conversion
    ----------------------
    DN to radiance conversion is automatic::

        L_λ = (GAIN × DN + OFFSET) × 1000   [mW/(m²·sr·nm)]

        Note: DLR gains are multiplicative (not divisive as in some sensors)

    Spectral Configuration
    ----------------------
    EnMAP has two detectors with overlapping coverage:

        Wavelength: 420nm ──── 1000nm ──── 2450nm
                    ├── VNIR ─────
                              ├──── SWIR ───────────
                              └ overlap┘
                              900-1000nm

    - VNIR: Silicon CCD, ~88 bands, 6.5nm sampling, SNR >500:1
    - SWIR: HgCdTe, ~136 bands, 10nm sampling, SNR >150:1

    Attributes
    ----------
    xml_file : str
        Path to the EnMAP XML metadata file.
    by_folder : bool
        Whether files are organized by folder structure (alternative naming convention).
    swir_file : str
        Path to the SWIR GeoTIFF file (derived from xml_file).
    fs : fsspec.AbstractFileSystem
        Filesystem for file access (local or cloud storage).
    vnir : RasterioReader
        Reader for VNIR spectral image.
    swir : RasterioReader
        Reader for SWIR spectral image.
    wl_center : Dict[str, np.ndarray]
        Center wavelengths per sensor: {'vnir': [...], 'swir': [...]}.
    wl_fwhm : Dict[str, np.ndarray]
        FWHM per sensor: {'vnir': [...], 'swir': [...]}.
    gain_arr : Dict[str, np.ndarray]
        Radiometric gains per sensor for DN→radiance conversion.
    offs_arr : Dict[str, np.ndarray]
        Radiometric offsets per sensor for DN→radiance conversion.
    vnir_range : Tuple[float, float]
        VNIR wavelength range (min, max) including FWHM margins.
    swir_range : Tuple[float, float]
        SWIR wavelength range (min, max) including FWHM margins.
    hsf : float
        Mean ground elevation (m) from scene metadata.
    sza : float
        Solar zenith angle (degrees).
    saa : float
        Solar azimuth angle (degrees).
    vza : float
        View zenith angle (across-track off-nadir angle, degrees).
    vaa : float
        View azimuth angle (scene azimuth, degrees).
    rpcs_vnir : rasterio.rpc.RPC
        Rational Polynomial Coefficients for VNIR refined geolocation.
    rpcs_swir : rasterio.rpc.RPC
        Rational Polynomial Coefficients for SWIR refined geolocation.
    time_coverage_start : datetime
        UTC datetime of acquisition start.
    time_coverage_end : datetime
        UTC datetime of acquisition end.
    units : str
        Radiance units: 'mW/m2/sr/nm'.

    Properties (from underlying readers)
    ------------------------------------
    shape : Tuple[int, int, int]
        Full shape (total_bands, height, width).
    transform : rasterio.Affine
        Affine geotransform from SWIR file.
    crs : rasterio.crs.CRS
        Coordinate reference system from SWIR file.
    bounds : Tuple[float, float, float, float]
        Geographic bounds (xmin, ymin, xmax, ymax).
    res : Tuple[float, float]
        Pixel resolution (x, y).

    Examples
    --------
    Basic loading::

        >>> from georeader.readers.enmap import EnMAP
        >>> 
        >>> enmap = EnMAP('/data/ENMAP01-...-METADATA.XML')
        >>> print(enmap)  # View metadata summary

    Loading specific wavelengths::

        >>> # Load NDVI bands as reflectance
        >>> bands = enmap.load_wavelengths([665, 865], as_reflectance=True)
        >>> print(bands.shape)  # (2, H, W)
        >>> 
        >>> # Compute NDVI
        >>> red, nir = bands.values[0], bands.values[1]
        >>> ndvi = (nir - red) / (nir + red + 1e-10)

    Loading RGB with RPC refinement::

        >>> # Apply RPCs for better geolocation (recommended)
        >>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=True)
        >>> 
        >>> # Without RPCs (uses original map projection)
        >>> rgb = enmap.load_rgb(as_reflectance=True, apply_rpcs=False)

    Loading quality masks::

        >>> # Load cloud mask
        >>> cloud = enmap.load_product('QL_QUALITY_CLOUD')
        >>> 
        >>> # Available products: 
        >>> # 'QL_QUALITY_CLOUD', 'QL_QUALITY_CIRRUS', 'QL_QUALITY_SNOW',
        >>> # 'QL_QUALITY_HAZE', 'QL_PIXELMASK_VNIR', 'QL_PIXELMASK_SWIR'

    Spatial subsetting with window_focus::

        >>> from rasterio.windows import Window
        >>> 
        >>> # Focus on a specific region
        >>> window = Window(col_off=100, row_off=200, width=500, height=500)
        >>> enmap_subset = EnMAP('/path/to/METADATA.XML', window_focus=window)

    Cloud storage access::

        >>> import gcsfs
        >>> 
        >>> fs = gcsfs.GCSFileSystem()
        >>> enmap = EnMAP('gs://bucket/ENMAP-METADATA.XML', fs=fs)

    See Also
    --------
    georeader.readers.emit.EMITImage : EMIT hyperspectral reader
    georeader.readers.prisma.PRISMA : PRISMA hyperspectral reader
    georeader.rasterio_reader.RasterioReader : Base reader for GeoTIFF
    georeader.read.read_rpcs : Apply RPC transformations

    References
    ----------
    - DLR EnMAP Mission: https://www.enmap.org/
    - GFZ enpt Package: https://github.com/GFZ/enpt (metadata parser reference)
    - EnMAP Product Specification Document
    """

    def __init__(
        self,
        xml_file: str,
        by_folder: bool = False,
        window_focus: Optional[Window] = None,
        fs: Optional[fsspec.AbstractFileSystem] = None,
    ) -> None:
        self.xml_file = xml_file
        self.by_folder = by_folder
        if not self.xml_file.endswith(".xml") and not self.xml_file.endswith(".XML"):
            raise ValueError(
                f"Invalid metadata file path {self.xml_file}: must be an XML file"
            )

        if self.by_folder:
            assert (
                PRODUCT_FOLDERS["METADATA"] in self.xml_file
            ), f"Invalid metadata file path {self.xml_file}: must contain {PRODUCT_FOLDERS['METADATA']} if by_folder is True"
            self.swir_file = (
                self.xml_file.replace(
                    PRODUCT_FOLDERS["METADATA"], PRODUCT_FOLDERS["SPECTRAL_IMAGE_SWIR"]
                )
                .replace(".XML", ".TIF")
                .replace(".xml", ".tif")
            )
        else:
            assert (
                "METADATA" in self.xml_file
            ), f"Invalid metadata file path {self.xml_file}: must contain METADATA if by_folder is False"
            self.swir_file = (
                self.xml_file.replace("METADATA", "SPECTRAL_IMAGE_SWIR")
                .replace(".XML", ".TIF")
                .replace(".xml", ".tif")
            )

        if not self.swir_file.endswith(".tif") and not self.swir_file.endswith(".TIF"):
            raise ValueError(
                f"Invalid SWIR file path {self.swir_file} must be a TIF file"
            )

        if self.xml_file.startswith("gs://") or self.xml_file.startswith("az://"):
            assert fs is not None, "Filesystem must be provided if using cloud storage"
            self.fs = fs
            assert fs.exists(self.xml_file), f"File {self.xml_file} does not exist"
            assert fs.exists(self.swir_file), f"File {self.swir_file} does not exist"
        else:
            self.fs = fs or fsspec.filesystem("file")
            assert os.path.exists(self.xml_file), f"File {self.xml_file} does not exist"
            assert os.path.exists(
                self.swir_file
            ), f"File {self.swir_file} does not exist"

        self.swir = RasterioReader(self.swir_file, window_focus=window_focus)

        if self.by_folder:
            self.vnir = RasterioReader(
                self.swir_file.replace(
                    PRODUCT_FOLDERS["SPECTRAL_IMAGE_SWIR"],
                    PRODUCT_FOLDERS["SPECTRAL_IMAGE_VNIR"],
                ),
                window_focus=window_focus,
            )
        else:
            self.vnir = RasterioReader(
                self.swir_file.replace("SPECTRAL_IMAGE_SWIR", "SPECTRAL_IMAGE_VNIR"),
                window_focus=window_focus,
            )

        with self.fs.open(self.xml_file) as fh:
            (
                self.wl_center,
                self.wl_fwhm,
                self.hsf,
                self.sza,
                self.saa,
                self.vaa,
                self.vza,
                self.gain_arr,
                self.offs_arr,
                startTime,
                endTime,
                self.rpcs_vnir,
                self.rpcs_swir,
            ) = read_xml(fh)

        self.swir_range = (
            self.wl_center["swir"][0] - self.wl_fwhm["swir"][0],
            self.wl_center["swir"][-1] + self.wl_fwhm["swir"][-1],
        )
        self.vnir_range = (
            self.wl_center["vnir"][0] - self.wl_fwhm["vnir"][0],
            self.wl_center["vnir"][-1] + self.wl_fwhm["vnir"][-1],
        )

        self.units = "mW/m2/sr/nm"  # == W/m^2/SR/um
        self.time_coverage_start = startTime
        self.time_coverage_end = endTime
        self._observation_date_correction_factor: Optional[float] = None

    @property
    def observation_date_correction_factor(self) -> float:
        if self._observation_date_correction_factor is None:
            self._observation_date_correction_factor = (
                reflectance.observation_date_correction_factor(
                    date_of_acquisition=self.time_coverage_start,
                    center_coords=self.footprint("EPSG:4326").centroid.coords[0],
                )
            )
        return self._observation_date_correction_factor

    @property
    def window_focus(self) -> Optional[Window]:
        return self.swir.window_focus

    @property
    def shape(self) -> tuple:
        return (
            len(self.wl_center["vnir"]) + len(self.wl_center["swir"]),
        ) + self.swir.shape[-2:]

    @property
    def transform(self):
        return self.swir.transform

    @property
    def crs(self):
        return self.swir.crs

    @property
    def res(self):
        return self.swir.res

    @property
    def width(self):
        return self.window_focus.width

    @property
    def height(self):
        return self.window_focus.height

    @property
    def bounds(self):
        return self.swir.bounds

    @property
    def fill_value_default(self):
        return self.swir.fill_value_default

    def footprint(self, crs: Optional[Any] = None) -> Any:
        return self.swir.footprint(crs=crs)

    def load_product(self, product_name: str) -> GeoTensor:
        if product_name not in PRODUCT_FOLDERS:
            raise ValueError(f"Invalid product name: {product_name}")

        if self.by_folder:
            folder = PRODUCT_FOLDERS[product_name]
            product_path = self.swir_file.replace(
                PRODUCT_FOLDERS["SPECTRAL_IMAGE_SWIR"], folder
            )

            raster_product = RasterioReader(
                product_path, window_focus=self.window_focus
            ).load()
        else:
            product_path = self.swir_file.replace("SPECTRAL_IMAGE_SWIR", product_name)
            raster_product = RasterioReader(
                product_path, window_focus=self.window_focus
            ).load()

        # Convert to radiance if SPECTRAL_IMAGE_SWIR or SPECTRAL_IMAGE_VNIR
        if product_name == "SPECTRAL_IMAGE_SWIR":
            name_coef = "swir"
        elif product_name == "SPECTRAL_IMAGE_VNIR":
            name_coef = "vnir"
        else:
            name_coef = None

        # https://github.com/GFZ/enpt/blob/main/enpt/model/images/images_sensorgeo.py#L327
        # Lλ = QCAL * GAIN + OFFSET
        # NOTE: - Gains are applied multiplicatively here (gain * DN + offset), as the class docstring states
        #       - DLR gains / offsets are provided in W/m2/sr/nm, so we have to multiply by 1000 to get
        #         mW/m2/sr/nm as needed later
        if name_coef is not None:
            gain = self.gain_arr[name_coef]
            offset = self.offs_arr[name_coef]
            invalids = raster_product.values == raster_product.fill_value_default
            raster_product.values = (
                gain[:, np.newaxis, np.newaxis] * raster_product.values
                + offset[:, np.newaxis, np.newaxis]
            ) * SC_COEFF
            raster_product.values[invalids] = self.fill_value_default

        return raster_product

    def load_wavelengths(
        self,
        wavelengths: Union[float, List[float], NDArray],
        as_reflectance: bool = True,
    ) -> Union[GeoTensor, NDArray]:
        """
        Load the bands closest to the given wavelengths, as TOA reflectance or as radiance.

        Args:
            wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load
            as_reflectance (bool, optional): return the values as reflectance rather than radiance.
                Defaults to True. If False values will have units of W/m^2/SR/um == mW/m2/sr/nm (`self.units`)

        Returns:
            Union[GeoTensor, NDArray]: GeoTensor with the values in reflectance or radiance units.

        Raises:
            ValueError: If any wavelength is outside the sensor's range.
        """
        if isinstance(wavelengths, Number):
            wavelengths = np.array([wavelengths])
        else:
            wavelengths = np.array(wavelengths)

        # Check all wavelengths are within the range of the sensor
        if any(
            [
                wvl < self.vnir_range[0] or wvl > self.swir_range[1]
                for wvl in wavelengths
            ]
        ):
            raise ValueError(
                f"Invalid wavelength range, must be between {self.vnir_range[0]} and {self.swir_range[1]}"
            )

        wavelengths_loaded = []
        fwhm = []
        ltoa_img = []
        for b in range(len(wavelengths)):
            if (
                wavelengths[b] >= self.swir_range[0]
                and wavelengths[b] < self.swir_range[1]
            ):
                index_band = np.argmin(np.abs(wavelengths[b] - self.wl_center["swir"]))
                fwhm.append(self.wl_fwhm["swir"][index_band])
                wavelengths_loaded.append(self.wl_center["swir"][index_band])
                rst = self.swir.isel({"band": [index_band]}).load().squeeze()
                invalids = (rst.values == rst.fill_value_default) | np.isnan(rst.values)

                # Convert to radiance
                gain = self.gain_arr["swir"][index_band]
                offset = self.offs_arr["swir"][index_band]
                img = (gain * rst.values + offset) * SC_COEFF
                img[invalids] = self.fill_value_default
            else:
                index_band = np.argmin(np.abs(wavelengths[b] - self.wl_center["vnir"]))
                fwhm.append(self.wl_fwhm["vnir"][index_band])
                wavelengths_loaded.append(self.wl_center["vnir"][index_band])
                rst = self.vnir.isel({"band": [index_band]}).load().squeeze()
                invalids = (rst.values == rst.fill_value_default) | np.isnan(rst.values)

                # Convert to radiance
                gain = self.gain_arr["vnir"][index_band]
                offset = self.offs_arr["vnir"][index_band]
                img = (gain * rst.values + offset) * SC_COEFF
                img[invalids] = self.fill_value_default

            ltoa_img.append(img)

        ltoa_img = GeoTensor(
            np.stack(ltoa_img, axis=0),
            transform=self.transform,
            crs=self.crs,
            fill_value_default=self.fill_value_default,
        )

        if as_reflectance:
            thuiller = reflectance.load_thuillier_irradiance()
            response = reflectance.srf(
                wavelengths_loaded, fwhm, thuiller["Nanometer"].values
            )

            solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(
                response
            )  # mW/m^2/nm
            solar_irradiance_norm /= 1_000  # W/m^2/nm

            # Radiance stays in mW/m^2/sr/nm; the unit is passed via `units` instead of being rescaled here
            ltoa_img = reflectance.radiance_to_reflectance(
                ltoa_img,
                solar_irradiance_norm,
                units=self.units,
                observation_date_corr_factor=self.observation_date_correction_factor,
            )

        return ltoa_img

    def load_rgb(
        self,
        as_reflectance: bool = True,
        apply_rpcs: bool = True,
        dst_crs: str = "EPSG:4326",
        resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
    ) -> GeoTensor:
        """
        Load an RGB image from the VNIR bands. If as_reflectance is True, radiance is converted to
        TOA reflectance; otherwise radiance is returned in W/m^2/SR/um == mW/m2/sr/nm (`self.units`).

        Args:
            as_reflectance (bool, optional): Convert radiance to TOA reflectance. Defaults to True.
            apply_rpcs (bool, optional): Apply RPCs to the image. Defaults to True.
            dst_crs (str, optional): Destination CRS. Defaults to "EPSG:4326".
            resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
                Resolution of the destination CRS. Defaults to None.
        Returns:
            GeoTensor: with the RGB image
        """
        rgb = self.load_wavelengths(WAVELENGTHS_RGB, as_reflectance=as_reflectance)
        if apply_rpcs:
            return read.read_rpcs(
                rgb.values,
                rpcs=self.rpcs_vnir,
                dst_crs=dst_crs,
                resolution_dst_crs=resolution_dst_crs,
                fill_value_default=rgb.fill_value_default,
            )
        elif dst_crs is not None:
            return read.read_to_crs(
                rgb, resolution_dst_crs=resolution_dst_crs, dst_crs=dst_crs
            )

        return rgb

    def load(self) -> GeoTensor:
        swir = self.load_product("SPECTRAL_IMAGE_SWIR")
        # vnir = self.load_product('SPECTRAL_IMAGE_VNIR')

        return swir

    def __repr__(self) -> str:
        return f"""
        File: {self.xml_file}
        Bounds: {self.bounds}
        Time: {self.time_coverage_start}
        Spatial shape (height, width): {self.height, self.width}
        VNIR Range: {self.vnir_range} nbands: {len(self.wl_center['vnir'])} 
        SWIR Range: {self.swir_range} nbands: {len(self.wl_center['swir'])}
        """

load_rgb(as_reflectance=True, apply_rpcs=True, dst_crs='EPSG:4326', resolution_dst_crs=None)

Load an RGB image from the VNIR bands. If as_reflectance is True, radiance is converted to TOA reflectance; otherwise radiance is returned in W/m^2/SR/um == mW/m2/sr/nm (self.units).

Parameters:

  • as_reflectance (bool): Convert radiance to TOA reflectance. Default: True.
  • apply_rpcs (bool): Apply RPCs to the image. Default: True.
  • dst_crs (str): Destination CRS. Default: 'EPSG:4326'.
  • resolution_dst_crs (Optional[Union[float, Tuple[float, float]]]): Resolution of the destination CRS. Default: None.

Returns:

  • GeoTensor: the RGB image.

Source code in georeader/readers/enmap.py
def load_rgb(
    self,
    as_reflectance: bool = True,
    apply_rpcs: bool = True,
    dst_crs: str = "EPSG:4326",
    resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
) -> GeoTensor:
    """
    Load an RGB image from the VNIR bands. If as_reflectance is True, radiance is converted to
    TOA reflectance; otherwise radiance is returned in W/m^2/SR/um == mW/m2/sr/nm (`self.units`).

    Args:
        as_reflectance (bool, optional): Convert radiance to TOA reflectance. Defaults to True.
        apply_rpcs (bool, optional): Apply RPCs to the image. Defaults to True.
        dst_crs (str, optional): Destination CRS. Defaults to "EPSG:4326".
        resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
            Resolution of the destination CRS. Defaults to None.
    Returns:
        GeoTensor: with the RGB image
    """
    rgb = self.load_wavelengths(WAVELENGTHS_RGB, as_reflectance=as_reflectance)
    if apply_rpcs:
        return read.read_rpcs(
            rgb.values,
            rpcs=self.rpcs_vnir,
            dst_crs=dst_crs,
            resolution_dst_crs=resolution_dst_crs,
            fill_value_default=rgb.fill_value_default,
        )
    elif dst_crs is not None:
        return read.read_to_crs(
            rgb, resolution_dst_crs=resolution_dst_crs, dst_crs=dst_crs
        )

    return rgb
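
When as_reflectance is True, load_rgb and load_wavelengths return top-of-atmosphere (TOA) reflectance. The textbook conversion is rho = pi * L * d^2 / (E0 * cos(sza)). Below is a minimal NumPy sketch of that formula, assuming radiance and solar irradiance are already in matching units. The function name toa_reflectance and all input values are illustrative; it does not reproduce how reflectance.radiance_to_reflectance handles units or the observation-date correction internally.

```python
import numpy as np


def toa_reflectance(radiance, solar_irradiance, sza_deg, earth_sun_dist_au=1.0):
    """Textbook TOA reflectance: rho = pi * L * d^2 / (E0 * cos(sza)).

    radiance:         (bands, H, W) array, e.g. in W/m2/sr/nm
    solar_irradiance: (bands,) array in the matching unit, e.g. W/m2/nm
    sza_deg:          solar zenith angle in degrees
    """
    # Reshape the per-band irradiance so it broadcasts over (bands, H, W)
    e0 = np.asarray(solar_irradiance, dtype=np.float64)[:, np.newaxis, np.newaxis]
    cos_sza = np.cos(np.deg2rad(sza_deg))
    return np.pi * np.asarray(radiance) * earth_sun_dist_au**2 / (e0 * cos_sza)


# Made-up inputs: one band, 2x2 pixels
radiance = np.full((1, 2, 2), 0.1)
rho = toa_reflectance(radiance, solar_irradiance=[1.5], sza_deg=60.0)
```

With a solar zenith angle of 60 degrees the cosine is 0.5, so every pixel evaluates to pi * 0.1 / 0.75.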

load_wavelengths(wavelengths, as_reflectance=True)

Load the bands closest to the given wavelengths, as TOA reflectance or as radiance.

Parameters:

  • wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load. Required.
  • as_reflectance (bool): Return the values as reflectance rather than radiance. If False, values have units of W/m^2/SR/um == mW/m2/sr/nm (self.units). Default: True.

Returns:

  • Union[GeoTensor, NDArray]: GeoTensor with the values in reflectance or radiance units.

Raises:

  • ValueError: If any wavelength is outside the sensor's range.
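
The band lookup behind load_wavelengths can be sketched without any imagery: each requested wavelength is routed to a detector by range, then matched to the nearest band centre with argmin. The helper pick_band and the toy band tables below are hypothetical. One deliberate difference: the upper SWIR bound is inclusive here, whereas the library source uses a strict less-than.

```python
import numpy as np


def pick_band(wavelength, wl_center, wl_fwhm):
    """Return (sensor, band_index) for the band closest to `wavelength` (nm).

    wl_center / wl_fwhm are dicts with 'vnir' and 'swir' arrays.
    SWIR is checked first, so it wins inside the VNIR/SWIR overlap.
    """
    swir_min = wl_center["swir"][0] - wl_fwhm["swir"][0]
    swir_max = wl_center["swir"][-1] + wl_fwhm["swir"][-1]
    vnir_min = wl_center["vnir"][0] - wl_fwhm["vnir"][0]
    if wavelength < vnir_min or wavelength > swir_max:
        raise ValueError(f"{wavelength} nm is outside [{vnir_min}, {swir_max}]")
    sensor = "swir" if swir_min <= wavelength <= swir_max else "vnir"
    index = int(np.argmin(np.abs(wavelength - wl_center[sensor])))
    return sensor, index


# Toy band tables (hypothetical values, far coarser than the real sensor)
wl_center = {
    "vnir": np.array([420.0, 550.0, 665.0, 865.0, 990.0]),
    "swir": np.array([905.0, 1600.0, 2200.0, 2445.0]),
}
wl_fwhm = {"vnir": np.full(5, 8.0), "swir": np.full(4, 10.0)}

red = pick_band(665, wl_center, wl_fwhm)
overlap = pick_band(950, wl_center, wl_fwhm)
```

In this toy setup 950 nm resolves to the SWIR band at 905 nm even though the VNIR band at 990 nm is closer. That follows from the SWIR-first branch, so a request inside the overlap never returns a VNIR band.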

Source code in georeader/readers/enmap.py
def load_wavelengths(
    self,
    wavelengths: Union[float, List[float], NDArray],
    as_reflectance: bool = True,
) -> Union[GeoTensor, NDArray]:
    """
    Load the bands closest to the given wavelengths, as TOA reflectance or as radiance.

    Args:
        wavelengths (Union[float, List[float], NDArray]): List of wavelengths to load
        as_reflectance (bool, optional): return the values as reflectance rather than radiance.
            Defaults to True. If False values will have units of W/m^2/SR/um == mW/m2/sr/nm (`self.units`)

    Returns:
        Union[GeoTensor, NDArray]: GeoTensor with the values in reflectance or radiance units.

    Raises:
        ValueError: If any wavelength is outside the sensor's range.
    """
    if isinstance(wavelengths, Number):
        wavelengths = np.array([wavelengths])
    else:
        wavelengths = np.array(wavelengths)

    # Check all wavelengths are within the range of the sensor
    if any(
        [
            wvl < self.vnir_range[0] or wvl > self.swir_range[1]
            for wvl in wavelengths
        ]
    ):
        raise ValueError(
            f"Invalid wavelength range, must be between {self.vnir_range[0]} and {self.swir_range[1]}"
        )

    wavelengths_loaded = []
    fwhm = []
    ltoa_img = []
    for b in range(len(wavelengths)):
        if (
            wavelengths[b] >= self.swir_range[0]
            and wavelengths[b] < self.swir_range[1]
        ):
            index_band = np.argmin(np.abs(wavelengths[b] - self.wl_center["swir"]))
            fwhm.append(self.wl_fwhm["swir"][index_band])
            wavelengths_loaded.append(self.wl_center["swir"][index_band])
            rst = self.swir.isel({"band": [index_band]}).load().squeeze()
            invalids = (rst.values == rst.fill_value_default) | np.isnan(rst.values)

            # Convert to radiance
            gain = self.gain_arr["swir"][index_band]
            offset = self.offs_arr["swir"][index_band]
            img = (gain * rst.values + offset) * SC_COEFF
            img[invalids] = self.fill_value_default
        else:
            index_band = np.argmin(np.abs(wavelengths[b] - self.wl_center["vnir"]))
            fwhm.append(self.wl_fwhm["vnir"][index_band])
            wavelengths_loaded.append(self.wl_center["vnir"][index_band])
            rst = self.vnir.isel({"band": [index_band]}).load().squeeze()
            invalids = (rst.values == rst.fill_value_default) | np.isnan(rst.values)

            # Convert to radiance
            gain = self.gain_arr["vnir"][index_band]
            offset = self.offs_arr["vnir"][index_band]
            img = (gain * rst.values + offset) * SC_COEFF
            img[invalids] = self.fill_value_default

        ltoa_img.append(img)

    ltoa_img = GeoTensor(
        np.stack(ltoa_img, axis=0),
        transform=self.transform,
        crs=self.crs,
        fill_value_default=self.fill_value_default,
    )

    if as_reflectance:
        thuiller = reflectance.load_thuillier_irradiance()
        response = reflectance.srf(
            wavelengths_loaded, fwhm, thuiller["Nanometer"].values
        )

        solar_irradiance_norm = thuiller["Radiance(mW/m2/nm)"].values.dot(
            response
        )  # mW/m^2/nm
        solar_irradiance_norm /= 1_000  # W/m^2/nm

        # Radiance stays in mW/m^2/sr/nm; the unit is passed via `units` instead of being rescaled here
        ltoa_img = reflectance.radiance_to_reflectance(
            ltoa_img,
            solar_irradiance_norm,
            units=self.units,
            observation_date_corr_factor=self.observation_date_correction_factor,
        )

    return ltoa_img
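
Both load_product and load_wavelengths apply the same linear DN-to-radiance conversion, (gain * DN + offset) * SC_COEFF, and then restore the fill value on invalid pixels. The sketch below shows the whole-cube variant with per-band gains broadcast over a (bands, H, W) array. The gain, offset, and fill values are made up, and SC_COEFF = 1000 is assumed from the formula in the class docstring.

```python
import numpy as np

SC_COEFF = 1000.0  # assumed scale factor: W/m2/sr/nm -> mW/m2/sr/nm
FILL_VALUE = -32768.0  # hypothetical fill value, for illustration only


def dn_to_radiance(dn, gain, offset, fill_value=FILL_VALUE):
    """Convert a (bands, H, W) DN array to radiance, keeping fill pixels."""
    dn = np.asarray(dn, dtype=np.float64)
    # Record invalid pixels before the values are rescaled
    invalid = dn == fill_value
    # Per-band gain/offset of shape (bands,) broadcast to (bands, 1, 1)
    radiance = (
        gain[:, np.newaxis, np.newaxis] * dn + offset[:, np.newaxis, np.newaxis]
    ) * SC_COEFF
    radiance[invalid] = fill_value
    return radiance


dn = np.array([[[100.0, FILL_VALUE]], [[200.0, 50.0]]])  # shape (2, 1, 2)
gain = np.array([2e-5, 1e-5])
offset = np.array([0.0, 1e-3])
radiance = dn_to_radiance(dn, gain, offset)
```

The invalid mask has to be computed before the conversion, because the linear transform would otherwise turn the fill value into an arbitrary radiance.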

Carbon Mapper Reader

The Carbon Mapper reader provides typed access to the Carbon Mapper STAC catalogue and plume API: atmospheric methane / carbon-dioxide retrievals from the Tanager-1, EMIT, AVIRIS, and GAO instruments. Carbon Mapper publishes:

  • L2B scenes (per-pixel CH4 column-matched-filter, RGB, uncertainty, artifact-mask) addressed by scene_id in the l2b-ch4-mfa-v3a STAC collection.
  • L3A per-plume rasters (alpha-banded delineated plume mask) addressed by plume_id in the l3a collection.
  • Source records: DBSCAN clusters of plumes detected at the same physical site, addressed by deterministic source_name.

Key features:

  • Token-aware HTTP client (obtain_token, refresh_token, download_asset) with file-based persistence (CarbonMapperConfig).
  • Typed query layer (CMTileItem, CMRawPlume, CMSource, exception hierarchy) that never returns raw dicts.
  • Lazy raster wrappers (CMImageRaster, CMPlumeRaster) backed by RasterioReader. CMPlumeRaster.polygon() extracts the authoritative plume polygon from the L3A plume_tif band-4 alpha mask, the upstream source of truth for plume geometry.
  • Cross-resolution helpers: get_tile_for_plume, get_source_for_plume, list_tiles_for_source, list_plumes_for_tile.

Optional install: the reader is gated behind the [carbonmapper] extra to keep the base install minimal:

pip install 'georeader-spaceml[carbonmapper]'

This pulls in pydantic (for CMRawPlume) and requests (for the API client). The Azure SDK is intentionally not included; downstream consumers can layer keyvault-backed token loading on top of CarbonMapperConfig.

API Reference

High-level typed queries over the Carbon Mapper REST + STAC APIs.

This module is the typed, cross-resolution layer that sits between the raw HTTP wrappers in :mod:georeader.readers.carbonmapper.download and consumers (the Phase 2 DailyMonitoringCM ETL, analyst notebooks, future Partner-feed backfills).

Why this exists

:mod:download exposes ~16 low-level endpoint wrappers that return raw JSON / pandas DataFrames. Every consumer otherwise has to:

  1. Pick the right endpoint (/catalog/plume-csv vs /catalog/plumes/annotated vs STAC search; all three have different schemas).
  2. Parse the response into something usable.
  3. Stitch resources together by hand: plume → scene_id via rsplit("-", 1)[0], scene_id → STAC item, plume → source via /catalog/source/plume/name/{plume_id}.
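
The plume → scene_id step in item 3 is plain string handling. A minimal sketch follows, with a hypothetical helper name (scene_id_from_plume_id) and an added guard for ids that carry no suffix; the module itself may handle that case differently.

```python
def scene_id_from_plume_id(plume_id: str) -> str:
    """Drop the trailing '-<suffix>' of a plume id to obtain its scene id."""
    if "-" not in plume_id:
        # Without this guard, rsplit would silently return the input unchanged
        raise ValueError(f"plume_id has no '-' separator: {plume_id!r}")
    # Split once from the right so only the final suffix is removed
    return plume_id.rsplit("-", 1)[0]


scene_id = scene_id_from_plume_id("tan20251212t185057c20s4001-E")
```

Splitting from the right with maxsplit=1 keeps any earlier hyphens in the scene id intact.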

This module lifts those patterns into:

  • One function per logical question (not per HTTP endpoint).
  • Typed return values (:class:CMRawPlume, :class:CMTileItem, :class:CMSource), never raw dicts.
  • Owned knowledge of the bbox-encoding (data_model §2.1) and source_name query-suffix (data_model §2.2) quirks.

Failure modes

The exception hierarchy is part of the contract:

  • :class:CMPlumeNotFound is raised on a get_plume 404.
  • :class:CMSourceNotFound is raised on a get_source 404.
  • :class:CMSceneNotPublished is raised on a get_tile / get_tile_for_plume 404 (CM publishes L2B selectively; see data_model §5.2). The cross-resolution helper :func:get_tile_for_plume catches this and returns None; the single-resource :func:get_tile re-raises so callers can choose to defer.
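
The split between "re-raise" and "return None" can be illustrated with stand-in classes. In this sketch the three exception names come from the list above, while the shared base class CMError, the wrapper get_tile_or_none, and the stub fake_get_tile are assumptions made purely for illustration.

```python
from typing import Callable, Optional


class CMError(Exception):
    """Hypothetical common base class (the real base is not shown here)."""


class CMPlumeNotFound(CMError): ...


class CMSourceNotFound(CMError): ...


class CMSceneNotPublished(CMError): ...


def get_tile_or_none(
    get_tile: Callable[[str], dict], scene_id: str
) -> Optional[dict]:
    """Cross-resolution style: treat an unpublished scene as 'no tile'."""
    try:
        return get_tile(scene_id)
    except CMSceneNotPublished:
        return None


def fake_get_tile(scene_id: str) -> dict:
    # Stand-in for a single-resource lookup that always hits a 404
    raise CMSceneNotPublished(scene_id)


result = get_tile_or_none(fake_get_tile, "tan20251212t185057c20s4001")
```

Catching only CMSceneNotPublished lets other failures propagate, so an unpublished scene is not confused with an unrelated error.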

Examples

"What does CM know about this plume?":

    >>> from georeader.readers.carbonmapper.api_queries import get_plume_context
    >>> plume, tile, source = get_plume_context(token, "tan20251212t185057c20s4001-E")
    >>> plume.plume_id
    'tan20251212t185057c20s4001-E'
    >>> tile.scene_id if tile else None
    'tan20251212t185057c20s4001'
    >>> source.sector if source else None  # may be None if unattributed
    '1B2'

"All tiles ever observing this chronic emitter":

    >>> from georeader.readers.carbonmapper.api_queries import list_tiles_for_source
    >>> tiles = list_tiles_for_source(token, "CH4_1B2_100m_-104.17525_32.49125")
    >>> {t.platform for t in tiles}

See also

  • georeader.readers.carbonmapper.download: raw HTTP / JSON wrappers.
  • georeader.readers.carbonmapper.plume.CMRawPlume: typed plume model.
  • georeader.readers.carbonmapper.source.CMSource: typed source model.

CMTileItem dataclass

Lightweight Carbon Mapper L2B STAC item: API-only, no DB binding.

The DB-bound counterpart is CarbonMapperTile (Phase 1). The promotion direction (API → DB) lives on the DB side via CarbonMapperTile.from_cm_tile_item(item, cm_provider=...); this keeps :mod:api_queries free of any database imports.

Frozen so instances are hashable and safe to use as dict keys when deduplicating scene_ids in cross-resolution queries.
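
The de-duplication pattern mentioned above can be sketched with a reduced stand-in. TileKey is a hypothetical class with only two string fields, and the EMIT scene id is made up. A frozen dataclass derives its hash from its field values, so this pattern works only while every field value is itself hashable.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class TileKey:
    """Hypothetical stand-in with only hashable (string) fields."""

    scene_id: str
    platform: str


tiles = [
    TileKey("tan20251212t185057c20s4001", "Tanager1"),
    TileKey("tan20251212t185057c20s4001", "Tanager1"),  # duplicate
    TileKey("emit20250101t000000", "EMIT"),  # made-up scene id
]

# dict.fromkeys keeps the first occurrence of each key, preserving order
unique = list(dict.fromkeys(tiles))

# Frozen instances reject attribute assignment
try:
    tiles[0].platform = "EMIT"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Equal field values give equal hashes, which is what allows the two identical TileKey instances to collapse into one dictionary key.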

Attributes

  • scene_id: STAC item id, equivalent to plume_id.rsplit("-", 1)[0] for plumes that came from this scene.
  • collection: STAC collection id, e.g. "l2b-ch4-mfa-v3a".
  • datetime: UTC-aware acquisition time parsed from properties["datetime"].
  • platform: properties["platform"], e.g. "Tanager1" or "EMIT".
  • bbox: (W, S, E, N) in WGS-84 decimal degrees.
  • geometry: Shapely geometry (typically a Polygon) of the scene footprint.
  • asset_urls: Mapping of asset name → href URL, e.g. {"cmf": "https://.../cmf.tif", "rgb": ...}. The L2B CH4 collection consistently exposes cmf, rgb, uncertainty, and artifact-mask.
  • properties: Full properties mapping from the STAC item.
  • raw: Original STAC item dict, useful for fields not yet exposed on the dataclass.

Examples

    >>> from georeader.readers.carbonmapper.api_queries import CMTileItem
    >>> tile = CMTileItem.from_stac_item({
    ...     "id": "tan20251212t185057c20s4001",
    ...     "collection": "l2b-ch4-mfa-v3a",
    ...     "properties": {"datetime": "2025-12-12T18:50:57Z",
    ...                    "platform": "Tanager1"},
    ...     "bbox": [-103.6, 31.4, -103.4, 31.6],
    ...     "geometry": {"type": "Polygon", "coordinates": [
    ...         [[-103.6, 31.4], [-103.4, 31.4],
    ...          [-103.4, 31.6], [-103.6, 31.6], [-103.6, 31.4]]]},
    ...     "assets": {"cmf": {"href": "https://cm/.../cmf.tif"}},
    ... })
    >>> tile.scene_id, tile.platform
    ('tan20251212t185057c20s4001', 'Tanager1')
    >>> tile.asset_urls["cmf"]
    'https://cm/.../cmf.tif'

Source code in georeader/readers/carbonmapper/api_queries.py
@dataclass(frozen=True)
class CMTileItem:
    """Lightweight Carbon Mapper L2B STAC item: API-only, no DB binding.

    The DB-bound counterpart is ``CarbonMapperTile`` (Phase 1). The
    promotion direction (API → DB) lives on the *DB* side via
    ``CarbonMapperTile.from_cm_tile_item(item, cm_provider=...)``; this
    keeps :mod:`api_queries` free of any database imports.

    Frozen so instances are hashable and safe to use as dict keys when
    deduplicating ``scene_ids`` in cross-resolution queries.

    Attributes
    ----------
    scene_id:
        STAC item id, equivalent to ``plume_id.rsplit("-", 1)[0]`` for
        plumes that came from this scene.
    collection:
        STAC collection id, e.g. ``"l2b-ch4-mfa-v3a"``.
    datetime:
        UTC-aware acquisition time parsed from
        ``properties["datetime"]``.
    platform:
        ``properties["platform"]``: ``"Tanager1"``, ``"EMIT"``, etc.
    bbox:
        ``(W, S, E, N)`` in WGS-84 decimal degrees.
    geometry:
        Shapely geometry (typically a Polygon) of the scene footprint.
    asset_urls:
        Mapping of asset name → href URL, e.g.
        ``{"cmf": "https://.../cmf.tif", "rgb": ...}``. The L2B CH4
        collection consistently exposes ``cmf``, ``rgb``,
        ``uncertainty``, and ``artifact-mask``.
    properties:
        Full ``properties`` mapping from the STAC item.
    raw:
        Original STAC item dict, useful for fields not yet exposed
        on the dataclass.

    Examples
    --------
    >>> from georeader.readers.carbonmapper.api_queries import CMTileItem
    >>> tile = CMTileItem.from_stac_item({
    ...     "id": "tan20251212t185057c20s4001",
    ...     "collection": "l2b-ch4-mfa-v3a",
    ...     "properties": {"datetime": "2025-12-12T18:50:57Z",
    ...                    "platform": "Tanager1"},
    ...     "bbox": [-103.6, 31.4, -103.4, 31.6],
    ...     "geometry": {"type": "Polygon", "coordinates": [
    ...         [[-103.6, 31.4], [-103.4, 31.4],
    ...          [-103.4, 31.6], [-103.6, 31.6], [-103.6, 31.4]]]},
    ...     "assets": {"cmf": {"href": "https://cm/.../cmf.tif"}},
    ... })
    >>> tile.scene_id, tile.platform
    ('tan20251212t185057c20s4001', 'Tanager1')
    >>> tile.asset_urls["cmf"]
    'https://cm/.../cmf.tif'
    """

    scene_id: str
    collection: str
    datetime: datetime
    platform: str
    bbox: tuple[float, float, float, float]
    geometry: BaseGeometry
    asset_urls: Mapping[str, str]
    properties: Mapping[str, Any]
    raw: Mapping[str, Any]

    @classmethod
    def from_stac_item(cls, item: Mapping[str, Any]) -> "CMTileItem":
        """Build a :class:`CMTileItem` from a raw STAC item dict.

        Tolerates both string and pre-parsed datetime values for
        ``properties["datetime"]`` and falls back to the current UTC
        time if the property is missing entirely.

        Parameters
        ----------
        item:
            STAC item dict (Feature shape) as returned by
            :func:`georeader.readers.carbonmapper.download.stac_get_item` or
            :func:`georeader.readers.carbonmapper.download.stac_search`.

        Returns
        -------
        CMTileItem

        Raises
        ------
        ValueError
            If ``item["bbox"]`` is missing or not 4-length, or if
            ``item["geometry"]`` is missing or empty.
        """
        props = dict(item.get("properties") or {})
        bbox = tuple(item.get("bbox") or ())
        if len(bbox) != 4:
            raise ValueError(f"STAC item missing 4-tuple bbox: {item.get('id')!r}")

        dt_raw = props.get("datetime")
        if isinstance(dt_raw, datetime):
            dt = dt_raw
        elif isinstance(dt_raw, str):
            dt = datetime.fromisoformat(dt_raw.replace("Z", "+00:00"))
        else:
            dt = datetime.now(timezone.utc)

        geom_dict = item.get("geometry") or {}
        if not geom_dict:
            raise ValueError(f"STAC item missing geometry: {item.get('id')!r}")
        geom = shape(geom_dict)  # type: ignore[arg-type]

        assets = item.get("assets") or {}
        asset_urls = {
            name: asset.get("href", "")
            for name, asset in assets.items()
            if isinstance(asset, Mapping)
        }

        return cls(
            scene_id=str(item.get("id", "")),
            collection=str(item.get("collection", "")),
            datetime=dt,
            platform=str(props.get("platform", "")),
            bbox=(float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])),
            geometry=geom,
            asset_urls=asset_urls,
            properties=props,
            raw=dict(item),
        )
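
A frozen dataclass generates a `__hash__` that hashes every field, so whether an instance can actually be hashed depends on the field values. The sketch below uses `TileStub`, a hypothetical stand-in (not part of the library) that holds plain `dict` values the way `from_stac_item` builds them, and shows one way to deduplicate by `scene_id` that does not rely on hashing the instance.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TileStub:
    """Hypothetical stand-in with a few CMTileItem-like fields."""

    scene_id: str
    asset_urls: Mapping[str, str]
    raw: Mapping[str, Any]


def dedupe_by_scene_id(tiles):
    """Keep the first tile seen for each scene_id, preserving order."""
    unique = {}
    for tile in tiles:
        unique.setdefault(tile.scene_id, tile)
    return list(unique.values())


a = TileStub("scene-1", {"cmf": "https://example.invalid/cmf.tif"}, {})
b = TileStub("scene-1", {"cmf": "https://example.invalid/cmf.tif"}, {})
c = TileStub("scene-2", {}, {})

try:
    hash(a)
    hashable = True
except TypeError:
    # The generated __hash__ hashes the field tuple; dict values are unhashable.
    hashable = False

unique_tiles = dedupe_by_scene_id([a, b, c])
```

Keying on `scene_id` works regardless of what the mapping fields contain, which may be the safer choice when items come straight from `from_stac_item`.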

from_stac_item(item) classmethod

Build a CMTileItem from a raw STAC item dict.

Tolerates both string and pre-parsed datetime values for properties["datetime"] and falls back to the current UTC time if the property is missing entirely.

Parameters

item: STAC item dict (Feature shape) as returned by georeader.readers.carbonmapper.download.stac_get_item or georeader.readers.carbonmapper.download.stac_search.

Returns

CMTileItem

Raises

ValueError: If item["bbox"] is missing or not 4-length, or if item["geometry"] is missing or empty.

Source code in georeader/readers/carbonmapper/api_queries.py
@classmethod
def from_stac_item(cls, item: Mapping[str, Any]) -> "CMTileItem":
    """Build a :class:`CMTileItem` from a raw STAC item dict.

    Tolerates both string and pre-parsed datetime values for
    ``properties["datetime"]`` and falls back to the current UTC
    time if the property is missing entirely.

    Parameters
    ----------
    item:
        STAC item dict (Feature shape) as returned by
        :func:`georeader.readers.carbonmapper.download.stac_get_item` or
        :func:`georeader.readers.carbonmapper.download.stac_search`.

    Returns
    -------
    CMTileItem

    Raises
    ------
    ValueError
        If ``item["bbox"]`` is missing or not 4-length, or if
        ``item["geometry"]`` is missing or empty.
    """
    props = dict(item.get("properties") or {})
    bbox = tuple(item.get("bbox") or ())
    if len(bbox) != 4:
        raise ValueError(f"STAC item missing 4-tuple bbox: {item.get('id')!r}")

    dt_raw = props.get("datetime")
    if isinstance(dt_raw, datetime):
        dt = dt_raw
    elif isinstance(dt_raw, str):
        dt = datetime.fromisoformat(dt_raw.replace("Z", "+00:00"))
    else:
        dt = datetime.now(timezone.utc)

    geom_dict = item.get("geometry") or {}
    if not geom_dict:
        raise ValueError(f"STAC item missing geometry: {item.get('id')!r}")
    geom = shape(geom_dict)  # type: ignore[arg-type]

    assets = item.get("assets") or {}
    asset_urls = {
        name: asset.get("href", "")
        for name, asset in assets.items()
        if isinstance(asset, Mapping)
    }

    return cls(
        scene_id=str(item.get("id", "")),
        collection=str(item.get("collection", "")),
        datetime=dt,
        platform=str(props.get("platform", "")),
        bbox=(float(bbox[0]), float(bbox[1]), float(bbox[2]), float(bbox[3])),
        geometry=geom,
        asset_urls=asset_urls,
        properties=props,
        raw=dict(item),
    )
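
The datetime handling above can be isolated into a small sketch. `parse_stac_datetime` is a hypothetical helper name used only for illustration; the logic mirrors the three branches in `from_stac_item`. The `"Z"` replacement matters because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onwards.

```python
from datetime import datetime, timezone


def parse_stac_datetime(value):
    """Return a datetime for a STAC ``properties["datetime"]`` value.

    Hypothetical helper mirroring the branches in from_stac_item:
    pass datetimes through, parse ISO strings, else use the current UTC time.
    """
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        # Normalise the "Z" suffix so older Python versions can parse it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return datetime.now(timezone.utc)


parsed = parse_stac_datetime("2025-12-12T18:50:57Z")
passthrough = parse_stac_datetime(parsed)
fallback = parse_stac_datetime(None)
```

Because the fallback silently stamps the current time, callers that need to distinguish a missing acquisition time may prefer to check `properties` directly before relying on the `datetime` field.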

CMAPIError

Bases: Exception

Base for everything raised by api_queries.

Catch this to handle any expected Carbon Mapper API miss in one block. requests.HTTPError for non-404 statuses (e.g. 500, 429) propagates unchanged; those are infrastructure issues, not data issues.

Source code in georeader/readers/carbonmapper/api_queries.py
class CMAPIError(Exception):
    """Base for everything raised by :mod:`api_queries`.

    Catch this to handle any expected Carbon Mapper API miss in one
    block. ``requests.HTTPError`` for non-404 statuses (e.g. 500, 429)
    propagates unchanged; those are infra issues, not data issues.
    """

CMPlumeNotFound

Bases: CMAPIError

Raised by get_plume when the plume is unknown to CM.

The unmodified plume_id is preserved on the instance for logging.

Examples

>>> try:
...     get_plume(token, "tan-does-not-exist")  # doctest: +SKIP
... except CMPlumeNotFound as exc:
...     log.warning("missing plume", plume_id=exc.plume_id)

Source code in georeader/readers/carbonmapper/api_queries.py
class CMPlumeNotFound(CMAPIError):
    """Raised by :func:`get_plume` when the plume is unknown to CM.

    The unmodified ``plume_id`` is preserved on the instance for
    logging.

    Examples
    --------
    >>> try:
    ...     get_plume(token, "tan-does-not-exist")  # doctest: +SKIP
    ... except CMPlumeNotFound as exc:
    ...     log.warning("missing plume", plume_id=exc.plume_id)
    """

    def __init__(self, plume_id: str):
        super().__init__(f"Plume not found: {plume_id}")
        self.plume_id = plume_id

CMSceneNotPublished

Bases: CMAPIError

Raised when STAC has no L2B item for a given scene_id.

Carbon Mapper publishes L2B selectively (data_model.md §5.2): plumes can exist for scenes whose L2B raster has not been (or never will be) released. The Phase 2 promotion path defers such plumes rather than failing hard.

The get_tile single-resource fetcher raises this so callers can pick a strategy; the cross-resolution get_tile_for_plume catches it and returns None.

Source code in georeader/readers/carbonmapper/api_queries.py
class CMSceneNotPublished(CMAPIError):
    """Raised when STAC has no L2B item for a given ``scene_id``.

    Carbon Mapper publishes L2B selectively (``data_model.md §5.2``):
    plumes can exist for scenes whose L2B raster has not been (or never
    will be) released. The Phase 2 promotion path defers such plumes
    rather than failing hard.

    The :func:`get_tile` single-resource fetcher *raises* this so
    callers can pick a strategy; the cross-resolution
    :func:`get_tile_for_plume` *catches* it and returns ``None``.
    """

    def __init__(self, scene_id: str):
        super().__init__(f"L2B scene not published: {scene_id}")
        self.scene_id = scene_id

CMSourceNotFound

Bases: CMAPIError

Raised by get_source when the source name is unknown.

The (cleaned, query-suffix-stripped) source_name is preserved on the instance.

Source code in georeader/readers/carbonmapper/api_queries.py
class CMSourceNotFound(CMAPIError):
    """Raised by :func:`get_source` when the source name is unknown.

    The (cleaned, query-suffix-stripped) ``source_name`` is preserved
    on the instance.
    """

    def __init__(self, source_name: str):
        super().__init__(f"Source not found: {source_name}")
        self.source_name = source_name
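
Since every "expected miss" derives from `CMAPIError`, a caller can handle one specific miss first and fall back to the base class for the rest. The sketch below re-declares minimal copies of two of the exception classes so it runs on its own; `fake_lookup` and `describe_miss` are hypothetical names, not library functions.

```python
class CMAPIError(Exception):
    """Minimal copy of the documented base class."""


class CMPlumeNotFound(CMAPIError):
    def __init__(self, plume_id: str):
        super().__init__(f"Plume not found: {plume_id}")
        self.plume_id = plume_id


class CMSceneNotPublished(CMAPIError):
    def __init__(self, scene_id: str):
        super().__init__(f"L2B scene not published: {scene_id}")
        self.scene_id = scene_id


def fake_lookup(kind: str):
    """Hypothetical stand-in for an API call that misses."""
    if kind == "plume":
        raise CMPlumeNotFound("tan-does-not-exist")
    raise CMSceneNotPublished("tan-unpublished-scene")


def describe_miss(kind: str) -> str:
    """Handle the specific miss first, then any other expected miss."""
    try:
        fake_lookup(kind)
    except CMPlumeNotFound as exc:
        return f"plume miss: {exc.plume_id}"
    except CMAPIError as exc:
        return f"other API miss: {exc}"
    return "ok"
```

Ordering the `except` clauses from most to least specific keeps the identifier attributes (`plume_id`, `scene_id`, `source_name`) available where they are needed.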

get_tile(token, scene_id, *, collection=DEFAULT_L2B_COLLECTION)

Fetch a single L2B STAC item by scene_id.

Wraps GET /stac/collections/{collection}/items/{scene_id}.

Parameters

token: Bearer token (STAC item endpoints accept anonymous reads for published items, but auth surfaces additional fields).

scene_id: The L2B scene_id, equal to plume_id.rsplit("-", 1)[0] for any plume that came from this scene.

collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION (CH4 matched-filter v3a). Override for CO2 or earlier versions.

Returns

CMTileItem

Raises

CMSceneNotPublished: When the L2B item has not been published yet (HTTP 404). Raised to the caller (not caught) so callers can choose to defer.

Examples

>>> tile = get_tile(token, "tan20251212t185057c20s4001")  # doctest: +SKIP
>>> tile.platform, list(tile.asset_urls)
('Tanager1', ['cmf', 'rgb', 'uncertainty', 'artifact-mask'])

Source code in georeader/readers/carbonmapper/api_queries.py
def get_tile(
    token: str,
    scene_id: str,
    *,
    collection: str = DEFAULT_L2B_COLLECTION,
) -> CMTileItem:
    """Fetch a single L2B STAC item by ``scene_id``.

    Wraps ``GET /stac/collections/{collection}/items/{scene_id}``.

    Parameters
    ----------
    token:
        Bearer token (STAC item endpoints accept anonymous reads for
        published items, but auth surfaces additional fields).
    scene_id:
        The L2B scene_id, equal to ``plume_id.rsplit("-", 1)[0]`` for
        any plume that came from this scene.
    collection:
        STAC collection; defaults to :data:`DEFAULT_L2B_COLLECTION`
        (CH4 matched-filter v3a). Override for CO2 or earlier versions.

    Returns
    -------
    CMTileItem

    Raises
    ------
    CMSceneNotPublished
        When the L2B item has not been published yet (HTTP 404).
        Raised to the caller (not caught) so callers can choose to defer.

    Examples
    --------
    >>> tile = get_tile(token, "tan20251212t185057c20s4001")  # doctest: +SKIP
    >>> tile.platform, list(tile.asset_urls)
    ('Tanager1', ['cmf', 'rgb', 'uncertainty', 'artifact-mask'])
    """
    try:
        raw = _dl.stac_get_item(collection, scene_id, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            raise CMSceneNotPublished(scene_id) from exc
        raise
    return CMTileItem.from_stac_item(raw)
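
The 404-to-domain-error translation used by `get_tile` can be sketched without any network access. Everything below is a stand-in: `FakeHTTPError` imitates the `.response.status_code` shape of `requests.HTTPError`, and `is_404` is only a guess at what the private `_is_404` helper checks, since its implementation is not shown here.

```python
class FakeResponse:
    def __init__(self, status_code: int):
        self.status_code = status_code


class FakeHTTPError(Exception):
    """Stand-in for requests.HTTPError (carries a .response attribute)."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = FakeResponse(status_code)


class SceneNotPublished(Exception):
    """Stand-in for CMSceneNotPublished."""


def is_404(exc) -> bool:
    """Assumed behaviour of the private _is_404 helper."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) == 404


def fetch_item(scene_id, transport):
    """Translate a 404 into a domain error; let other statuses propagate."""
    try:
        return transport(scene_id)
    except FakeHTTPError as exc:
        if is_404(exc):
            raise SceneNotPublished(scene_id) from exc
        raise


def make_transport(status_code):
    """Build a fake transport that fails with the given status, or succeeds."""
    def transport(scene_id):
        if status_code is not None:
            raise FakeHTTPError(status_code)
        return {"id": scene_id}
    return transport


def outcome(status_code) -> str:
    try:
        fetch_item("scene-1", make_transport(status_code))
    except Exception as exc:
        return type(exc).__name__
    return "ok"
```

Using `raise ... from exc` keeps the original HTTP error reachable as `__cause__`, which helps when debugging why a scene was treated as unpublished.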

get_plume(token, plume_id)

Fetch a single plume by its CM plume_id.

Wraps GET /catalog/plume/{id} and parses the result through CMRawPlume.

Parameters

token: Carbon Mapper Bearer token. Required for non-public fields.

plume_id: Either the colloquial name (e.g. "tan20251212t185057c20s4001-E") or the UUID form.

Returns

CMRawPlume

Raises

CMPlumeNotFound: When the API returns 404.

requests.HTTPError: For non-404 errors (5xx, 429, etc.).

Examples

>>> plume = get_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
>>> plume.plume_id, plume.gas
('tan20251212t185057c20s4001-E', 'CH4')

Source code in georeader/readers/carbonmapper/api_queries.py
def get_plume(token: str, plume_id: str) -> CMRawPlume:
    """Fetch a single plume by its CM ``plume_id``.

    Wraps ``GET /catalog/plume/{id}`` and parses the result through
    :class:`CMRawPlume`.

    Parameters
    ----------
    token:
        Carbon Mapper Bearer token. Required for non-public fields.
    plume_id:
        Either the colloquial name (e.g.
        ``"tan20251212t185057c20s4001-E"``) or the UUID form.

    Returns
    -------
    CMRawPlume

    Raises
    ------
    CMPlumeNotFound
        When the API returns 404.
    requests.HTTPError
        For non-404 errors (5xx, 429, etc.).

    Examples
    --------
    >>> plume = get_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
    >>> plume.plume_id, plume.gas
    ('tan20251212t185057c20s4001-E', 'CH4')
    """
    try:
        raw = _dl.get_plume_by_id(plume_id, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            raise CMPlumeNotFound(plume_id) from exc
        raise
    return CMRawPlume(**raw)

get_source(token, source_name)

Fetch a single Carbon Mapper source by its canonical name.

Strips the source-name query-string suffix (?plume_gas=...) automatically (data_model §2.2), so either the dirty or the clean form can be passed.

Parameters

token: Bearer token.

source_name: Canonical or query-suffixed source name, e.g. "CH4_1B2_100m_-104.17525_32.49125" or "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4".

Returns

CMSource

Raises

CMSourceNotFound: When the API returns 404.

Examples

>>> src = get_source(token, "CH4_1B2_100m_-104.17525_32.49125")  # doctest: +SKIP
>>> src.sector, src.plume_count
('1B2', 12)

Source code in georeader/readers/carbonmapper/api_queries.py
def get_source(token: str, source_name: str) -> CMSource:
    """Fetch a single Carbon Mapper source by its canonical name.

    Strips the source-name query-string suffix (``?plume_gas=...``)
    automatically (``data_model §2.2``), so either the dirty or the
    clean form can be passed.

    Parameters
    ----------
    token:
        Bearer token.
    source_name:
        Canonical or query-suffixed source name, e.g.
        ``"CH4_1B2_100m_-104.17525_32.49125"`` or
        ``"CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4"``.

    Returns
    -------
    CMSource

    Raises
    ------
    CMSourceNotFound
        When the API returns 404.

    Examples
    --------
    >>> src = get_source(token, "CH4_1B2_100m_-104.17525_32.49125")  # doctest: +SKIP
    >>> src.sector, src.plume_count
    ('1B2', 12)
    """
    cleaned = _strip_query_suffix(source_name)
    try:
        raw = _dl.get_source_by_name(cleaned, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            raise CMSourceNotFound(cleaned) from exc
        raise
    # The single-source endpoint can return either a Feature or properties
    # directly; coerce to a Feature shape so CMSource.from_geojson_feature
    # handles both.
    if "properties" not in raw and "source_name" in raw:
        feature = {"properties": dict(raw),
                   "geometry": {"type": "Point",
                                "coordinates": [raw.get("lon"), raw.get("lat")]}}
    else:
        feature = dict(raw)

    # The /catalog/source/{name} endpoint sometimes returns top-level
    # geometry with null coords and stashes the real centroid under
    # properties.point; fall back to that when the outer geometry is
    # unusable.
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates") or [None, None]
    if not coords or coords[0] is None or coords[1] is None:
        props = feature.get("properties") or {}
        point = props.get("point") or {}
        pcoords = point.get("coordinates") if isinstance(point, dict) else None
        if pcoords and pcoords[0] is not None and pcoords[1] is not None:
            feature = dict(feature)
            feature["geometry"] = {"type": "Point", "coordinates": list(pcoords)}

    return CMSource.from_geojson_feature(feature)
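
Two pieces of `get_source` are easy to test in isolation: stripping the query suffix and falling back to `properties.point` when the outer geometry has null coordinates. The sketch below is an assumption-laden illustration; `strip_query_suffix` only guesses at what the private `_strip_query_suffix` does, and `resolve_point` is a hypothetical helper that returns the coordinates rather than mutating the feature.

```python
def strip_query_suffix(source_name: str) -> str:
    """Drop anything from the first '?' onwards (assumed behaviour)."""
    return source_name.split("?", 1)[0]


def _usable(coords) -> bool:
    """True when coords has at least two non-null leading values."""
    return bool(coords) and len(coords) >= 2 and coords[0] is not None and coords[1] is not None


def resolve_point(feature: dict):
    """Return [lon, lat] from the outer geometry, else from properties.point."""
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates")
    if _usable(coords):
        return [coords[0], coords[1]]
    point = (feature.get("properties") or {}).get("point") or {}
    pcoords = point.get("coordinates") if isinstance(point, dict) else None
    if _usable(pcoords):
        return [pcoords[0], pcoords[1]]
    return None  # neither location is usable


quirky = {
    "geometry": {"type": "Point", "coordinates": [None, None]},
    "properties": {"point": {"type": "Point", "coordinates": [-104.17525, 32.49125]}},
}
clean_name = strip_query_suffix("CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4")
lon_lat = resolve_point(quirky)
```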

get_tile_for_plume(token, plume_id, *, collection=DEFAULT_L2B_COLLECTION)

Resolve a plume to its parent L2B STAC item.

Derives the parent scene_id via plume_id.rsplit("-", 1)[0] and looks up the corresponding STAC item.

Unlike get_tile, this helper catches CMSceneNotPublished and returns None, which suits consumers (Phase 2 ETL) that want to defer rather than error.

Parameters

token: Bearer token.

plume_id: Colloquial plume id (with the -{part} suffix).

collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION.

Returns

CMTileItem | None: None when the L2B scene has not been published yet.

Examples

>>> tile = get_tile_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
>>> tile.scene_id if tile else "deferred"  # doctest: +SKIP
'tan20251212t185057c20s4001'

Source code in georeader/readers/carbonmapper/api_queries.py
def get_tile_for_plume(
    token: str,
    plume_id: str,
    *,
    collection: str = DEFAULT_L2B_COLLECTION,
) -> CMTileItem | None:
    """Resolve a plume to its parent L2B STAC item.

    Derives the parent ``scene_id`` via
    ``plume_id.rsplit("-", 1)[0]`` and looks up the corresponding
    STAC item.

    Unlike :func:`get_tile`, this helper **catches**
    :class:`CMSceneNotPublished` and returns ``None``, which suits
    consumers (Phase 2 ETL) that want to defer rather than error.

    Parameters
    ----------
    token:
        Bearer token.
    plume_id:
        Colloquial plume id (with the ``-{part}`` suffix).
    collection:
        STAC collection; defaults to :data:`DEFAULT_L2B_COLLECTION`.

    Returns
    -------
    CMTileItem | None
        ``None`` when the L2B scene has not been published yet.

    Examples
    --------
    >>> tile = get_tile_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
    >>> tile.scene_id if tile else "deferred"  # doctest: +SKIP
    'tan20251212t185057c20s4001'
    """
    scene_id = _scene_id_from_plume(plume_id)
    try:
        return get_tile(token, scene_id, collection=collection)
    except CMSceneNotPublished:
        return None
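
The scene-id derivation named in the docstring, `plume_id.rsplit("-", 1)[0]`, is simple enough to check directly. `scene_id_from_plume` below is a hypothetical public spelling of the private `_scene_id_from_plume`, whose actual body is not shown; the sketch also illustrates why the colloquial id is expected here rather than the UUID form.

```python
def scene_id_from_plume(plume_id: str) -> str:
    """Drop the trailing ``-{part}`` suffix from a colloquial plume id."""
    return plume_id.rsplit("-", 1)[0]


colloquial = scene_id_from_plume("tan20251212t185057c20s4001-E")

# A UUID contains hyphens of its own, so the same split only trims its
# last group and does not yield a scene id.
uuid_like = scene_id_from_plume("123e4567-e89b-12d3-a456-426614174000")

# With no hyphen at all, rsplit returns the input unchanged.
no_suffix = scene_id_from_plume("tan20251212t185057c20s4001")
```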

get_source_for_plume(token, plume_id)

Resolve a plume to its attributed Carbon Mapper source.

Wraps /catalog/source/plume/name/{plume_id}, the by-name endpoint, which returns the cleaned source_name (preferred over the UUID-keyed sibling for colloquial plume_id strings).

Returns None when CM has not attributed the plume to a source (HTTP 404). Other HTTP errors propagate.

Parameters

token: Bearer token.

plume_id: Colloquial plume id.

Returns

CMSource | None

Examples

>>> src = get_source_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
>>> src.source_name if src else "unattributed"  # doctest: +SKIP
'CH4_1B2_100m_-104.0_32.0'

Source code in georeader/readers/carbonmapper/api_queries.py
def get_source_for_plume(
    token: str,
    plume_id: str,
) -> CMSource | None:
    """Resolve a plume to its attributed Carbon Mapper source.

    Wraps ``/catalog/source/plume/name/{plume_id}``, the *by-name*
    endpoint, which returns the cleaned ``source_name`` (preferred over
    the UUID-keyed sibling for colloquial ``plume_id`` strings).

    Returns ``None`` when CM has not attributed the plume to a source
    (HTTP 404). Other HTTP errors propagate.

    Parameters
    ----------
    token:
        Bearer token.
    plume_id:
        Colloquial plume id.

    Returns
    -------
    CMSource | None

    Examples
    --------
    >>> src = get_source_for_plume(token, "tan20251212t185057c20s4001-E")  # doctest: +SKIP
    >>> src.source_name if src else "unattributed"  # doctest: +SKIP
    'CH4_1B2_100m_-104.0_32.0'
    """
    try:
        raw = _dl.get_source_for_plume_name(plume_id, token=token)
    except requests.HTTPError as exc:
        if _is_404(exc):
            return None
        raise
    if not raw:
        return None
    if "geometry" not in raw and "properties" not in raw:
        feature = {
            "properties": dict(raw),
            "geometry": {"type": "Point",
                         "coordinates": [raw.get("lon"), raw.get("lat")]},
        }
    else:
        feature = dict(raw)

    # The endpoint occasionally returns a plume-shaped payload (no
    # source_name, null top-level geometry) when CM has not yet
    # attributed the plume; treat as unattributed.
    props = feature.get("properties") or {}
    if not props.get("source_name") and not feature.get("source_name"):
        return None

    # Fall back to properties.point when the outer geometry is null
    # (same quirk as get_source).
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates") or [None, None]
    if not coords or coords[0] is None or coords[1] is None:
        point = props.get("point") or {}
        pcoords = point.get("coordinates") if isinstance(point, dict) else None
        if pcoords and pcoords[0] is not None and pcoords[1] is not None:
            feature["geometry"] = {"type": "Point", "coordinates": list(pcoords)}

    return CMSource.from_geojson_feature(feature)

get_plume_context(token, plume_id)

Single-call fetch of a plume plus its parent tile and source.

The most common notebook / ETL question is "give me everything CM knows about this plume". This helper batches the three independent REST/STAC calls behind a single name and surfaces the contracts as a typed tuple.

Failure modes are asymmetric:

  • The plume itself must exist, so CMPlumeNotFound propagates.
  • Tile resolution returns None when the scene has not been published to L2B (CMSceneNotPublished caught internally).
  • Source resolution returns None when CM has not attributed the plume (404 caught internally).

Parameters

token: Bearer token.

plume_id: Colloquial plume id.

Returns

(CMRawPlume, CMTileItem | None, CMSource | None)

Raises

CMPlumeNotFound: When the plume itself is unknown.

Examples

Notebook exploration:

>>> plume, tile, source = get_plume_context(  # doctest: +SKIP
...     token, "tan20251212t185057c20s4001-E",
... )
>>> print(f"emission: {plume.emission_auto:.0f} kg/h")  # doctest: +SKIP
emission: 1240 kg/h
>>> if source:  # doctest: +SKIP
...     print(f"source {source.source_name} sector {source.sector}")

Source code in georeader/readers/carbonmapper/api_queries.py
def get_plume_context(
    token: str,
    plume_id: str,
) -> tuple[CMRawPlume, CMTileItem | None, CMSource | None]:
    """Single-call fetch of a plume plus its parent tile and source.

    The most common notebook / ETL question is *"give me everything CM
    knows about this plume"*. This helper batches the three independent
    REST/STAC calls behind a single name and surfaces the contracts as
    a typed tuple.

    Failure modes are asymmetric:

    - The plume itself **must** exist, so ``CMPlumeNotFound`` propagates.
    - Tile resolution returns ``None`` when the scene has not been
      published to L2B (``CMSceneNotPublished`` caught internally).
    - Source resolution returns ``None`` when CM has not attributed
      the plume (404 caught internally).

    Parameters
    ----------
    token:
        Bearer token.
    plume_id:
        Colloquial plume id.

    Returns
    -------
    (CMRawPlume, CMTileItem | None, CMSource | None)

    Raises
    ------
    CMPlumeNotFound
        When the plume itself is unknown.

    Examples
    --------
    Notebook exploration:

    >>> plume, tile, source = get_plume_context(  # doctest: +SKIP
    ...     token, "tan20251212t185057c20s4001-E",
    ... )
    >>> print(f"emission: {plume.emission_auto:.0f} kg/h")  # doctest: +SKIP
    emission: 1240 kg/h
    >>> if source:                                          # doctest: +SKIP
    ...     print(f"source {source.source_name} sector {source.sector}")
    """
    plume = get_plume(token, plume_id)
    tile = get_tile_for_plume(token, plume_id)
    source = get_source_for_plume(token, plume_id)
    return plume, tile, source
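
Because two of the three tuple members may be `None`, downstream code has to branch on each. A minimal sketch of that handling follows; `summarise_context` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for `CMRawPlume`, `CMTileItem`, and `CMSource` using attribute names that appear in the examples above.

```python
from types import SimpleNamespace


def summarise_context(plume, tile, source) -> dict:
    """Flatten a (plume, tile | None, source | None) tuple into a dict."""
    return {
        "plume_id": plume.plume_id,
        "tile_status": "published" if tile is not None else "deferred",
        "scene_id": tile.scene_id if tile is not None else None,
        "source_name": source.source_name if source is not None else "unattributed",
    }


# Stand-in objects; real calls would return the library's own types.
plume = SimpleNamespace(plume_id="tan20251212t185057c20s4001-E")
tile = SimpleNamespace(scene_id="tan20251212t185057c20s4001")

full = summarise_context(plume, tile, SimpleNamespace(source_name="CH4_1B2_100m_-104.0_32.0"))
partial = summarise_context(plume, None, None)
```

Checking `is not None` rather than truthiness avoids surprises if one of the real types ever defines `__bool__` or `__len__`.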

list_tiles(token, *, bbox=None, datetime_min=None, datetime_max=None, collection=DEFAULT_L2B_COLLECTION, limit=1000)

Materialised list of L2B STAC items matching filters.

Wraps /stac/search (comma-joined STAC bbox encoding).

Parameters

token: Bearer token.

bbox: (W, S, E, N) WGS-84 spatial filter.

datetime_min, datetime_max: Optional UTC bounds.

collection: STAC collection; defaults to DEFAULT_L2B_COLLECTION.

limit: Max items in this call.

Returns

list[CMTileItem]

Examples

>>> tiles = list_tiles(  # doctest: +SKIP
...     token, bbox=(-104.5, 31.0, -101.5, 33.5), limit=10,
... )
>>> {t.platform for t in tiles}  # doctest: +SKIP
{'Tanager1', 'EMIT'}

Source code in georeader/readers/carbonmapper/api_queries.py
def list_tiles(
    token: str,
    *,
    bbox: BBox | None = None,
    datetime_min: datetime | None = None,
    datetime_max: datetime | None = None,
    collection: str = DEFAULT_L2B_COLLECTION,
    limit: int = 1_000,
) -> list[CMTileItem]:
    """Materialised list of L2B STAC items matching filters.

    Wraps ``/stac/search`` (comma-joined STAC bbox encoding).

    Parameters
    ----------
    token:
        Bearer token.
    bbox:
        ``(W, S, E, N)`` WGS-84 spatial filter.
    datetime_min, datetime_max:
        Optional UTC bounds.
    collection:
        STAC collection; defaults to :data:`DEFAULT_L2B_COLLECTION`.
    limit:
        Max items in this call.

    Returns
    -------
    list[CMTileItem]

    Examples
    --------
    >>> tiles = list_tiles(  # doctest: +SKIP
    ...     token, bbox=(-104.5, 31.0, -101.5, 33.5), limit=10,
    ... )
    >>> {t.platform for t in tiles}  # doctest: +SKIP
    {'Tanager1', 'EMIT'}
    """
    dt_range = _build_datetime_range(datetime_min, datetime_max)
    result = _dl.stac_search(
        collections=[collection],
        bbox=bbox,
        datetime_range=dt_range,
        limit=limit,
        token=token,
    )
    features = result.get("features", []) if isinstance(result, Mapping) else []
    return [CMTileItem.from_stac_item(f) for f in features]
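
The body of `_build_datetime_range` is not shown, so the following is only a sketch of how two optional UTC bounds could be combined into a single interval string. It assumes the STAC API convention of `start/end` with `..` marking an open end; `build_datetime_interval` is a hypothetical name, and rejecting naive datetimes is a design choice of this sketch, not documented library behaviour.

```python
from datetime import datetime, timezone
from typing import Optional


def _fmt(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_datetime_interval(
    start: Optional[datetime], end: Optional[datetime]
) -> Optional[str]:
    """Return 'start/end', using '..' for an open end, or None if unbounded."""
    if start is None and end is None:
        return None
    lo = _fmt(start) if start is not None else ".."
    hi = _fmt(end) if end is not None else ".."
    return f"{lo}/{hi}"


q1_2025 = build_datetime_interval(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 4, 1, tzinfo=timezone.utc),
)
open_ended = build_datetime_interval(datetime(2025, 1, 1, tzinfo=timezone.utc), None)
```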

list_plumes(token, *, bbox=None, sectors=None, instruments=None, datetime_min=None, datetime_max=None, gas=Gas.CH4, limit=1000)

Materialised list of plumes matching filters.

Wraps /catalog/plumes/annotated and converts each row into a CMRawPlume. The bbox is encoded as repeated keys (REST style; see georeader.readers.carbonmapper.download._rest_bbox_params).

Parameters

token: Bearer token.

bbox: (W, S, E, N) WGS-84 spatial filter.

sectors: IPCC sector codes, e.g. ["1B2", "6A"].

instruments: Instrument short codes, e.g. ["emi", "tan"], or Instrument members like [Instrument.EMIT, Instrument.TANAGER].

datetime_min, datetime_max: Optional UTC bounds, combined into an RFC 3339 interval.

gas: Gas.CH4 (default). CH4-only for this PR; Gas.CO2 lands in a follow-up. Typed as Gas | Literal["CH4"] so plain string call-sites (gas="CH4") continue to type-check.

limit: Max rows returned in this call. The API caps at 1 000 per page.

Returns

list[CMRawPlume]

Examples

Permian methane plumes for Q1 2025 from EMIT and Tanager:

>>> from datetime import datetime, timezone
>>> plumes = list_plumes(  # doctest: +SKIP
...     token,
...     bbox=(-104.5, 31.0, -101.5, 33.5),
...     instruments=["emi", "tan"],
...     datetime_min=datetime(2025, 1, 1, tzinfo=timezone.utc),
...     datetime_max=datetime(2025, 4, 1, tzinfo=timezone.utc),
...     limit=500,
... )
>>> sum(p.emission_auto or 0 for p in plumes)  # doctest: +SKIP
412350.0

Source code in georeader/readers/carbonmapper/api_queries.py
def list_plumes(
    token: str,
    *,
    bbox: BBox | None = None,
    sectors: list[str] | None = None,
    instruments: list[str] | None = None,
    datetime_min: datetime | None = None,
    datetime_max: datetime | None = None,
    gas: Gas | Literal["CH4"] = Gas.CH4,
    limit: int = 1_000,
) -> list[CMRawPlume]:
    """Materialised list of plumes matching filters.

    Wraps ``/catalog/plumes/annotated`` and converts each row into a
    :class:`CMRawPlume`. The bbox is encoded as repeated keys (REST
    style; see :func:`georeader.readers.carbonmapper.download._rest_bbox_params`).

    Parameters
    ----------
    token:
        Bearer token.
    bbox:
        ``(W, S, E, N)`` WGS-84 spatial filter.
    sectors:
        IPCC sector codes, e.g. ``["1B2", "6A"]``.
    instruments:
        Instrument short codes, e.g. ``["emi", "tan"]``, or
        :class:`Instrument` members like ``[Instrument.EMIT, Instrument.TANAGER]``.
    datetime_min, datetime_max:
        Optional UTC bounds, combined into an RFC 3339 interval.
    gas:
        :data:`Gas.CH4` (default). **CH4-only for this PR**;
        ``Gas.CO2`` lands in a follow-up. Typed as
        ``Gas | Literal["CH4"]`` so plain string call-sites
        (``gas="CH4"``) continue to type-check.
    limit:
        Max rows returned in this call. The API caps at 1 000 per page.

    Returns
    -------
    list[CMRawPlume]

    Examples
    --------
    Permian methane plumes for Q1 2025 from EMIT and Tanager:

    >>> from datetime import datetime, timezone
    >>> plumes = list_plumes(  # doctest: +SKIP
    ...     token,
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     instruments=["emi", "tan"],
    ...     datetime_min=datetime(2025, 1, 1, tzinfo=timezone.utc),
    ...     datetime_max=datetime(2025, 4, 1, tzinfo=timezone.utc),
    ...     limit=500,
    ... )
    >>> sum(p.emission_auto or 0 for p in plumes)  # doctest: +SKIP
    412350.0
    """
    dt_range = _build_datetime_range(datetime_min, datetime_max)
    result = _dl.get_plumes_annotated(
        plume_gas=str(gas),
        bbox=bbox,
        datetime_range=dt_range,
        sectors=sectors,
        instruments=instruments,
        limit=limit,
        token=token,
    )
    items = result.get("items", []) if isinstance(result, Mapping) else []
    return [CMRawPlume(**row) for row in items]
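
The two bbox encodings mentioned for `list_tiles` (comma-joined, STAC style) and `list_plumes` (repeated keys, REST style) differ only in how the four numbers are laid out in the query string. The sketch below illustrates the difference with the standard library; the parameter name `bbox` and both helper names are assumptions, since the real `_rest_bbox_params` is not shown here.

```python
from urllib.parse import parse_qs, urlencode


def stac_bbox_params(bbox):
    """One key whose value is 'W,S,E,N' (comma-joined)."""
    return [("bbox", ",".join(str(v) for v in bbox))]


def rest_bbox_params(bbox):
    """The same key repeated once per coordinate, in W, S, E, N order."""
    return [("bbox", str(v)) for v in bbox]


permian = (-104.5, 31.0, -101.5, 33.5)

# urlencode accepts a sequence of (key, value) pairs, so repeated keys survive.
stac_query = urlencode(stac_bbox_params(permian))
rest_query = urlencode(rest_bbox_params(permian))

# Round-trip both strings to confirm what a server would see.
stac_seen = parse_qs(stac_query)
rest_seen = parse_qs(rest_query)
```

Passing a list of pairs rather than a dict is what allows the repeated-key form; a plain dict would keep only one `bbox` entry.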

list_sources(token, *, bbox=None, sectors=None, gas=Gas.CH4)

List Carbon Mapper sources matching filters.

Wraps the source listing endpoint (REST Catalog). Each item is parsed via :meth:CMSource.from_geojson_feature, which strips the source-name query-suffix.

Parameters

  • token: Bearer token.
  • bbox: (W, S, E, N) WGS-84 spatial filter (REST repeated-keys encoding).
  • sectors: IPCC sector codes.
  • gas: :data:Gas.CH4 (default). CH4-only for this PR; Gas.CO2 lands in a follow-up.

Returns

list[CMSource]

Examples

Top oil & gas sources in the Permian:

    >>> sources = list_sources(  # doctest: +SKIP
    ...     token,
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     sectors=["1B2"],
    ... )
    >>> sorted(sources, key=lambda s: -(s.emission_auto or 0))[:3]  # doctest: +SKIP
    [<CMSource ...>, <CMSource ...>, <CMSource ...>]

Source code in georeader/readers/carbonmapper/api_queries.py
def list_sources(
    token: str,
    *,
    bbox: BBox | None = None,
    sectors: list[str] | None = None,
    gas: Gas | Literal["CH4"] = Gas.CH4,
) -> list[CMSource]:
    """List Carbon Mapper sources matching filters.

    Wraps the source listing endpoint (REST Catalog). Each item is
    parsed via :meth:`CMSource.from_geojson_feature`, which strips the
    source-name query-suffix.

    Parameters
    ----------
    token:
        Bearer token.
    bbox:
        ``(W, S, E, N)`` WGS-84 spatial filter (REST repeated-keys
        encoding).
    sectors:
        IPCC sector codes.
    gas:
        :data:`Gas.CH4` (default). **CH4-only for this PR**;
        ``Gas.CO2`` lands in a follow-up.

    Returns
    -------
    list[CMSource]

    Examples
    --------
    Top oil & gas sources in the Permian:

    >>> sources = list_sources(  # doctest: +SKIP
    ...     token,
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     sectors=["1B2"],
    ... )
    >>> sorted(sources, key=lambda s: -(s.emission_auto or 0))[:3]  # doctest: +SKIP
    [<CMSource ...>, <CMSource ...>, <CMSource ...>]
    """
    # The `download.get_sources` wrapper actually targets
    # `/plumes/annotated` (see its docstring); the true source listing
    # lives at `/catalog/sources.geojson` and returns a GeoJSON
    # FeatureCollection. Hit it directly with REST repeated-keys bbox.
    params: list[tuple[str, str]] = [("plume_gas", str(gas))]
    if bbox is not None:
        for v in bbox:
            params.append(("bbox", str(v)))
    if sectors:
        for s in sectors:
            params.append(("sectors", s))
    resp = requests.get(
        f"{_dl.CATALOG_URL}/sources.geojson",
        params=params,
        headers=_dl._headers(token),
        timeout=60,
    )
    resp.raise_for_status()
    fc = resp.json()
    features = fc.get("features", []) if isinstance(fc, Mapping) else []
    return [CMSource.from_geojson_feature(f) for f in features]
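The "REST repeated-keys" encoding used above means each bbox value and each sector is sent under the same key, repeated. A minimal stdlib sketch of the resulting query string follows. The helper name `build_source_params` is hypothetical, and the plain string `"CH4"` stands in for the `Gas` enum.

```python
from urllib.parse import urlencode


def build_source_params(gas, bbox=None, sectors=None):
    """Hypothetical helper mirroring the parameter list built above."""
    params = [("plume_gas", str(gas))]
    if bbox is not None:
        # One ("bbox", value) pair per coordinate, in W, S, E, N order.
        params.extend(("bbox", str(v)) for v in bbox)
    if sectors:
        params.extend(("sectors", s) for s in sectors)
    return params


params = build_source_params(
    "CH4",
    bbox=(-104.5, 31.0, -101.5, 33.5),
    sectors=["1B2"],
)
query = urlencode(params)
print(query)
# plume_gas=CH4&bbox=-104.5&bbox=31.0&bbox=-101.5&bbox=33.5&sectors=1B2
```

A list of tuples is used rather than a dict because a dict cannot hold the same key more than once.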

list_plumes_for_tile(token, scene_id, *, gas=Gas.CH4)

All plumes attributed to a given L2B scene.

Carbon Mapper plume_ids embed the scene_id (plume_id = "{scene_id}-{part}"), so we filter the annotated plumes listing client-side by prefix.

Parameters

  • token: Bearer token.
  • scene_id: L2B scene id, e.g. "tan20251212t185057c20s4001".
  • gas: :data:Gas.CH4 (default). CH4-only for this PR; Gas.CO2 lands in a follow-up.

Returns

list[CMRawPlume]

Note

The current implementation pulls a single page of up to 1 000 plumes and filters it in Python, so plumes of the requested scene that fall outside that page are missed. When completeness matters, use :func:list_plumes directly, for example with a bbox filter.

Examples

    >>> plumes = list_plumes_for_tile(  # doctest: +SKIP
    ...     token, "tan20251212t185057c20s4001",
    ... )
    >>> [p.plume_id[-1] for p in plumes]  # doctest: +SKIP
    ['A', 'B', 'C', 'E']

Source code in georeader/readers/carbonmapper/api_queries.py
def list_plumes_for_tile(
    token: str,
    scene_id: str,
    *,
    gas: Gas | Literal["CH4"] = Gas.CH4,
) -> list[CMRawPlume]:
    """All plumes attributed to a given L2B scene.

    Carbon Mapper plume_ids embed the scene_id
    (``plume_id = "{scene_id}-{part}"``), so we filter the annotated
    plumes listing client-side by prefix.

    Parameters
    ----------
    token:
        Bearer token.
    scene_id:
        L2B scene id, e.g. ``"tan20251212t185057c20s4001"``.
    gas:
        :data:`Gas.CH4` (default). **CH4-only for this PR**;
        ``Gas.CO2`` lands in a follow-up.

    Returns
    -------
    list[CMRawPlume]

    Note
    ----
    The current implementation pulls a single page of up to 1 000
    plumes and filters it in Python, so plumes of the requested scene
    that fall outside that page are missed. When completeness matters,
    use :func:`list_plumes` directly, for example with a bbox filter.

    Examples
    --------
    >>> plumes = list_plumes_for_tile(  # doctest: +SKIP
    ...     token, "tan20251212t185057c20s4001",
    ... )
    >>> [p.plume_id[-1] for p in plumes]  # doctest: +SKIP
    ['A', 'B', 'C', 'E']
    """
    result = _dl.get_plumes_annotated(
        plume_gas=str(gas),
        limit=1_000,
        token=token,
    )
    items = result.get("items", []) if isinstance(result, Mapping) else []
    prefix = f"{scene_id}-"
    return [
        CMRawPlume(**row)
        for row in items
        if str(row.get("plume_id", "")).startswith(prefix)
    ]
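The client-side prefix filter can be tried on plain dictionaries without calling the API. In this sketch the helper name `filter_rows_for_scene` is hypothetical and every row except the documented scene id is made up for illustration. It shows why the trailing hyphen is part of the prefix: without it, a different scene whose id merely starts with the same characters would also match.

```python
def filter_rows_for_scene(rows, scene_id):
    """Hypothetical helper: keep rows whose plume_id is '{scene_id}-{part}'."""
    prefix = f"{scene_id}-"
    return [
        row for row in rows
        if str(row.get("plume_id", "")).startswith(prefix)
    ]


rows = [
    {"plume_id": "tan20251212t185057c20s4001-A"},
    {"plume_id": "tan20251212t185057c20s4001-B"},
    # Illustrative id of another scene that shares the leading characters.
    {"plume_id": "tan20251212t185057c20s40010-A"},
    {"plume_id": None},  # malformed rows are skipped, not raised on
    {},
]

matches = filter_rows_for_scene(rows, "tan20251212t185057c20s4001")
parts = [row["plume_id"][-1] for row in matches]
print(parts)  # ['A', 'B']
```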

list_plumes_for_source(token, source_name, *, limit=10000)

All plumes attributed to a Carbon Mapper source.

Wraps /catalog/source-plumes-csv/{source_name}. The CSV endpoint is single-shot (no pagination), so the result is fully materialised.

Strips the ?... query suffix from source_name automatically (data_model §2.2).

Parameters

  • token: Bearer token.
  • source_name: Canonical or query-suffixed source name.
  • limit: Cap the returned list. Defaults to 10 000; CM sources rarely exceed a few hundred plumes, so this is just a safety cap.

Returns

list[CMRawPlume]

Examples

    >>> plumes = list_plumes_for_source(  # doctest: +SKIP
    ...     token, "CH4_1B2_100m_-104.17525_32.49125",
    ... )
    >>> len(plumes), plumes[0].plume_id[:3]  # doctest: +SKIP
    (47, 'tan')

Source code in georeader/readers/carbonmapper/api_queries.py
def list_plumes_for_source(
    token: str,
    source_name: str,
    *,
    limit: int = 10_000,
) -> list[CMRawPlume]:
    """All plumes attributed to a Carbon Mapper source.

    Wraps ``/catalog/source-plumes-csv/{source_name}``. The CSV
    endpoint is single-shot (no pagination), so the result is fully
    materialised.

    Strips the ``?...`` query suffix from ``source_name`` automatically
    (``data_model §2.2``).

    Parameters
    ----------
    token:
        Bearer token.
    source_name:
        Canonical or query-suffixed source name.
    limit:
        Cap the returned list. Defaults to 10 000; CM sources rarely
        exceed a few hundred plumes, so this is just a safety cap.

    Returns
    -------
    list[CMRawPlume]

    Examples
    --------
    >>> plumes = list_plumes_for_source(  # doctest: +SKIP
    ...     token, "CH4_1B2_100m_-104.17525_32.49125",
    ... )
    >>> len(plumes), plumes[0].plume_id[:3]  # doctest: +SKIP
    (47, 'tan')
    """
    import io
    import pandas as pd

    # Keep the canonical name and the result list in separate variables
    # so neither is rebound to a value of a different type.
    canonical_name = _strip_query_suffix(source_name)
    csv_text = _dl.get_source_plumes_csv(canonical_name, token=token)
    if not csv_text:
        return []
    df = pd.read_csv(io.StringIO(csv_text))
    if limit and len(df) > limit:
        df = df.head(limit)
    # CSV -> dict gives `float('nan')` for empty cells. Pydantic
    # str-typed fields like `sensitivity_mode` reject NaN; coerce
    # NaNs to None so optional fields fall back to their defaults.
    rows = df.to_dict(orient="records")
    plumes: list[CMRawPlume] = []
    for row in rows:
        # `v != v` is true only for NaN.
        sane = {k: (None if isinstance(v, float) and v != v else v)
                for k, v in row.items()}
        plumes.append(CMRawPlume(**sane))
    return plumes
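The NaN-to-None coercion in the loop above can be checked in isolation with the standard library. This is a minimal sketch: the helper name `nan_to_none` is hypothetical, the row values are illustrative, and `math.isnan` is used in place of the `v != v` idiom, which tests the same condition.

```python
import math


def nan_to_none(row):
    """Hypothetical helper: replace float NaN cells with None."""
    return {
        key: (None if isinstance(value, float) and math.isnan(value) else value)
        for key, value in row.items()
    }


# Illustrative record shaped like one entry of df.to_dict(orient="records"):
# empty CSV cells arrive as float('nan').
row = {
    "plume_id": "tan20251212t185057c20s4001-A",
    "emission_auto": 125.0,
    "sensitivity_mode": float("nan"),
    "gsd": float("nan"),
}

clean = nan_to_none(row)
print(clean)
```

The `isinstance(value, float)` guard matters because `math.isnan` raises `TypeError` on strings such as the plume id.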

list_tiles_for_source(token, source_name, *, collection=DEFAULT_L2B_COLLECTION)

All distinct parent L2B tiles touched by a source's plumes.

Implementation:

  1. :func:list_plumes_for_source returns every plume attributed to the source.
  2. {plume_id.rsplit("-", 1)[0] for ...} yields the distinct scene_ids.
  3. stac_search(ids=[...]) resolves them to STAC items.

Useful for tile-level backfill: given a chronic emitter, fetch every L2B scene in which a plume was attributed to it. Because scene_ids are derived from plume_ids, passes that observed the source without a detection are not returned.

Parameters

  • token: Bearer token.
  • source_name: Canonical or query-suffixed source name.
  • collection: STAC collection; defaults to :data:DEFAULT_L2B_COLLECTION.

Returns

list[CMTileItem]: empty list if the source has no plumes.

Examples

    >>> tiles = list_tiles_for_source(  # doctest: +SKIP
    ...     token, "CH4_1B2_100m_-104.17525_32.49125",
    ... )
    >>> sorted({t.platform for t in tiles})  # doctest: +SKIP
    ['EMIT', 'Tanager1']

Source code in georeader/readers/carbonmapper/api_queries.py
def list_tiles_for_source(
    token: str,
    source_name: str,
    *,
    collection: str = DEFAULT_L2B_COLLECTION,
) -> list[CMTileItem]:
    """All distinct parent L2B tiles touched by a source's plumes.

    Implementation:

    1. :func:`list_plumes_for_source` returns every plume attributed to
       the source.
    2. ``{plume_id.rsplit("-", 1)[0] for ...}`` yields the distinct
       scene_ids.
    3. ``stac_search(ids=[...])`` resolves them to STAC items.

    Useful for tile-level backfill: given a chronic emitter, fetch
    every L2B scene in which a plume was attributed to it. Because
    scene_ids are derived from plume_ids, passes that observed the
    source without a detection are not returned.

    Parameters
    ----------
    token:
        Bearer token.
    source_name:
        Canonical or query-suffixed source name.
    collection:
        STAC collection; defaults to :data:`DEFAULT_L2B_COLLECTION`.

    Returns
    -------
    list[CMTileItem]
        Empty list if the source has no plumes.

    Examples
    --------
    >>> tiles = list_tiles_for_source(  # doctest: +SKIP
    ...     token, "CH4_1B2_100m_-104.17525_32.49125",
    ... )
    >>> sorted({t.platform for t in tiles})  # doctest: +SKIP
    ['EMIT', 'Tanager1']
    """
    plumes = list_plumes_for_source(token, source_name)
    scene_ids = sorted({_scene_id_from_plume(p.plume_id) for p in plumes})
    if not scene_ids:
        return []
    result = _dl.stac_search(
        collections=[collection], ids=scene_ids, limit=len(scene_ids), token=token,
    )
    features = result.get("features", []) if isinstance(result, Mapping) else []
    return [CMTileItem.from_stac_item(f) for f in features]
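Step 2 of the implementation, collapsing plume ids to distinct scene ids, can be sketched without network access. The function `scene_id_from_plume` below is a hypothetical stand-in for the private `_scene_id_from_plume` helper, written from the `rsplit("-", 1)[0]` expression in the docstring. The EMIT-style id is made up for illustration.

```python
def scene_id_from_plume(plume_id: str) -> str:
    """Hypothetical stand-in: drop the trailing '-{part}' suffix."""
    return plume_id.rsplit("-", 1)[0]


plume_ids = [
    "tan20251212t185057c20s4001-A",
    "tan20251212t185057c20s4001-B",
    "emi20250103t101500-A",  # illustrative id, not a real scene
]

# A set removes duplicates; sorting makes the STAC query deterministic.
scene_ids = sorted({scene_id_from_plume(pid) for pid in plume_ids})
print(scene_ids)
# ['emi20250103t101500', 'tan20251212t185057c20s4001']
```

Splitting from the right with `maxsplit=1` keeps any hyphens that might appear earlier in a scene id intact.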

plume.py

Unified Pydantic model for Carbon Mapper plume records.

Handles payloads from both Carbon Mapper API formats:

  • CSV bulk export (/api/v1/catalog/plume-csv): provides plume_latitude, plume_longitude, datetime, plume_bounds.
  • Annotated plume JSON (/api/v1/catalog/plumes/annotated): provides geometry_json, scene_timestamp, validated, has_phme.

All fields except plume_id are optional so that the model can be constructed from either format without validation errors.

CH4 only for this PR. The catalog model surface is gas-agnostic (CMRawPlume.gas returns whatever the API gave us), but query helpers in :mod:api_queries are typed Literal["CH4"] to keep the supported-product surface explicit. CO2 lands in a follow-up.

Version timeline. Carbon Mapper bumps emission_version per processing-software release. v3a is the canonical STAC-exposed version family (in /stac/collections); v3c is the live processing version of newer plumes, reachable via direct asset URLs from /catalog/plume/{id} but not registered in STAC. The :attr:CMRawPlume.version property exposes this so callers can branch between STAC-item lookup (v3a) and URL-pattern derivation (v3c); see :class:~georeader.readers.carbonmapper.image.CMPlumeImage, which handles both transparently.

This module is the API-side typed view of a Carbon Mapper plume record. Downstream consumers (e.g. UNEP IMEO MARS) may persist the record into their own tables / views; field-level docstrings below mirror the column comments on the src_carbon_mapper_plumes SQL view in pysat (UNEP-IMEO-MARS/pysat <https://github.com/UNEP-IMEO-MARS/pysat>_), so the upstream API and one downstream staging view share a single source of truth.

CARBONMAPPER_INSTRUMENTS = {'emi': 'EMIT', 'tan': 'Tanager-1', 'ang': 'AVIRIS-NG', 'gao': 'Global Airborne Observatory', 'av3': 'AVIRIS-3'} module-attribute

CM_INSTRUMENT_TO_SATELLITE = {'tan': 'Tanager1', 'ang': 'AVIRISNG', 'av3': 'AVIRIS3', 'emi': 'EMIT', 'gao': 'GAO'} module-attribute
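Since the first three characters of a plume_id carry the instrument code, the two mappings above can be used to label a plume without extra API calls. A minimal sketch follows, with the dictionaries copied from the module attributes above. The helper name `describe_instrument` is hypothetical.

```python
CARBONMAPPER_INSTRUMENTS = {
    "emi": "EMIT",
    "tan": "Tanager-1",
    "ang": "AVIRIS-NG",
    "gao": "Global Airborne Observatory",
    "av3": "AVIRIS-3",
}

CM_INSTRUMENT_TO_SATELLITE = {
    "tan": "Tanager1",
    "ang": "AVIRISNG",
    "av3": "AVIRIS3",
    "emi": "EMIT",
    "gao": "GAO",
}


def describe_instrument(plume_id: str):
    """Hypothetical helper: (display name, satellite key) for a plume id."""
    code = plume_id[:3].lower()
    # .get() returns None for unknown codes instead of raising KeyError.
    return CARBONMAPPER_INSTRUMENTS.get(code), CM_INSTRUMENT_TO_SATELLITE.get(code)


label = describe_instrument("tan20251212t185057c20s4001-A")
print(label)  # ('Tanager-1', 'Tanager1')
```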

CMRawPlume

Bases: BaseModel

Unified Carbon Mapper plume model.

Accepts payloads from both the CSV bulk-export endpoint and the annotated plume JSON endpoint. Only plume_id is required; all other fields default to None so either format can be parsed without errors.

Geometry is built automatically from whichever source is available:

  1. geometry_json (GeoJSON dict): Point geometries are buffered by 0.001° to produce a small polygon.
  2. plume_bounds (bounding box): converted to a shapely.box.

Note that geometry here is not the retrieved plume mask polygon; it is just the API's reported point/bounds. For the authoritative plume polygon, use :meth:~georeader.readers.carbonmapper.rasters.CMPlumeRaster.polygon, which extracts it from the L3A plume_tif band-4 alpha mask.

Downstream MARS staging-view counterpart

UNEP IMEO MARS persists this record into src_plume_staging_hist and exposes it via the src_carbon_mapper_plumes view (defined in pysat sql/view01_raw_carbon_mapper_plumes_view.sql <https://github.com/UNEP-IMEO-MARS/pysat/blob/main/sql/view01_raw_carbon_mapper_plumes_view.sql>_). Field-level docstrings below mirror that view's COMMENT ON COLUMN statements.

Mapping reference (CMRawPlume field → SQL view column):

==============================  ================================
CMRawPlume field                src_carbon_mapper_plumes column
==============================  ================================
plume_id                        source_id
datetime_str / scene_timestamp  tile_date
published_at_str                published_at
modified_str                    modified
plume_latitude                  lat
plume_longitude                 lon
plume_bounds_raw                plume_bounds
wind_source_auto                wind_source
wind_speed_avg_auto             wind_speed_m_s
wind_speed_std_auto             wind_speed_std_m_s
wind_direction_avg_auto         wind_direction_deg
wind_direction_std_auto         wind_direction_std_deg
emission_auto                   emission_rate_kg_h
emission_uncertainty_auto       emission_rate_uncertainty_kg_h
ipcc_sector                     sector
con_tif                         concentration_tif
rgb_tif, rgb_png                same names
plume_tif, plume_png            same names
==============================  ================================
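The mapping reference above can be applied mechanically when staging a record. The sketch below is not part of georeader or pysat: the helper name `to_view_row` is hypothetical and the record values are illustrative. It renames fields to the view's column names and lets whichever of `datetime_str` or `scene_timestamp` is populated fill `tile_date`.

```python
# Field -> view-column pairs copied from the mapping reference above.
FIELD_TO_COLUMN = {
    "plume_id": "source_id",
    "datetime_str": "tile_date",
    "scene_timestamp": "tile_date",
    "published_at_str": "published_at",
    "modified_str": "modified",
    "plume_latitude": "lat",
    "plume_longitude": "lon",
    "plume_bounds_raw": "plume_bounds",
    "wind_source_auto": "wind_source",
    "wind_speed_avg_auto": "wind_speed_m_s",
    "wind_speed_std_auto": "wind_speed_std_m_s",
    "wind_direction_avg_auto": "wind_direction_deg",
    "wind_direction_std_auto": "wind_direction_std_deg",
    "emission_auto": "emission_rate_kg_h",
    "emission_uncertainty_auto": "emission_rate_uncertainty_kg_h",
    "ipcc_sector": "sector",
    "con_tif": "concentration_tif",
}
# These asset fields keep their names in the view.
SAME_NAME_FIELDS = ("rgb_tif", "rgb_png", "plume_tif", "plume_png")


def to_view_row(record: dict) -> dict:
    """Hypothetical helper: rename model fields to SQL view columns."""
    row = {}
    for field, column in FIELD_TO_COLUMN.items():
        value = record.get(field)
        # A populated value wins; None only fills a column not yet set.
        if value is not None or column not in row:
            row[column] = value
    for field in SAME_NAME_FIELDS:
        row[field] = record.get(field)
    return row


record = {
    "plume_id": "tan20251212t185057c20s4001-A",
    "scene_timestamp": "2025-12-12T18:50:57Z",  # annotated-JSON variant
    "emission_auto": 125.0,
    "ipcc_sector": "1B2",
}
row = to_view_row(record)
print(row["source_id"], row["tile_date"], row["emission_rate_kg_h"])
```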

Source code in georeader/readers/carbonmapper/plume.py
class CMRawPlume(BaseModel):
    """Unified Carbon Mapper plume model.

    Accepts payloads from both the CSV bulk-export endpoint and the
    annotated plume JSON endpoint. Only ``plume_id`` is required; all
    other fields default to ``None`` so either format can be parsed
    without errors.

    Geometry is built automatically from whichever source is available:

    1. ``geometry_json`` (GeoJSON dict): Point geometries are buffered
       by 0.001° to produce a small polygon.
    2. ``plume_bounds`` (bounding box): converted to a ``shapely.box``.

    Note that ``geometry`` here is **not** the retrieved plume mask
    polygon; it is just the API's reported point/bounds. For the
    authoritative plume polygon, use
    :meth:`~georeader.readers.carbonmapper.rasters.CMPlumeRaster.polygon`,
    which extracts it from the L3A ``plume_tif`` band-4 alpha mask.

    Downstream MARS staging-view counterpart
    ----------------------------------------
    UNEP IMEO MARS persists this record into ``src_plume_staging_hist``
    and exposes it via the **``src_carbon_mapper_plumes`` view** (defined
    in `pysat sql/view01_raw_carbon_mapper_plumes_view.sql
    <https://github.com/UNEP-IMEO-MARS/pysat/blob/main/sql/view01_raw_carbon_mapper_plumes_view.sql>`_).
    Field-level docstrings below mirror that view's
    ``COMMENT ON COLUMN`` statements.

    Mapping reference (CMRawPlume field → SQL view column):

    ======================================  =====================================
    ``CMRawPlume`` field                    ``src_carbon_mapper_plumes`` column
    ======================================  =====================================
    ``plume_id``                            ``source_id``
    ``datetime_str`` / ``scene_timestamp``  ``tile_date``
    ``published_at_str``                    ``published_at``
    ``modified_str``                        ``modified``
    ``plume_latitude``                      ``lat``
    ``plume_longitude``                     ``lon``
    ``plume_bounds_raw``                    ``plume_bounds``
    ``wind_source_auto``                    ``wind_source``
    ``wind_speed_avg_auto``                 ``wind_speed_m_s``
    ``wind_speed_std_auto``                 ``wind_speed_std_m_s``
    ``wind_direction_avg_auto``             ``wind_direction_deg``
    ``wind_direction_std_auto``             ``wind_direction_std_deg``
    ``emission_auto``                       ``emission_rate_kg_h``
    ``emission_uncertainty_auto``           ``emission_rate_uncertainty_kg_h``
    ``ipcc_sector``                         ``sector``
    ``con_tif``                             ``concentration_tif``
    ``rgb_tif``, ``rgb_png``                same names
    ``plume_tif``, ``plume_png``            same names
    ======================================  =====================================
    """

    model_config = ConfigDict(
        arbitrary_types_allowed=True,
        populate_by_name=True,
        str_strip_whitespace=True,
        validate_assignment=True,
    )

    # Field descriptions below mirror the column comments on the
    # ``src_carbon_mapper_plumes`` view in pysat
    # (sql/view01_raw_carbon_mapper_plumes_view.sql) so the upstream API
    # docs, the staging-table view, and this in-memory model all share
    # one source of truth. Keep them in sync if the SQL view's COMMENT
    # ON COLUMN statements change.

    # --- Core identifiers ---
    plume_id: str = Field(
        description=(
            "Unique plume identifier in the format "
            "``{platform}{YYYYMMDD}{HHMMSS}-{part}``. The first three "
            "characters represent the platform (e.g. ``gao`` for Global "
            "Airborne Observatory) followed by the acquisition date and "
            "time in ISO 8601 UTC format. The ``-{part}`` suffix (e.g. "
            "``-A``) retains key information from the original radiance "
            "filename and indicates the order of multiple plumes "
            "detected in the same image."
        ),
    )
    gas: str | None = Field(
        default="CH4",
        description="The gas molecule detected during imaging operations.",
    )

    # --- Coordinates (CSV: required; JSON: derived from geometry_json) ---
    plume_latitude: float | None = Field(
        default=None, alias="plume_latitude",
        description="Latitude estimate of plume origin (decimal degrees, EPSG:4326).",
    )
    plume_longitude: float | None = Field(
        default=None, alias="plume_longitude",
        description="Longitude estimate of plume origin (decimal degrees, EPSG:4326).",
    )

    # --- Timestamps ---
    # CSV format uses "datetime"; annotated format uses "scene_timestamp"
    datetime_str: str | None = Field(
        default=None, alias="datetime",
        description=(
            "Acquisition time (UTC ISO 8601). Maps to the SQL view's "
            "``tile_date`` column. Set on CSV-format payloads; the "
            "annotated-JSON endpoint uses ``scene_timestamp`` instead."
        ),
    )
    scene_timestamp: str | None = Field(
        default=None,
        description=(
            "Acquisition time (UTC ISO 8601), the annotated-JSON variant of "
            "``datetime``. Either field may be populated, never both."
        ),
    )
    scene_uuid: str | None = Field(
        default=None,
        alias="scene_id",
        description=(
            "Internal Carbon Mapper scene UUID, i.e. what the API returns "
            "in the ``scene_id`` JSON field. **Not** the parseable scene "
            "name (e.g. ``tan20251212t185057c20s4001``); for that, use "
            "the :attr:`scene_id` property which derives from "
            "``plume_id.rsplit('-', 1)[0]`` and matches the STAC item id "
            "in the ``l2b-ch4-mfa-v3a`` collection."
        ),
    )
    published_at_str: str | None = Field(
        default=None, alias="published_at",
        description="Date and time the observation was published (UTC).",
    )
    modified_str: str | None = Field(
        default=None, alias="modified",
        description="Date and time the observation was last modified (UTC).",
    )

    # --- Emissions ---
    emission_auto: float | None = Field(
        default=None,
        description=(
            "Quantified emission rate of the plume [kg/hr], estimated "
            "using the Integrated Methane Enhancement (IME) method "
            "(Duren et al. 2019, *California's Methane Super-Emitters*, "
            "Nature)."
        ),
    )
    emission_uncertainty_auto: float | None = Field(
        default=None,
        description=(
            "Uncertainty in the emission rate [± kg/hr range], derived "
            "from uncertainty in IME and wind speed."
        ),
    )

    # --- Wind ---
    wind_speed_avg_auto: float | None = Field(
        default=None,
        description="Mean wind speed at the plume site [m/s].",
    )
    wind_speed_std_auto: float | None = Field(
        default=None,
        description="Standard deviation of wind speed [m/s].",
    )
    wind_direction_avg_auto: float | None = Field(
        default=None,
        description="Wind direction at the plume site [degrees].",
    )
    wind_direction_std_auto: float | None = Field(
        default=None,
        description="Standard deviation of wind direction [degrees].",
    )
    wind_source_auto: str | None = Field(
        default=None,
        description=(
            "Wind reanalysis source (e.g. ``HRRR``, ``ECMWF_IFS``, "
            "``ERA5``). Indicates which forecast/reanalysis product fed "
            "the IME quantification."
        ),
    )

    # --- Instrument / platform ---
    instrument: str | None = Field(
        default=None,
        description=(
            "Three-character sensor abbreviation: ``ang`` (AVIRIS-NG), "
            "``av3`` (AVIRIS-3), ``emi`` (EMIT), ``tan`` (Tanager-1), "
            "``gao`` (GAO)."
        ),
    )
    platform: str | None = Field(
        default=None,
        description="Unique name of the platform the instrument is attached to.",
    )
    provider: str | None = Field(
        default=None,
        description="Short description of the data provider's name.",
    )

    # --- Classification / metadata ---
    ipcc_sector: str | None = Field(
        default=None, alias="ipcc_sector",
        description=(
            "IPCC emissions sector (e.g. ``1B2`` for Oil & Gas) when "
            "Carbon Mapper attributes one. Reference: "
            "https://www.ipcc-nggip.iges.or.jp/public/gl/guidelin/ch1ri.pdf"
        ),
    )
    sector: str | None = Field(
        default=None,
        description=(
            "Carbon Mapper free-text sector category. Often a "
            "human-readable wrapper around ``ipcc_sector`` (e.g. "
            '``"Oil & Gas (1B2)"``).'
        ),
    )
    emission_cmf_type: str | None = Field(
        default=None, alias="emission_cmf_type",
        description=(
            "Statistical column-wise atmospheric retrieval algorithm "
            "used to threshold methane / carbon dioxide plumes from "
            "background concentrations (e.g. ``mfa``)."
        ),
    )
    mission_phase: str | None = Field(
        default=None,
        description=(
            "Operational mission phase, such as ``first_light`` or "
            "``production``."
        ),
    )
    emission_version: str | None = Field(
        default=None,
        description=(
            "Version label for the algorithm + calibration applied to "
            "produce this emission record. Pairs with reprocessing "
            "campaigns."
        ),
    )
    processing_software: str | None = Field(
        default=None,
        description=(
            "Software version used by the provider to process the raw "
            "satellite data (e.g. ``cmpro: 3.41.4``)."
        ),
    )
    gsd: float | None = Field(
        default=None,
        description=(
            "Native ground sample distance: the distance on the ground "
            "represented by the center-to-center spacing of pixels in "
            "the sensor's raw radiance data [meters]."
        ),
    )
    sensitivity_mode: str | None = Field(
        default=None,
        description=(
            "The sensor's configured detection threshold and "
            "radiometric settings, which affect signal-to-noise ratio "
            "(SNR), exposure time, and spectral fidelity."
        ),
    )
    off_nadir: float | None = Field(
        default=None,
        description=(
            "Angle between the satellite's sensor line of sight and the "
            "point directly below the satellite (nadir) [degrees]. "
            "Carbon Mapper publishes this on the plume; the equivalent "
            "STAC property at the L2B scene level is ``view:off_nadir``."
        ),
    )

    # --- Quality & validation (annotated JSON) ---
    plume_quality: str | None = Field(
        default=None,
        description=(
            "CM-side quality flag for the plume retrieval. Presence "
            "implies the record was reviewed by Carbon Mapper's "
            "pipeline."
        ),
    )
    validated: bool | None = Field(
        default=None,
        description="CM-side validation flag (annotated JSON only).",
    )
    validator_user: str | None = Field(
        default=None,
        description="Validator user id from the CM annotated payload.",
    )
    has_phme: bool | None = Field(
        default=None,
        description=(
            "Whether the plume has been Plume Height + Mass Estimated. "
            "Annotated JSON only."
        ),
    )
    detection_institution: str | None = Field(
        default=None,
        description="Detection institution string from the CM annotated payload.",
    )

    # --- Source linkage (annotated JSON) ---
    source_id: str | None = Field(
        default=None,
        description=(
            "Carbon Mapper-assigned emission-source id. Joins to the CM "
            "API's source endpoint."
        ),
    )
    source_name: str | None = Field(
        default=None,
        description=(
            "Carbon Mapper source-name string (e.g. "
            "``CH4_1B2_100m_-104.17525_32.49125``)."
        ),
    )

    # --- Assets ---
    plume_tif: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a GeoTIFF of the delineated plume (L3A "
            "alpha-banded mask). "
            ":meth:`~georeader.readers.carbonmapper.rasters.CMPlumeRaster.polygon`"
            " extracts the polygon from band 4 of this file, the "
            "authoritative source for the retrieved plume shape."
        ),
    )
    plume_png: str | None = Field(
        default=None,
        description="HTTPS link to a PNG visualisation of the delineated plume.",
    )
    con_tif: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a GeoTIFF pixel map of unsmoothed "
            "concentration values [ppm·m]. The L2B-tile-level "
            "equivalent is the ``cmf`` asset on the parent STAC item."
        ),
    )
    rgb_tif: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a 3-band, natural-colour, full-strip "
            "surface-reflectance GeoTIFF. The L2B-tile-level sibling "
            "lives in the ``l2b-rgb-v3a`` STAC collection."
        ),
    )
    rgb_png: str | None = Field(
        default=None,
        description=(
            "HTTPS link to a natural-colour, full-strip "
            "surface-reflectance PNG."
        ),
    )
    plume_rgb_png: str | None = Field(
        default=None,
        description="HTTPS link to a PNG of the plume overlaid on RGB.",
    )

    # --- Geometry sources ---
    geometry_json: dict | None = Field(
        default=None,
        description=(
            "Raw GeoJSON geometry dict from the CM payload, typically "
            "a Point or coarse Polygon. **Not** the retrieved plume "
            "polygon; for that, use ``CMPlumeRaster.polygon()`` against "
            "``plume_tif``."
        ),
    )
    plume_bounds_raw: Optional[Union[str, List[float], Tuple[float, float, float, float]]] = Field(
        default=None, alias="plume_bounds",
        description="Geographic bounds encompassing the plume image (W, S, E, N).",
    )

    # --- Derived ---
    geometry: BaseGeometry | None = Field(
        default=None,
        description=(
            "Shapely geometry built from ``geometry_json`` (preferred) "
            "or ``plume_bounds`` at validation time. **Not** the "
            "retrieved plume mask; same caveat as ``geometry_json``."
        ),
    )

    # ------------------------------------------------------------------ #
    # Field validators                                                     #
    # ------------------------------------------------------------------ #

    @field_validator(
        "plume_latitude",
        "plume_longitude",
        "gsd",
        "off_nadir",
        "emission_auto",
        "emission_uncertainty_auto",
        "wind_speed_avg_auto",
        "wind_speed_std_auto",
        "wind_direction_avg_auto",
        "wind_direction_std_auto",
        mode="before",
    )
    @classmethod
    def _coerce_float(cls, v: Any) -> float | None:
        return _to_float(v)

    @field_validator("validated", "has_phme", mode="before")
    @classmethod
    def _coerce_bool(cls, v: Any) -> bool | None:
        if v is None:
            return None
        if isinstance(v, bool):
            return v
        if isinstance(v, str):
            return v.lower() in ("true", "1", "yes")
        return bool(v)

    # ------------------------------------------------------------------ #
    # Model validator                                                      #
    # ------------------------------------------------------------------ #

    @model_validator(mode="after")
    def _build_geometry(self) -> "CMRawPlume":
        """Build shapely geometry from ``geometry_json`` or ``plume_bounds``."""
        geom: BaseGeometry | None = None

        # Priority 1: GeoJSON
        if self.geometry_json:
            try:
                geom = shape(self.geometry_json)
            except Exception:
                geom = None
            geom_type = self.geometry_json.get("type", "")
            if geom_type == "Point" and geom is not None:
                # Buffer by 0.001 degrees (~111 m of latitude) to get a small polygon
                geom = geom.buffer(0.001)
            # Fill lat/lon from Point coordinates if not set
            if (self.plume_latitude is None or self.plume_longitude is None) and geom_type == "Point":
                coords = self.geometry_json.get("coordinates")
                if coords and len(coords) >= 2:
                    object.__setattr__(self, "plume_longitude", float(coords[0]))
                    object.__setattr__(self, "plume_latitude", float(coords[1]))

        # Priority 2: Bounding box
        if geom is None:
            b = _parse_bounds(self.plume_bounds_raw)
            if b is not None:
                try:
                    geom = box(*b)
                except Exception:
                    geom = None

        object.__setattr__(self, "geometry", geom)
        return self

    # ------------------------------------------------------------------ #
    # Properties                                                           #
    # ------------------------------------------------------------------ #

    @property
    def observation_datetime(self) -> datetime | None:
        """Parse observation time from ``datetime_str`` or ``scene_timestamp``."""
        return _parse_iso_datetime(self.datetime_str) or _parse_iso_datetime(self.scene_timestamp)

    @property
    def published_at(self) -> datetime | None:
        return _parse_iso_datetime(self.published_at_str)

    @property
    def modified_at(self) -> datetime | None:
        return _parse_iso_datetime(self.modified_str)

    @property
    def lat(self) -> float | None:
        return self.plume_latitude

    @property
    def lon(self) -> float | None:
        return self.plume_longitude

    @property
    def geometry_wkt(self) -> str | None:
        return self.geometry.wkt if self.geometry is not None else None

    @property
    def wind_u(self) -> float | None:
        """Eastward wind component (m/s), meteorological convention."""
        u, _ = decompose_wind(self.wind_speed_avg_auto, self.wind_direction_avg_auto)
        return u

    @property
    def wind_v(self) -> float | None:
        """Northward wind component (m/s), meteorological convention."""
        _, v = decompose_wind(self.wind_speed_avg_auto, self.wind_direction_avg_auto)
        return v

    @property
    def instrument_name(self) -> str | None:
        """Human-readable instrument name from :data:`CARBONMAPPER_INSTRUMENTS`.

        The lookup is case-insensitive: upstream payloads occasionally
        report ``"GAO"`` while ``plume_id`` prefixes are lowercase, so
        the table key is normalised at lookup time rather than relying
        on every caller to lowercase first.
        """
        if self.instrument is None:
            return None
        return CARBONMAPPER_INSTRUMENTS.get(
            self.instrument.lower(), self.instrument,
        )

    @property
    def scene_id(self) -> str:
        """Parent L2B scene id, derived from ``plume_id``.

        Equivalent to ``plume_id.rsplit('-', 1)[0]``, the same string used
        as the STAC item id in the ``l2b-ch4-mfa-v3a`` collection. Use
        this to bridge from a plume to its parent scene without an HTTP
        round-trip:

        >>> raw.scene_id                       # doctest: +SKIP
        'tan20251212t185057c20s4001'
        >>> tile = api_queries.get_tile(token, raw.scene_id)  # doctest: +SKIP

        Distinct from :attr:`scene_uuid`, which is the API's internal
        UUID for the scene.
        """
        return self.plume_id.rsplit("-", 1)[0]

    @property
    def version(self) -> str | None:
        """Processing version (``"v3a"`` / ``"v3b"`` / ``"v3c"`` / ...).

        Re-exposes :attr:`emission_version` as a more obvious branch
        point for STAC-vs-CDN access: ``v3a`` plumes are STAC-resident,
        ``v3c`` plumes are reachable only via the URL-pattern derivation
        in :class:`~georeader.readers.carbonmapper.image.CMPlumeImage`.
        Returns ``None`` if the upstream payload didn't include
        ``emission_version`` (older CSV exports).
        """
        return self.emission_version

    # ------------------------------------------------------------------ #
    # Serialisation                                                        #
    # ------------------------------------------------------------------ #

    def to_source_dict(self) -> Dict[str, Any]:
        """Serialise to a dict suitable for round-tripping through :meth:`from_raw`."""
        d: Dict[str, Any] = {"plume_id": self.plume_id, "gas": self.gas}

        # Coordinates
        if self.plume_latitude is not None:
            d["plume_latitude"] = self.plume_latitude
        if self.plume_longitude is not None:
            d["plume_longitude"] = self.plume_longitude

        # Timestamps
        if self.datetime_str is not None:
            d["datetime"] = self.datetime_str
        if self.scene_timestamp is not None:
            d["scene_timestamp"] = self.scene_timestamp
        # Round-trip the API's `scene_id` (UUID) under its on-the-wire
        # name; the parseable form is derived via the property.
        if self.scene_uuid is not None:
            d["scene_id"] = self.scene_uuid
        if self.published_at_str is not None:
            d["published_at"] = self.published_at_str
        if self.modified_str is not None:
            d["modified"] = self.modified_str

        # Emissions
        d["emission_auto"] = self.emission_auto
        d["emission_uncertainty_auto"] = self.emission_uncertainty_auto

        # Wind
        d["wind_speed_avg_auto"] = self.wind_speed_avg_auto
        d["wind_speed_std_auto"] = self.wind_speed_std_auto
        d["wind_direction_avg_auto"] = self.wind_direction_avg_auto
        d["wind_direction_std_auto"] = self.wind_direction_std_auto
        d["wind_source_auto"] = self.wind_source_auto

        # Instrument / platform
        d["instrument"] = self.instrument
        d["platform"] = self.platform
        d["provider"] = self.provider

        # Classification
        if self.ipcc_sector is not None:
            d["ipcc_sector"] = self.ipcc_sector
        if self.sector is not None:
            d["sector"] = self.sector
        d["emission_cmf_type"] = self.emission_cmf_type
        d["mission_phase"] = self.mission_phase
        d["emission_version"] = self.emission_version
        d["processing_software"] = self.processing_software
        d["gsd"] = self.gsd
        d["sensitivity_mode"] = self.sensitivity_mode
        d["off_nadir"] = self.off_nadir

        # Quality / validation
        if self.plume_quality is not None:
            d["plume_quality"] = self.plume_quality
        if self.validated is not None:
            d["validated"] = self.validated
        if self.validator_user is not None:
            d["validator_user"] = self.validator_user
        if self.has_phme is not None:
            d["has_phme"] = self.has_phme
        if self.detection_institution is not None:
            d["detection_institution"] = self.detection_institution

        # Source linkage
        if self.source_id is not None:
            d["source_id"] = self.source_id
        if self.source_name is not None:
            d["source_name"] = self.source_name

        # Assets
        d["plume_tif"] = self.plume_tif
        d["plume_png"] = self.plume_png
        d["con_tif"] = self.con_tif
        d["rgb_tif"] = self.rgb_tif
        d["rgb_png"] = self.rgb_png
        if self.plume_rgb_png is not None:
            d["plume_rgb_png"] = self.plume_rgb_png

        # Geometry sources
        if self.geometry_json is not None:
            d["geometry_json"] = self.geometry_json
        if self.plume_bounds_raw is not None:
            d["plume_bounds"] = self.plume_bounds_raw

        return d

    # ------------------------------------------------------------------ #
    # Factory classmethods                                                 #
    # ------------------------------------------------------------------ #

    @classmethod
    def from_raw(cls, raw: Union[str, Dict[str, Any]]) -> "CMRawPlume":
        """Create from a JSON string or dict (CSV row or annotated-plume payload)."""
        if isinstance(raw, str):
            raw = json.loads(raw)
        return cls(**raw)

    # ------------------------------------------------------------------ #
    # Representation                                                       #
    # ------------------------------------------------------------------ #

    def _short_wkt_preview(self, max_len: int = 160) -> str | None:
        if not self.geometry:
            return None
        txt = self.geometry.wkt.replace("\n", " ").strip()
        return txt if len(txt) <= max_len else txt[: max_len - 3] + "..."

    def __str__(self) -> str:
        geom = self.geometry
        geom_type = getattr(geom, "geom_type", None)
        area = round(geom.area, 6) if geom is not None else None
        dt = self.observation_datetime.isoformat() if self.observation_datetime else None
        return (
            f"{self.__class__.__name__}\n"
            f"  plume_id: {self.plume_id}\n"
            f"  observation_datetime (UTC): {dt}\n"
            f"  lat: {self.lat}\n"
            f"  lon: {self.lon}\n"
            f"  instrument: {self.instrument}\n"
            f"  platform: {self.platform}\n"
            f"  geometry_type: {geom_type}\n"
            f"  geometry_area_deg2: {area}\n"
            f"  emission_auto: {self.emission_auto}\n"
            f"  emission_uncertainty_auto: {self.emission_uncertainty_auto}\n"
            f"  wind_speed_avg_auto: {self.wind_speed_avg_auto}\n"
            f"  wind_direction_avg_auto: {self.wind_direction_avg_auto}\n"
            f"  gas: {self.gas}\n"
            f"  validated: {self.validated}\n"
        )

    def __repr__(self) -> str:
        geom = self.geometry
        geom_type = getattr(geom, "geom_type", None)
        area = geom.area if geom is not None else None
        return (
            f"{self.__class__.__name__}(\n"
            f"  plume_id={self.plume_id!r},\n"
            f"  lat={self.lat},\n"
            f"  lon={self.lon},\n"
            f"  gas={self.gas!r},\n"
            f"  instrument={self.instrument!r},\n"
            f"  platform={self.platform!r},\n"
            f"  emission_auto={self.emission_auto},\n"
            f"  emission_uncertainty_auto={self.emission_uncertainty_auto},\n"
            f"  wind_speed_avg_auto={self.wind_speed_avg_auto},\n"
            f"  wind_direction_avg_auto={self.wind_direction_avg_auto},\n"
            f"  validated={self.validated},\n"
            f"  geometry_type={geom_type},\n"
            f"  geometry_area_deg2={area},\n"
            f"  geometry_wkt_preview={self._short_wkt_preview()!r}\n"
            f")"
        )
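
The `_coerce_bool` validator in the listing above accepts mixed inputs, which matters when booleans arrive as text (for example from CSV rows). A minimal standalone sketch of the same rules follows; the body mirrors the listing, and the name `coerce_bool` is used only for illustration.

```python
def coerce_bool(v):
    # Same branching as the _coerce_bool validator in the listing above.
    if v is None:
        return None
    if isinstance(v, bool):
        return v
    if isinstance(v, str):
        # Only these three spellings (in any letter case) count as True.
        return v.lower() in ("true", "1", "yes")
    # Anything else falls through to ordinary Python truthiness.
    return bool(v)

# Text inputs, as they might appear in a CSV export.
from_csv = [coerce_bool(s) for s in ("True", "yes", "0", "no", "")]
# Non-string inputs, as they might appear in a JSON payload.
from_json = [coerce_bool(x) for x in (1, 0, None)]
```

Under these rules surrounding whitespace is not stripped, so a value such as `" true"` evaluates to `False`.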

instrument_name property

Human-readable instrument name from CARBONMAPPER_INSTRUMENTS.

The lookup is case-insensitive: upstream payloads occasionally report "GAO" while plume_id prefixes are lowercase, so the table key is normalised at lookup time rather than relying on every caller to lowercase first.
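
A minimal sketch of this lookup pattern, assuming a small stand-in table: `INSTRUMENTS` and its single entry are hypothetical and do not reproduce the contents of CARBONMAPPER_INSTRUMENTS.

```python
# Hypothetical stand-in for CARBONMAPPER_INSTRUMENTS (real contents not shown here).
INSTRUMENTS = {"gao": "Global Airborne Observatory"}

def instrument_name(instrument):
    if instrument is None:
        return None
    # Lowercase the key for the lookup; unknown instruments fall back
    # to the raw string exactly as it was reported.
    return INSTRUMENTS.get(instrument.lower(), instrument)

upper = instrument_name("GAO")
lower = instrument_name("gao")
unknown = instrument_name("XYZ")
```

Falling back to the raw string keeps unknown instruments visible instead of collapsing them to `None`.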

observation_datetime property

Parse observation time from datetime_str or scene_timestamp.

scene_id property

Parent L2B scene id, derived from plume_id.

Equivalent to plume_id.rsplit('-', 1)[0], the same string used as the STAC item id in the l2b-ch4-mfa-v3a collection. Use this to bridge from a plume to its parent scene without an HTTP round-trip:

>>> raw.scene_id                       # doctest: +SKIP
'tan20251212t185057c20s4001'
>>> tile = api_queries.get_tile(token, raw.scene_id)  # doctest: +SKIP

Distinct from scene_uuid, which is the API's internal UUID for the scene.
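
The derivation is plain string splitting. A minimal sketch, assuming a plume id made of the scene id, a hyphen, and a per-plume suffix; the suffix `A` below is hypothetical and chosen only for illustration.

```python
# Hypothetical plume id: scene id + "-" + per-plume suffix (the suffix is invented).
plume_id = "tan20251212t185057c20s4001-A"

# rsplit with maxsplit=1 cuts only at the LAST hyphen, so any hyphens
# earlier in the id would be preserved.
scene_id = plume_id.rsplit("-", 1)[0]

# An id without any hyphen is returned unchanged.
unchanged = "nohyphen".rsplit("-", 1)[0]
```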

version property

Processing version ("v3a" / "v3b" / "v3c" / ...).

Re-exposes emission_version as a more obvious branch point for STAC-vs-CDN access: v3a plumes are STAC-resident, v3c plumes are reachable only via the URL-pattern derivation in georeader.readers.carbonmapper.image.CMPlumeImage. Returns None if the upstream payload didn't include emission_version (older CSV exports).

wind_u property

Eastward wind component (m/s), meteorological convention.

wind_v property

Northward wind component (m/s), meteorological convention.

from_raw(raw) classmethod

Create from a JSON string or dict (CSV row or annotated-plume payload).

Source code in georeader/readers/carbonmapper/plume.py
@classmethod
def from_raw(cls, raw: Union[str, Dict[str, Any]]) -> "CMRawPlume":
    """Create from a JSON string or dict (CSV row or annotated-plume payload)."""
    if isinstance(raw, str):
        raw = json.loads(raw)
    return cls(**raw)

to_source_dict()

Serialise to a dict suitable for round-tripping through from_raw.

Source code in georeader/readers/carbonmapper/plume.py
def to_source_dict(self) -> Dict[str, Any]:
    """Serialise to a dict suitable for round-tripping through :meth:`from_raw`."""
    d: Dict[str, Any] = {"plume_id": self.plume_id, "gas": self.gas}

    # Coordinates
    if self.plume_latitude is not None:
        d["plume_latitude"] = self.plume_latitude
    if self.plume_longitude is not None:
        d["plume_longitude"] = self.plume_longitude

    # Timestamps
    if self.datetime_str is not None:
        d["datetime"] = self.datetime_str
    if self.scene_timestamp is not None:
        d["scene_timestamp"] = self.scene_timestamp
    # Round-trip the API's `scene_id` (UUID) under its on-the-wire
    # name; the parseable form is derived via the property.
    if self.scene_uuid is not None:
        d["scene_id"] = self.scene_uuid
    if self.published_at_str is not None:
        d["published_at"] = self.published_at_str
    if self.modified_str is not None:
        d["modified"] = self.modified_str

    # Emissions
    d["emission_auto"] = self.emission_auto
    d["emission_uncertainty_auto"] = self.emission_uncertainty_auto

    # Wind
    d["wind_speed_avg_auto"] = self.wind_speed_avg_auto
    d["wind_speed_std_auto"] = self.wind_speed_std_auto
    d["wind_direction_avg_auto"] = self.wind_direction_avg_auto
    d["wind_direction_std_auto"] = self.wind_direction_std_auto
    d["wind_source_auto"] = self.wind_source_auto

    # Instrument / platform
    d["instrument"] = self.instrument
    d["platform"] = self.platform
    d["provider"] = self.provider

    # Classification
    if self.ipcc_sector is not None:
        d["ipcc_sector"] = self.ipcc_sector
    if self.sector is not None:
        d["sector"] = self.sector
    d["emission_cmf_type"] = self.emission_cmf_type
    d["mission_phase"] = self.mission_phase
    d["emission_version"] = self.emission_version
    d["processing_software"] = self.processing_software
    d["gsd"] = self.gsd
    d["sensitivity_mode"] = self.sensitivity_mode
    d["off_nadir"] = self.off_nadir

    # Quality / validation
    if self.plume_quality is not None:
        d["plume_quality"] = self.plume_quality
    if self.validated is not None:
        d["validated"] = self.validated
    if self.validator_user is not None:
        d["validator_user"] = self.validator_user
    if self.has_phme is not None:
        d["has_phme"] = self.has_phme
    if self.detection_institution is not None:
        d["detection_institution"] = self.detection_institution

    # Source linkage
    if self.source_id is not None:
        d["source_id"] = self.source_id
    if self.source_name is not None:
        d["source_name"] = self.source_name

    # Assets
    d["plume_tif"] = self.plume_tif
    d["plume_png"] = self.plume_png
    d["con_tif"] = self.con_tif
    d["rgb_tif"] = self.rgb_tif
    d["rgb_png"] = self.rgb_png
    if self.plume_rgb_png is not None:
        d["plume_rgb_png"] = self.plume_rgb_png

    # Geometry sources
    if self.geometry_json is not None:
        d["geometry_json"] = self.geometry_json
    if self.plume_bounds_raw is not None:
        d["plume_bounds"] = self.plume_bounds_raw

    return d

decompose_wind(speed, direction_deg)

Convert wind speed + meteorological direction to (u, v) components.

Meteorological convention: 0° = wind from North, 90° = wind from East. Returns the eastward (u) and northward (v) wind vector components.

Source code in georeader/readers/carbonmapper/plume.py
def decompose_wind(
    speed: float | None,
    direction_deg: float | None,
) -> tuple[float | None, float | None]:
    """Convert wind speed + meteorological direction to (u, v) components.

    Meteorological convention: 0° = wind *from* North, 90° = wind *from* East.
    Returns the eastward (u) and northward (v) wind vector components.
    """
    if speed is None or direction_deg is None:
        return None, None
    direction_rad = math.radians(direction_deg)
    wind_u = -speed * math.sin(direction_rad)
    wind_v = -speed * math.cos(direction_rad)
    return wind_u, wind_v
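
A quick check of the sign convention, as a minimal sketch. The function body repeats the arithmetic of the listing above so that the block runs on its own. Because the direction is where the wind comes from, a northerly wind (0°) moves air southward and yields a negative v.

```python
import math

def decompose_wind(speed, direction_deg):
    # Same arithmetic as the library function in the listing above.
    if speed is None or direction_deg is None:
        return None, None
    direction_rad = math.radians(direction_deg)
    return -speed * math.sin(direction_rad), -speed * math.cos(direction_rad)

# Wind FROM the north at 5 m/s: air moves southward, so v is negative.
u_n, v_n = decompose_wind(5.0, 0.0)
# Wind FROM the east at 5 m/s: air moves westward, so u is negative.
u_e, v_e = decompose_wind(5.0, 90.0)
# A missing speed or direction propagates as (None, None).
missing = decompose_wind(None, 90.0)
```

Floating-point rounding means the "zero" component is only approximately zero (for example, `v_e` is a tiny non-zero number), so compare with a tolerance instead of `==`.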

Typed model for a Carbon Mapper source (DBSCAN cluster of plumes).

A Carbon Mapper source groups all plumes detected at the same geographic location into a persistent point-source record. Sources are addressed by a deterministic name of the form {gas}_{sector}_{footprint_m}m_{lon}_{lat}, e.g. "CH4_1B2_100m_-104.17525_32.49125".

This module is the API-side typed view of a Carbon Mapper source. Downstream consumers may persist it into their own tables, but this package deliberately does not assume any particular DB schema.

Notable quirks handled here

  • /catalog/sources.geojson features sometimes return source_name with a stray query-string fragment appended ("...?plume_gas=CH4&bbox=..."). _strip_query_suffix removes it, and CMSource.from_geojson_feature always calls it, so callers never see the dirty form.
  • The endpoints return either a GeoJSON Feature (with properties / geometry) or a flat dict; the higher-level georeader.readers.carbonmapper.api_queries module normalises these before invoking from_geojson_feature.
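
The first quirk amounts to dropping everything from the first `?` onward. A minimal sketch, assuming that is all the clean-up needs to do; `strip_query_suffix` below is a hypothetical stand-in and not the library's `_strip_query_suffix`.

```python
def strip_query_suffix(name: str) -> str:
    # Hypothetical stand-in: keep only the text before the first "?".
    return name.split("?", 1)[0]

dirty = "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4&bbox=..."
clean = strip_query_suffix(dirty)

# A name that is already clean passes through untouched, so the
# function is safe to apply unconditionally.
already_clean = strip_query_suffix(clean)
```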

CMSource dataclass

Typed view of a Carbon Mapper source (cluster of plumes).

Frozen: instances are immutable and hashable. The raw dict captures the full upstream properties payload so consumers can reach for fields not yet exposed on the dataclass without round-tripping through the API.

Attributes

  • source_name: Canonical name (no ?... suffix). Stable across CM API revisions for the same physical site.
  • gas: Gas species, typically "CH4" or "CO2".
  • sector: IPCC sector code, e.g. "1B2" (Oil & Gas), "6A" (Solid Waste), "1B1a" (Coal Mining).
  • point: Centroid as a shapely.geometry.Point in WGS-84.
  • plume_count: Number of plumes Carbon Mapper has attributed to this source.
  • persistence: Carbon Mapper's persistence metric (overpasses-with-detection / total-overpasses), in [0, 1].
  • emission_auto: Persistence-weighted average emission rate in kg/h. None when CM has not produced an aggregate estimate.
  • emission_uncertainty_auto: Companion uncertainty for emission_auto, in kg/h.
  • first_observation, last_observation: Earliest and latest detection datetimes (UTC-aware).
  • raw: Original properties mapping from the API response.

Examples

Parse from a /catalog/sources.geojson feature:

>>> feature = {
...     "properties": {
...         "source_name": "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4",
...         "sector": "1B2", "gas": "CH4",
...         "plume_count": 12, "persistence": 0.42,
...         "emission_auto": 250.0,
...     },
...     "geometry": {"type": "Point",
...                  "coordinates": [-104.17525, 32.49125]},
... }
>>> src = CMSource.from_geojson_feature(feature)
>>> src.source_name              # query suffix stripped
'CH4_1B2_100m_-104.17525_32.49125'
>>> src.point.x, src.point.y
(-104.17525, 32.49125)
>>> src.plume_count, src.sector
(12, '1B2')

Source code in georeader/readers/carbonmapper/source.py
@dataclass(frozen=True)
class CMSource:
    """Typed view of a Carbon Mapper source (cluster of plumes).

    Frozen: instances are immutable and hashable. The ``raw`` dict
    captures the full upstream properties payload so consumers can
    reach for fields not yet exposed on the dataclass without round-
    tripping through the API.

    Attributes
    ----------
    source_name:
        Canonical name (no ``?...`` suffix). Stable across CM API
        revisions for the same physical site.
    gas:
        Gas species, typically ``"CH4"`` or ``"CO2"``.
    sector:
        IPCC sector code, e.g. ``"1B2"`` (Oil & Gas), ``"6A"`` (Solid
        Waste), ``"1B1a"`` (Coal Mining).
    point:
        Centroid as a Shapely :class:`shapely.geometry.Point` in WGS-84.
    plume_count:
        Number of plumes Carbon Mapper has attributed to this source.
    persistence:
        Carbon Mapper's persistence metric (overpasses-with-detection /
        total-overpasses), in ``[0, 1]``.
    emission_auto:
        Persistence-weighted average emission rate in ``kg/h``. ``None``
        when CM has not produced an aggregate estimate.
    emission_uncertainty_auto:
        Companion uncertainty for ``emission_auto``, in ``kg/h``.
    first_observation, last_observation:
        Earliest and latest detection datetimes (UTC-aware).
    raw:
        Original ``properties`` mapping from the API response.

    Examples
    --------
    Parse from a ``/catalog/sources.geojson`` feature:

    >>> feature = {
    ...     "properties": {
    ...         "source_name": "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4",
    ...         "sector": "1B2", "gas": "CH4",
    ...         "plume_count": 12, "persistence": 0.42,
    ...         "emission_auto": 250.0,
    ...     },
    ...     "geometry": {"type": "Point",
    ...                  "coordinates": [-104.17525, 32.49125]},
    ... }
    >>> src = CMSource.from_geojson_feature(feature)
    >>> src.source_name              # query suffix stripped
    'CH4_1B2_100m_-104.17525_32.49125'
    >>> src.point.x, src.point.y
    (-104.17525, 32.49125)
    >>> src.plume_count, src.sector
    (12, '1B2')
    """

    source_name: str
    gas: str
    sector: str
    point: Point
    plume_count: int
    persistence: float
    emission_auto: float | None = None
    emission_uncertainty_auto: float | None = None
    first_observation: datetime | None = None
    last_observation: datetime | None = None
    raw: dict = field(default_factory=dict)

    @classmethod
    def from_geojson_feature(cls, feature: dict) -> "CMSource":
        """Parse a ``/catalog/sources.geojson`` feature into a CMSource.

        Always strips the ``source_name`` query-string suffix
        (``?plume_gas=...``); this is the canonical strip site, so
        downstream code can treat ``CMSource.source_name`` as clean.

        Parameters
        ----------
        feature:
            GeoJSON Feature dict with at least ``"geometry"`` (Point)
            and ``"properties"`` (with ``source_name`` and friends).

        Returns
        -------
        CMSource
            Typed source record with the suffix stripped.

        Raises
        ------
        ValueError
            If ``feature["geometry"]`` does not carry a Point coordinate
            pair.

        Examples
        --------
        >>> feature = {
        ...     "properties": {"source_name": "x?bbox=1", "sector": "1B2",
        ...                    "gas": "CH4", "plume_count": 1,
        ...                    "persistence": 0.5},
        ...     "geometry": {"type": "Point", "coordinates": [-100.0, 30.0]},
        ... }
        >>> CMSource.from_geojson_feature(feature).source_name
        'x'
        """
        props = dict(feature.get("properties") or {})
        geom = feature.get("geometry") or {}
        coords = geom.get("coordinates") or (None, None)
        lon, lat = (coords + [None, None])[:2] if isinstance(coords, list) else (None, None)

        if lon is None or lat is None:
            raise ValueError(
                f"feature is missing Point coordinates: {feature!r}"
            )

        return cls(
            source_name=_strip_query_suffix(str(props.get("source_name", ""))),
            gas=str(props.get("gas", "") or ""),
            sector=str(props.get("sector", "") or ""),
            point=Point(float(lon), float(lat)),
            plume_count=int(props.get("plume_count") or 0),
            persistence=float(props.get("persistence") or 0.0),
            emission_auto=_to_float(props.get("emission_auto")),
            emission_uncertainty_auto=_to_float(
                props.get("emission_uncertainty_auto")
            ),
            first_observation=_parse_iso_datetime(props.get("first_observation")),
            last_observation=_parse_iso_datetime(props.get("last_observation")),
            raw=props,
        )

from_geojson_feature(feature) classmethod

Parse a /catalog/sources.geojson feature into a CMSource.

Always strips the source_name query-string suffix (?plume_gas=...); this is the canonical strip site, so downstream code can treat CMSource.source_name as clean.

Parameters

feature: GeoJSON Feature dict with at least "geometry" (Point) and "properties" (with source_name and friends).

Returns

CMSource: Typed source record with the suffix stripped.

Raises

ValueError: If feature["geometry"] does not carry a Point coordinate pair.

Examples

>>> feature = {
...     "properties": {"source_name": "x?bbox=1", "sector": "1B2",
...                    "gas": "CH4", "plume_count": 1,
...                    "persistence": 0.5},
...     "geometry": {"type": "Point", "coordinates": [-100.0, 30.0]},
... }
>>> CMSource.from_geojson_feature(feature).source_name
'x'

Source code in georeader/readers/carbonmapper/source.py
@classmethod
def from_geojson_feature(cls, feature: dict) -> "CMSource":
    """Parse a ``/catalog/sources.geojson`` feature into a CMSource.

    Always strips the ``source_name`` query-string suffix
    (``?plume_gas=...``); this is the canonical strip site, so
    downstream code can treat ``CMSource.source_name`` as clean.

    Parameters
    ----------
    feature:
        GeoJSON Feature dict with at least ``"geometry"`` (Point)
        and ``"properties"`` (with ``source_name`` and friends).

    Returns
    -------
    CMSource
        Typed source record with the suffix stripped.

    Raises
    ------
    ValueError
        If ``feature["geometry"]`` does not carry a Point coordinate
        pair.

    Examples
    --------
    >>> feature = {
    ...     "properties": {"source_name": "x?bbox=1", "sector": "1B2",
    ...                    "gas": "CH4", "plume_count": 1,
    ...                    "persistence": 0.5},
    ...     "geometry": {"type": "Point", "coordinates": [-100.0, 30.0]},
    ... }
    >>> CMSource.from_geojson_feature(feature).source_name
    'x'
    """
    props = dict(feature.get("properties") or {})
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates") or (None, None)
    lon, lat = (coords + [None, None])[:2] if isinstance(coords, list) else (None, None)

    if lon is None or lat is None:
        raise ValueError(
            f"feature is missing Point coordinates: {feature!r}"
        )

    return cls(
        source_name=_strip_query_suffix(str(props.get("source_name", ""))),
        gas=str(props.get("gas", "") or ""),
        sector=str(props.get("sector", "") or ""),
        point=Point(float(lon), float(lat)),
        plume_count=int(props.get("plume_count") or 0),
        persistence=float(props.get("persistence") or 0.0),
        emission_auto=_to_float(props.get("emission_auto")),
        emission_uncertainty_auto=_to_float(
            props.get("emission_uncertainty_auto")
        ),
        first_observation=_parse_iso_datetime(props.get("first_observation")),
        last_observation=_parse_iso_datetime(props.get("last_observation")),
        raw=props,
    )
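
The two validation steps described in the docstring above (strip the query-string suffix, then require a Point coordinate pair) can be sketched in isolation. This is a minimal illustration, not the library's implementation: `strip_query_suffix` and `point_lon_lat` are hypothetical names, and the assumption that the suffix starts at the first `?` is inferred from the doctest (`"x?bbox=1"` becomes `"x"`).

```python
from typing import Any, Mapping, Tuple


def strip_query_suffix(name: str) -> str:
    """Drop everything from the first ``?`` onward (hypothetical helper)."""
    return name.split("?", 1)[0]


def point_lon_lat(feature: Mapping[str, Any]) -> Tuple[float, float]:
    """Return ``(lon, lat)`` from a GeoJSON Point feature, else raise."""
    geom = feature.get("geometry") or {}
    coords = geom.get("coordinates")
    # A Point needs at least two non-null ordinates; lists and tuples both count.
    if (
        not isinstance(coords, (list, tuple))
        or len(coords) < 2
        or coords[0] is None
        or coords[1] is None
    ):
        raise ValueError(f"feature is missing Point coordinates: {feature!r}")
    return float(coords[0]), float(coords[1])
```

Failing early with `ValueError` keeps malformed features out of the typed record, which matches the contract stated under "Raises".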

Carbon Mapper L2B scene raster wrapper.

:class:CMImageRaster exposes every loadable L2B scene asset (cmf / cmf-unortho / uncertainty / uncertainty-unortho / artifact-mask / rgb / uas) as lazy properties backed by :class:~georeader.rasterio_reader.RasterioReader (or plain text for the uas.txt sidecar).

Per-plume L3A products (mask, concentrations, IME-clipped concentrations, RGB, outline) live in :mod:~georeader.readers.carbonmapper.image; :class:~georeader.readers.carbonmapper.image.CMPlumeImage is the counterpart to this class for plume-level data.

Intentionally NOT wrapped:

  • PNG assets (rgb_png etc.): un-georeferenced, not COGs.
  • Per-plume con_tif from the catalog REST surface: duplicates the column-density crop already provided by CMPlumeImage.

Pure raster wrappers: no DB binding, no blob upload. The DB-bound classes (CarbonMapperTile, CarbonMapperLocationImage) and the analyst notebooks consume them.

CM_L2B_BANDS = ('cmf', 'cmf-unortho', 'uncertainty', 'uncertainty-unortho', 'artifact-mask', 'rgb') module-attribute

DEFAULT_L2B_RGB_COLLECTION = 'l2b-rgb-v3a' module-attribute

CMImageRaster dataclass

L2B scene exposed as georeader-backed rasters, one per band in CM_L2B_BANDS, plus the uas.txt text sidecar.

Lazy: instantiating the dataclass does NOT issue HTTP / blob reads; access .cmf / .rgb / etc. or call :meth:read_window / :meth:read_polygon to trigger I/O.

Attributes:

  • scene_id (str): CM L2B item id (e.g. "tan20251212t185057c20s4001").
  • asset_paths (Mapping[str, PathLike]): Mapping of band name → URL (https://) or local / blob path. artifact-mask may be missing, in which case the accessor returns None.
  • overview_level (Optional[int]): Forwarded to RasterioReader. None for full resolution; integer for COG overviews (faster previews).
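
The laziness contract described above (no I/O at construction, one open per band on first access, `None` for a missing optional band) can be sketched with `functools.cached_property`. This is an illustrative stand-in only: `LazyScene`, `FakeReader`, and the `OPENED` log are hypothetical names used to make the behaviour observable without touching the network or `RasterioReader`.

```python
from dataclasses import dataclass
from functools import cached_property
from typing import List, Mapping, Optional

OPENED: List[str] = []  # records every "open" so laziness is observable


class FakeReader:
    """Stand-in for a raster reader; opening is simulated by logging the path."""

    def __init__(self, path: str) -> None:
        OPENED.append(path)
        self.path = path


@dataclass
class LazyScene:
    scene_id: str
    asset_paths: Mapping[str, str]

    @cached_property
    def cmf(self) -> FakeReader:
        # Required band: a missing path is an error.
        return self._open("cmf")

    @cached_property
    def artifact_mask(self) -> Optional[FakeReader]:
        # Optional band: a missing path yields None instead of raising.
        if self.asset_paths.get("artifact-mask") is None:
            return None
        return self._open("artifact-mask")

    def _open(self, band: str) -> FakeReader:
        path = self.asset_paths.get(band)
        if path is None:
            raise KeyError(f"Asset {band!r} not present on {self.scene_id}")
        return FakeReader(path)
```

Because `cached_property` stores its result in the instance `__dict__`, a second access to `.cmf` returns the same reader object without reopening it.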

Source code in georeader/readers/carbonmapper/rasters.py
@dataclass(repr=False)
class CMImageRaster:
    """L2B scene exposed as georeader-backed rasters, one per band in
    :data:`CM_L2B_BANDS`, plus the ``uas.txt`` text sidecar.

    Lazy: instantiating the dataclass does NOT issue HTTP / blob reads;
    access ``.cmf`` / ``.rgb`` / etc. or call :meth:`read_window` /
    :meth:`read_polygon` to trigger I/O.

    Attributes:
        scene_id: CM L2B item id (e.g. ``"tan20251212t185057c20s4001"``).
        asset_paths: Mapping of band name → URL (``https://``) or
            local / blob path. ``artifact-mask`` may be missing, in
            which case the accessor returns ``None``.
        overview_level: Forwarded to ``RasterioReader``. ``None`` for
            full resolution; integer for COG overviews (faster previews).
    """

    scene_id: str
    asset_paths: Mapping[str, PathLike]
    overview_level: Optional[int] = None

    # ---- Constructors --------------------------------------------------

    @classmethod
    def from_cm_tile_item(cls, item: CMTileItem) -> "CMImageRaster":
        """Build from the lightweight STAC item (Phase 0.2).

        STAC asset keys carry file extensions (``cmf.tif``,
        ``uncertainty.tif``, ``artifact-mask.tif``, ``uas.txt``,
        ``*-unortho.tif`` variants). This method strips the
        appropriate extension and retains every key listed in
        :data:`CM_L2B_BANDS` plus ``uas`` (the text sidecar).
        """
        paths: dict[str, PathLike] = {}
        for key, url in item.asset_urls.items():
            if not url:
                continue
            # ``.tif`` and ``.txt`` are both four characters long.
            stripped = key[:-4] if key.endswith((".tif", ".txt")) else key
            if stripped in _CM_L2B_KEYS_ALL:
                paths[stripped] = url
        return cls(scene_id=item.scene_id, asset_paths=paths)

    def with_rgb(self, rgb_item: CMTileItem) -> "CMImageRaster":
        """Return a copy with ``rgb`` merged in from a sibling STAC item.

        The CH4 (``l2b-ch4-mfa-v3a``) and RGB (``l2b-rgb-v3a``) L2B
        collections share ``scene_id`` and pixel grid, but each STAC
        item only exposes its own assets. Fetch both with
        :func:`api_queries.get_tile` (passing ``collection=...``) and
        compose them via this method:

        >>> ir = CMImageRaster.from_cm_tile_item(ch4_item)
        >>> ir = ir.with_rgb(rgb_item)
        >>> ir.rgb is not None
        True

        Raises:
            ValueError: If ``rgb_item.scene_id`` doesn't match
                ``self.scene_id`` (mismatched scenes don't share a grid;
                this is usually a programming error).
        """
        if rgb_item.scene_id != self.scene_id:
            raise ValueError(
                f"scene_id mismatch: {self.scene_id!r} vs {rgb_item.scene_id!r}"
            )
        # Pick the rgb GeoTIFF (with or without `.tif` extension);
        # ignore everything else on the rgb item.
        new_paths = dict(self.asset_paths)
        for key, url in rgb_item.asset_urls.items():
            if not url:
                continue
            stripped = key[:-4] if key.endswith(".tif") else key
            if stripped == "rgb":
                new_paths["rgb"] = url
                break
        return CMImageRaster(
            scene_id=self.scene_id,
            asset_paths=new_paths,
            overview_level=self.overview_level,
        )

    @classmethod
    def from_scene_id(
        cls,
        scene_id: str,
        *,
        token: str,
        l2b_collection_candidates: Sequence[str] = DEFAULT_L2B_CH4_COLLECTION_CANDIDATES,
        rgb_collection_candidates: Sequence[str] = DEFAULT_L2B_RGB_COLLECTION_CANDIDATES,
        with_rgb: bool = True,
        overview_level: int | None = None,
        http_timeout: float = 30.0,
    ) -> "CMImageRaster":
        """Build by deriving L2B asset URLs from the scene_id (URL-pattern).

        Bypasses STAC entirely: derives every asset URL by templating
        against the verified asset-proxy pattern (see
        :func:`_l2b_asset_url`) and probing the candidate collections
        in order. Required for 2026 plumes (v3c/v3d L3A) whose L2B
        parent scenes are **not** in ``/stac/collections``.

        Parameters
        ----------
        scene_id:
            L2B scene id, equal to ``plume_id.rsplit("-", 1)[0]`` for
            any plume that came from this scene. Must follow the
            ``<inst><YYYYMMDD>t<HHMMSS>...`` convention so the date
            can be parsed.
        token:
            Bearer token. Required: the asset-proxy URLs return 401
            without it.
        l2b_collection_candidates:
            L2B CH4 collection IDs to probe, in order. First one to
            serve a 200/206 on ``cmf.tif`` wins. Defaults to
            :data:`DEFAULT_L2B_CH4_COLLECTION_CANDIDATES`, i.e.
            ``("l2b-ch4-mfa-v3c", "l2b-ch4-mfa-v3a")``.
        rgb_collection_candidates:
            L2B RGB sibling collection IDs probed identically (on
            ``rgb.tif``). Defaults to
            :data:`DEFAULT_L2B_RGB_COLLECTION_CANDIDATES`.
        with_rgb:
            When ``True`` (default), probe the RGB sibling collections
            and attach the ``rgb`` URL on success. When ``False``,
            ``self.rgb`` will be ``None``.
        overview_level:
            Forwarded to :class:`RasterioReader`.
        http_timeout:
            Per-probe range-GET timeout (seconds).

        Returns
        -------
        CMImageRaster
            With ``asset_paths`` populated for the 6 L2B CH4 assets
            (``cmf``, ``cmf-unortho``, ``uncertainty``,
            ``uncertainty-unortho``, ``artifact-mask``, ``uas``) and,
            when ``with_rgb=True``, the ``rgb`` sibling URL.

        Raises
        ------
        CMSceneNotPublished
            When every candidate L2B collection 404s for ``scene_id``:
            the scene either hasn't been processed yet or only
            exists in a collection variant not listed in
            ``l2b_collection_candidates``. Catch in ETL paths that
            want to defer rather than error.
        ValueError
            When ``scene_id`` doesn't carry an 8-digit date at
            positions ``[3:11]``.

        Examples
        --------
        >>> tile = CMImageRaster.from_scene_id(  # doctest: +SKIP
        ...     "tan20260331t181625c77s4001", token=tok,
        ... )
        >>> tile.cmf  # doctest: +SKIP
        <RasterioReader …/l2b-ch4-mfa-v3c/2026/03/31/…>
        """
        l2b_coll = _probe_l2b_collection(
            scene_id,
            l2b_collection_candidates,
            probe_asset="cmf.tif",
            token=token,
            http_timeout=http_timeout,
        )
        if l2b_coll is None:
            raise CMSceneNotPublished(scene_id)

        # Build the 6 CH4-collection asset URLs from the winning prefix.
        # Extensions are baked in; `_open` strips nothing, so keys must
        # match the lazy-property names exactly (without extensions).
        asset_paths: dict[str, PathLike] = {
            "cmf":                 _l2b_asset_url(l2b_coll, scene_id, "cmf.tif"),
            "cmf-unortho":         _l2b_asset_url(l2b_coll, scene_id, "cmf-unortho.tif"),
            "uncertainty":         _l2b_asset_url(l2b_coll, scene_id, "uncertainty.tif"),
            "uncertainty-unortho": _l2b_asset_url(l2b_coll, scene_id, "uncertainty-unortho.tif"),
            "artifact-mask":       _l2b_asset_url(l2b_coll, scene_id, "artifact-mask.tif"),
            "uas":                 _l2b_asset_url(l2b_coll, scene_id, "uas.txt"),
        }

        if with_rgb:
            rgb_coll = _probe_l2b_collection(
                scene_id,
                rgb_collection_candidates,
                probe_asset="rgb.tif",
                token=token,
                http_timeout=http_timeout,
            )
            if rgb_coll is not None:
                asset_paths["rgb"] = _l2b_asset_url(rgb_coll, scene_id, "rgb.tif")

        return cls(
            scene_id=scene_id,
            asset_paths=asset_paths,
            overview_level=overview_level,
        )

    @classmethod
    def from_local(cls, scene_dir: PathLike) -> "CMImageRaster":
        """Build from a downloaded scene directory.

        Picks up every L2B asset present (``cmf.tif`` / ``rgb.tif`` /
        ``uncertainty.tif`` / ``artifact-mask.tif`` and the
        un-orthorectified variants), plus the ``uas.txt`` sidecar.
        Missing files become absent keys in ``asset_paths``.
        """
        d = Path(scene_dir)
        paths: dict[str, PathLike] = {}
        for band in CM_L2B_BANDS:
            p = d / f"{band}.tif"
            if p.exists():
                paths[band] = str(p)
        uas_path = d / "uas.txt"
        if uas_path.exists():
            paths["uas"] = str(uas_path)
        return cls(scene_id=d.name, asset_paths=paths)

    # ---- Lazy band readers --------------------------------------------

    @cached_property
    def cmf(self) -> RasterioReader:
        """CH4 matched-filter retrieval, orthorectified (ppm·m).
        Always present on L2B-CH4 items."""
        return self._open("cmf")

    @cached_property
    def cmf_unortho(self) -> Optional[RasterioReader]:
        """CH4 retrieval in raw sensor frame (pre-orthorectification).
        ``None`` for older collection variants (e.g. ``mfm-v1``) that
        don't ship the unortho sibling."""
        return self._open_optional("cmf-unortho")

    @cached_property
    def rgb(self) -> Optional[RasterioReader]:
        """3-band uint8 RGB. ``None`` for L2B-CH4 collections (RGB lives
        in a separate STAC collection; fetch and pass via
        ``asset_paths`` or compose via :meth:`with_rgb`)."""
        return self._open_optional("rgb")

    @cached_property
    def uncertainty(self) -> RasterioReader:
        """Companion uncertainty raster aligned with ``cmf``."""
        return self._open("uncertainty")

    @cached_property
    def uncertainty_unortho(self) -> Optional[RasterioReader]:
        """Per-pixel uncertainty in raw sensor frame. ``None`` for
        older collection variants without the unortho sibling."""
        return self._open_optional("uncertainty-unortho")

    @cached_property
    def artifact_mask(self) -> Optional[RasterioReader]:
        """Artefact mask (covers ~25% of scene). Flags un-orthorectified
        strip pixels and geometric anomalies; **not** a cloud mask.
        ``None`` if absent."""
        return self._open_optional("artifact-mask")

    @cached_property
    def uas(self) -> Optional[str]:
        """UAS sensor-metadata sidecar: raw text from ``uas.txt``.

        Lazy-fetched on first access (one HTTP GET if the path is a
        URL, or a file read for local paths) and cached as a string.
        Callers parse the structure as needed; we don't impose a
        schema. Returns ``None`` if no ``uas`` URL/path was supplied.

        Auth: rasterio's curl session is configured via the
        ``GDAL_HTTP_HEADERS`` env var (set by the standard reader
        bootstrap). We re-use that header here so a single
        ``Authorization: Bearer <token>`` setup applies to every
        L2B asset, raster or text alike.
        """
        path = self.asset_paths.get("uas")
        if path is None:
            return None
        sp = str(path)
        if sp.startswith(("http://", "https://")):
            headers: dict[str, str] = {}
            gdal_hdr = os.environ.get("GDAL_HTTP_HEADERS", "")
            if gdal_hdr.lower().startswith("authorization:"):
                headers["Authorization"] = gdal_hdr.split(":", 1)[1].strip()
            r = requests.get(sp, headers=headers, timeout=30)
            r.raise_for_status()
            return r.text
        with open(sp, "r") as fh:
            return fh.read()

    # ---- Geometric metadata (pulled from cmf as the canonical band) ---

    @property
    def crs(self) -> str:
        return str(self.cmf.crs)

    @property
    def transform(self):
        return self.cmf.transform

    @property
    def bounds(self) -> BBox:
        b = self.cmf.bounds
        return (float(b[0]), float(b[1]), float(b[2]), float(b[3]))

    @property
    def shape(self) -> tuple[int, int]:
        return (self.cmf.height, self.cmf.width)

    # ---- Read helpers (delegate to georeader.read) --------------------

    def read_polygon(
        self,
        polygon: BaseGeometry,
        *,
        crs_polygon: str = "EPSG:4326",
        bands: Iterable[str] = CM_L2B_BANDS,
    ) -> dict[str, Optional[GeoData]]:
        """Read a polygon clip from the requested bands.

        Args:
            polygon: Clip geometry.
            crs_polygon: CRS of ``polygon``. Defaults to ``"EPSG:4326"``.
            bands: Subset of band names. Bands whose asset is missing
                or whose window has zero overlap return ``None``.

        Returns:
            ``{"cmf": <GeoData>, "rgb": <GeoData>, ...}``: windowed
            ``RasterioReader`` instances (lazy, satisfying the
            :class:`GeoData` protocol). Call ``.load()`` to materialise
            as :class:`GeoTensor`.
        """
        out: dict[str, Optional[GeoData]] = {}
        for band in bands:
            if self.asset_paths.get(band) is None:
                out[band] = None
                continue
            # `uas` is a text sidecar, not a raster, so skip the band
            # reader path. Callers reading text sidecars use the
            # `.uas` property directly.
            if band == "uas":
                continue
            reader = self._open(band)
            # `boundless=False` makes `read_from_polygon` return `None`
            # for windows that don't intersect the raster (e.g. an
            # artifact-mask whose un-orthorectified strip falls outside
            # the requested AOI), instead of allocating a fill-valued
            # tensor the size of the requested window. Real CRS / I/O
            # errors are left to propagate; the prior bare
            # `except Exception` swallowed those silently.
            #
            # `read_from_polygon` returns ``GeoData | NDArray``; with
            # ``return_only_data=False`` the GeoData arm is the one we
            # always hit. ``RasterioReader`` satisfies the ``GeoData``
            # protocol structurally, but ty doesn't currently infer
            # that, so cast for clarity.
            result = read.read_from_polygon(
                cast(GeoData, reader),
                polygon=polygon,
                crs_polygon=crs_polygon,
                boundless=False,
            )
            out[band] = cast(GeoData, result) if result is not None else None
        return out

    def read_window(
        self,
        bounds_4326: BBox,
        *,
        bands: Iterable[str] = CM_L2B_BANDS,
    ) -> dict[str, Optional[GeoData]]:
        """Read a WGS-84 bbox window from the requested bands."""
        return self.read_polygon(box(*bounds_4326), bands=bands)

    def read_window_to_crs(
        self,
        bounds_4326: BBox,
        crs_dst: str,
        *,
        bands: Iterable[str] = CM_L2B_BANDS,
    ) -> dict[str, Optional[GeoTensor]]:
        """Read a window then reproject each band to ``crs_dst``.

        Reprojection materialises the data, so values are
        :class:`GeoTensor`, not lazy readers.
        """
        crops = self.read_window(bounds_4326, bands=bands)
        # `read_to_crs` returns ``GeoTensor | NDArray``; same narrowing
        # rationale as ``read_from_polygon`` above.
        return {
            band: (
                cast(GeoTensor, read.read_to_crs(geo, crs_dst))
                if geo is not None
                else None
            )
            for band, geo in crops.items()
        }

    # ---- Internals -----------------------------------------------------

    def _open(self, band: str) -> RasterioReader:
        path = self.asset_paths.get(band)
        if path is None:
            raise KeyError(f"Asset {band!r} not present on {self.scene_id}")
        return RasterioReader(str(path), overview_level=self.overview_level)

    def _open_optional(self, band: str) -> Optional[RasterioReader]:
        if self.asset_paths.get(band) is None:
            return None
        return self._open(band)

    # ---- Repr ---------------------------------------------------------

    def __repr__(self) -> str:
        present = [b for b in CM_L2B_BANDS if b in self.asset_paths]
        missing = [b for b in CM_L2B_BANDS if b not in self.asset_paths]
        extra = sorted(set(self.asset_paths) - set(CM_L2B_BANDS))
        ov = self.overview_level if self.overview_level is not None else "full"
        lines = [
            "CMImageRaster",
            f"  scene_id:       {self.scene_id}",
            f"  bands present:  {present or '<none>'}",
        ]
        if missing:
            lines.append(f"  bands missing:  {missing}")
        if extra:
            lines.append(f"  extra keys:     {extra}")
        lines.append(f"  overview_level: {ov}")
        return "\n".join(lines)

    __str__ = __repr__

artifact_mask cached property

Artefact mask (covers ~25% of scene). Flags un-orthorectified strip pixels and geometric anomalies; not a cloud mask. None if absent.

cmf cached property

CH4 matched-filter retrieval, orthorectified (ppm·m). Always present on L2B-CH4 items.

cmf_unortho cached property

CH4 retrieval in raw sensor frame (pre-orthorectification). None for older collection variants (e.g. mfm-v1) that don't ship the unortho sibling.

rgb cached property

3-band uint8 RGB. None for L2B-CH4 collections (RGB lives in a separate STAC collection; fetch and pass via asset_paths or compose via :meth:with_rgb).

uas cached property

UAS sensor-metadata sidecar: raw text from uas.txt.

Lazy-fetched on first access (one HTTP GET if the path is a URL, or a file read for local paths) and cached as a string. Callers parse the structure as needed; we don't impose a schema. Returns None if no uas URL/path was supplied.

Auth: rasterio's curl session is configured via the GDAL_HTTP_HEADERS env var (set by the standard reader bootstrap). We re-use that header here so a single Authorization: Bearer <token> setup applies to every L2B asset, raster or text alike.
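
The header re-use described above can be sketched as a small pure function. This mirrors the logic shown in the class source (a single `Authorization: <value>` line in `GDAL_HTTP_HEADERS`); the function name `auth_headers_from_gdal` is hypothetical, and values holding several headers are deliberately not handled here.

```python
from typing import Dict


def auth_headers_from_gdal(gdal_http_headers: str) -> Dict[str, str]:
    """Translate a GDAL_HTTP_HEADERS value into a headers dict for an HTTP client."""
    headers: Dict[str, str] = {}
    # Only a value of the form "Authorization: <scheme> <token>" is forwarded.
    if gdal_http_headers.lower().startswith("authorization:"):
        headers["Authorization"] = gdal_http_headers.split(":", 1)[1].strip()
    return headers
```

In practice the argument would come from `os.environ.get("GDAL_HTTP_HEADERS", "")`, so an unset variable yields an empty dict and the request goes out unauthenticated.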

uncertainty cached property

Companion uncertainty raster aligned with cmf.

uncertainty_unortho cached property

Per-pixel uncertainty in raw sensor frame. None for older collection variants without the unortho sibling.

from_cm_tile_item(item) classmethod

Build from the lightweight STAC item (Phase 0.2).

STAC asset keys carry file extensions (cmf.tif, uncertainty.tif, artifact-mask.tif, uas.txt, *-unortho.tif variants). This method strips the appropriate extension and retains every key listed in :data:CM_L2B_BANDS plus uas (the text sidecar).
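
The key normalisation described above can be sketched on a plain dict. This is a hedged illustration: `normalise_asset_keys` is a hypothetical name, `KEYS_ALL` is an assumed reconstruction of `_CM_L2B_KEYS_ALL` (the bands in `CM_L2B_BANDS` plus `uas`, as the docstring states), and the URLs are placeholders.

```python
from typing import Dict, Mapping, Optional

CM_L2B_BANDS = (
    "cmf", "cmf-unortho", "uncertainty",
    "uncertainty-unortho", "artifact-mask", "rgb",
)
# Assumed contents of the private key set: every band plus the text sidecar.
KEYS_ALL = frozenset(CM_L2B_BANDS) | {"uas"}


def normalise_asset_keys(asset_urls: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Strip ``.tif`` / ``.txt`` from STAC asset keys and keep only known L2B keys."""
    paths: Dict[str, str] = {}
    for key, url in asset_urls.items():
        if not url:
            continue  # skip assets published without a URL
        # Both extensions are four characters long, so one slice covers both.
        stripped = key[:-4] if key.endswith((".tif", ".txt")) else key
        if stripped in KEYS_ALL:
            paths[stripped] = url
    return paths
```

Keys that are not in the known set (for example PNG previews) are dropped, consistent with the "Intentionally NOT wrapped" note at the top of the module.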

Source code in georeader/readers/carbonmapper/rasters.py
@classmethod
def from_cm_tile_item(cls, item: CMTileItem) -> "CMImageRaster":
    """Build from the lightweight STAC item (Phase 0.2).

    STAC asset keys carry file extensions (``cmf.tif``,
    ``uncertainty.tif``, ``artifact-mask.tif``, ``uas.txt``,
    ``*-unortho.tif`` variants). This method strips the
    appropriate extension and retains every key listed in
    :data:`CM_L2B_BANDS` plus ``uas`` (the text sidecar).
    """
    paths: dict[str, PathLike] = {}
    for key, url in item.asset_urls.items():
        if not url:
            continue
        # ``.tif`` and ``.txt`` are both four characters long.
        stripped = key[:-4] if key.endswith((".tif", ".txt")) else key
        if stripped in _CM_L2B_KEYS_ALL:
            paths[stripped] = url
    return cls(scene_id=item.scene_id, asset_paths=paths)

from_local(scene_dir) classmethod

Build from a downloaded scene directory.

Picks up every L2B asset present (cmf.tif / rgb.tif / uncertainty.tif / artifact-mask.tif and the un-orthorectified variants), plus the uas.txt sidecar. Missing files become absent keys in asset_paths.
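
The directory scan described above can be sketched without any raster dependency. This is a minimal stand-in under stated assumptions: `scan_scene_dir` is a hypothetical function that returns the `(scene_id, asset_paths)` pair which `from_local` would pass to the constructor, and the scene id is taken from the directory name as in the source listing.

```python
from pathlib import Path
from typing import Dict, Tuple, Union

CM_L2B_BANDS = (
    "cmf", "cmf-unortho", "uncertainty",
    "uncertainty-unortho", "artifact-mask", "rgb",
)


def scan_scene_dir(scene_dir: Union[str, Path]) -> Tuple[str, Dict[str, str]]:
    """Collect the L2B files present in a downloaded scene directory."""
    d = Path(scene_dir)
    paths: Dict[str, str] = {}
    for band in CM_L2B_BANDS:
        candidate = d / f"{band}.tif"
        if candidate.exists():  # missing files simply produce no key
            paths[band] = str(candidate)
    uas_path = d / "uas.txt"
    if uas_path.exists():
        paths["uas"] = str(uas_path)
    return d.name, paths
```

Leaving absent files out of the mapping, rather than storing `None`, is what lets the optional accessors return `None` through a plain `asset_paths.get(band)` check.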

Source code in georeader/readers/carbonmapper/rasters.py
@classmethod
def from_local(cls, scene_dir: PathLike) -> "CMImageRaster":
    """Build from a downloaded scene directory.

    Picks up every L2B asset present (``cmf.tif`` / ``rgb.tif`` /
    ``uncertainty.tif`` / ``artifact-mask.tif`` and the
    un-orthorectified variants), plus the ``uas.txt`` sidecar.
    Missing files become absent keys in ``asset_paths``.
    """
    d = Path(scene_dir)
    paths: dict[str, PathLike] = {}
    for band in CM_L2B_BANDS:
        p = d / f"{band}.tif"
        if p.exists():
            paths[band] = str(p)
    uas_path = d / "uas.txt"
    if uas_path.exists():
        paths["uas"] = str(uas_path)
    return cls(scene_id=d.name, asset_paths=paths)

from_scene_id(scene_id, *, token, l2b_collection_candidates=DEFAULT_L2B_CH4_COLLECTION_CANDIDATES, rgb_collection_candidates=DEFAULT_L2B_RGB_COLLECTION_CANDIDATES, with_rgb=True, overview_level=None, http_timeout=30.0) classmethod

Build by deriving L2B asset URLs from the scene_id (URL-pattern).

Bypasses STAC entirely: derives every asset URL by templating against the verified asset-proxy pattern (see :func:_l2b_asset_url) and probing the candidate collections in order. Required for 2026 plumes (v3c/v3d L3A) whose L2B parent scenes are not in /stac/collections.

Parameters

  • scene_id: L2B scene id, equal to plume_id.rsplit("-", 1)[0] for any plume that came from this scene. Must follow the <inst><YYYYMMDD>t<HHMMSS>... convention so the date can be parsed.
  • token: Bearer token. Required: the asset-proxy URLs return 401 without it.
  • l2b_collection_candidates: L2B CH4 collection IDs to probe, in order. First one to serve a 200/206 on cmf.tif wins. Defaults to :data:DEFAULT_L2B_CH4_COLLECTION_CANDIDATES, i.e. ("l2b-ch4-mfa-v3c", "l2b-ch4-mfa-v3a").
  • rgb_collection_candidates: L2B RGB sibling collection IDs probed identically (on rgb.tif). Defaults to :data:DEFAULT_L2B_RGB_COLLECTION_CANDIDATES.
  • with_rgb: When True (default), probe the RGB sibling collections and attach the rgb URL on success. When False, self.rgb will be None.
  • overview_level: Forwarded to :class:RasterioReader.
  • http_timeout: Per-probe range-GET timeout (seconds).

Returns

  • CMImageRaster: With asset_paths populated for the 6 L2B CH4 assets (cmf, cmf-unortho, uncertainty, uncertainty-unortho, artifact-mask, uas) and, when with_rgb=True, the rgb sibling URL.

Raises

  • CMSceneNotPublished: When every candidate L2B collection 404s for scene_id: the scene either hasn't been processed yet or only exists in a collection variant not listed in l2b_collection_candidates. Catch in ETL paths that want to defer rather than error.
  • ValueError: When scene_id doesn't carry an 8-digit date at positions [3:11].

Examples

    >>> tile = CMImageRaster.from_scene_id(  # doctest: +SKIP
    ...     "tan20260331t181625c77s4001", token=tok,
    ... )
    >>> tile.cmf  # doctest: +SKIP
    <RasterioReader …/l2b-ch4-mfa-v3c/2026/03/31/…>
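
The scene-id conventions and the ordered probing described above can be sketched without network access. Everything here is illustrative: the three function names are hypothetical, the plume id suffix (`-A`) is an assumed example of the `plume_id.rsplit("-", 1)[0]` rule, and `serves` is an injected callable standing in for the real range-GET probe, whose URL template is not reproduced.

```python
from datetime import date, datetime
from typing import Callable, Optional, Sequence


def scene_id_from_plume_id(plume_id: str) -> str:
    """Drop the trailing ``-<suffix>`` to recover the parent scene id."""
    return plume_id.rsplit("-", 1)[0]


def scene_date(scene_id: str) -> date:
    """Parse the 8-digit date at positions ``[3:11]`` or raise ValueError."""
    raw = scene_id[3:11]
    # Check the shape first: strptime alone would accept some shorter strings.
    if len(raw) != 8 or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"scene_id {scene_id!r} has no 8-digit date at [3:11]")
    return datetime.strptime(raw, "%Y%m%d").date()


def first_serving_collection(
    candidates: Sequence[str],
    serves: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate for which the probe succeeds, else None."""
    for collection in candidates:
        if serves(collection):
            return collection
    return None
```

A `None` result from the probe loop corresponds to the case where the classmethod raises `CMSceneNotPublished`; callers in ETL paths can catch that and retry later.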

Source code in georeader/readers/carbonmapper/rasters.py
@classmethod
def from_scene_id(
    cls,
    scene_id: str,
    *,
    token: str,
    l2b_collection_candidates: Sequence[str] = DEFAULT_L2B_CH4_COLLECTION_CANDIDATES,
    rgb_collection_candidates: Sequence[str] = DEFAULT_L2B_RGB_COLLECTION_CANDIDATES,
    with_rgb: bool = True,
    overview_level: int | None = None,
    http_timeout: float = 30.0,
) -> "CMImageRaster":
    """Build by deriving L2B asset URLs from the scene_id (URL-pattern).

    Bypasses STAC entirely: derives every asset URL by templating
    against the verified asset-proxy pattern (see
    :func:`_l2b_asset_url`) and probing the candidate collections
    in order. Required for 2026 plumes (v3c/v3d L3A) whose L2B
    parent scenes are **not** in ``/stac/collections``.

    Parameters
    ----------
    scene_id:
        L2B scene id, equal to ``plume_id.rsplit("-", 1)[0]`` for
        any plume that came from this scene. Must follow the
        ``<inst><YYYYMMDD>t<HHMMSS>...`` convention so the date
        can be parsed.
    token:
        Bearer token. Required: the asset-proxy URLs return 401
        without it.
    l2b_collection_candidates:
        L2B CH4 collection IDs to probe, in order. First one to
        serve a 200/206 on ``cmf.tif`` wins. Defaults to
        :data:`DEFAULT_L2B_CH4_COLLECTION_CANDIDATES`, i.e.
        ``("l2b-ch4-mfa-v3c", "l2b-ch4-mfa-v3a")``.
    rgb_collection_candidates:
        L2B RGB sibling collection IDs probed identically (on
        ``rgb.tif``). Defaults to
        :data:`DEFAULT_L2B_RGB_COLLECTION_CANDIDATES`.
    with_rgb:
        When ``True`` (default), probe the RGB sibling collections
        and attach the ``rgb`` URL on success. When ``False``,
        ``self.rgb`` will be ``None``.
    overview_level:
        Forwarded to :class:`RasterioReader`.
    http_timeout:
        Per-probe range-GET timeout (seconds).

    Returns
    -------
    CMImageRaster
        With ``asset_paths`` populated for the 6 L2B CH4 assets
        (``cmf``, ``cmf-unortho``, ``uncertainty``,
        ``uncertainty-unortho``, ``artifact-mask``, ``uas``) and,
        when ``with_rgb=True``, the ``rgb`` sibling URL.

    Raises
    ------
    CMSceneNotPublished
        When every candidate L2B collection 404s for ``scene_id``:
        the scene either hasn't been processed yet or only
        exists in a collection variant not listed in
        ``l2b_collection_candidates``. Catch in ETL paths that
        want to defer rather than error.
    ValueError
        When ``scene_id`` doesn't carry an 8-digit date at
        positions ``[3:11]``.

    Examples
    --------
    >>> tile = CMImageRaster.from_scene_id(  # doctest: +SKIP
    ...     "tan20260331t181625c77s4001", token=tok,
    ... )
    >>> tile.cmf  # doctest: +SKIP
    <RasterioReader …/l2b-ch4-mfa-v3c/2026/03/31/…>
    """
    l2b_coll = _probe_l2b_collection(
        scene_id,
        l2b_collection_candidates,
        probe_asset="cmf.tif",
        token=token,
        http_timeout=http_timeout,
    )
    if l2b_coll is None:
        raise CMSceneNotPublished(scene_id)

    # Build the 6 CH4-collection asset URLs from the winning prefix.
    # Extensions are baked in; `_open` strips nothing, so keys must
    # match the lazy-property names exactly (without extensions).
    asset_paths: dict[str, PathLike] = {
        "cmf":                 _l2b_asset_url(l2b_coll, scene_id, "cmf.tif"),
        "cmf-unortho":         _l2b_asset_url(l2b_coll, scene_id, "cmf-unortho.tif"),
        "uncertainty":         _l2b_asset_url(l2b_coll, scene_id, "uncertainty.tif"),
        "uncertainty-unortho": _l2b_asset_url(l2b_coll, scene_id, "uncertainty-unortho.tif"),
        "artifact-mask":       _l2b_asset_url(l2b_coll, scene_id, "artifact-mask.tif"),
        "uas":                 _l2b_asset_url(l2b_coll, scene_id, "uas.txt"),
    }

    if with_rgb:
        rgb_coll = _probe_l2b_collection(
            scene_id,
            rgb_collection_candidates,
            probe_asset="rgb.tif",
            token=token,
            http_timeout=http_timeout,
        )
        if rgb_coll is not None:
            asset_paths["rgb"] = _l2b_asset_url(rgb_coll, scene_id, "rgb.tif")

    return cls(
        scene_id=scene_id,
        asset_paths=asset_paths,
        overview_level=overview_level,
    )

read_polygon(polygon, *, crs_polygon='EPSG:4326', bands=CM_L2B_BANDS)

Read a polygon clip from the requested bands.

Parameters:

  • polygon (BaseGeometry): Clip geometry. Required.
  • crs_polygon (str): CRS of polygon. Defaults to "EPSG:4326".
  • bands (Iterable[str]): Subset of band names. Bands whose asset is missing or whose window has zero overlap return None. Defaults to CM_L2B_BANDS.

Returns:

  • dict[str, Optional[GeoData]]: {"cmf": <GeoData>, "rgb": <GeoData>, ...}, i.e. windowed RasterioReader instances (lazy, satisfying the GeoData protocol). Call .load() to materialise as GeoTensor.

Source code in georeader/readers/carbonmapper/rasters.py
def read_polygon(
    self,
    polygon: BaseGeometry,
    *,
    crs_polygon: str = "EPSG:4326",
    bands: Iterable[str] = CM_L2B_BANDS,
) -> dict[str, Optional[GeoData]]:
    """Read a polygon clip from the requested bands.

    Args:
        polygon: Clip geometry.
        crs_polygon: CRS of ``polygon``. Defaults to ``"EPSG:4326"``.
        bands: Subset of band names. Bands whose asset is missing
            or whose window has zero overlap return ``None``.

    Returns:
        ``{"cmf": <GeoData>, "rgb": <GeoData>, ...}``: windowed
        ``RasterioReader`` instances (lazy, satisfying the
        :class:`GeoData` protocol). Call ``.load()`` to materialise
        as :class:`GeoTensor`.
    """
    out: dict[str, Optional[GeoData]] = {}
    for band in bands:
        if self.asset_paths.get(band) is None:
            out[band] = None
            continue
        # `uas` is a text sidecar, not a raster, so skip the band
        # reader path. Callers reading text sidecars use the
        # `.uas` property directly.
        if band == "uas":
            continue
        reader = self._open(band)
        # `boundless=False` makes `read_from_polygon` return `None`
        # for windows that don't intersect the raster (e.g. an
        # artifact-mask whose un-orthorectified strip falls outside
        # the requested AOI), instead of allocating a fill-valued
        # tensor the size of the requested window. Real CRS / I/O
        # errors are left to propagate; the prior bare
        # `except Exception` swallowed those silently.
        #
        # `read_from_polygon` returns ``GeoData | NDArray``; with
        # ``return_only_data=False`` the GeoData arm is the one we
        # always hit. ``RasterioReader`` satisfies the ``GeoData``
        # protocol structurally, but ty doesn't currently infer
        # that, so cast for clarity.
        result = read.read_from_polygon(
            cast(GeoData, reader),
            polygon=polygon,
            crs_polygon=crs_polygon,
            boundless=False,
        )
        out[band] = cast(GeoData, result) if result is not None else None
    return out
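
Because `read_polygon` maps a band to `None` when its asset is missing or its window has no overlap, callers usually need to separate usable crops from absent ones before calling `.load()`. A minimal sketch of that bookkeeping follows; `split_available` is a hypothetical helper, and plain strings stand in for the `GeoData` objects so the example runs without georeader.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def split_available(crops: dict[str, Optional[T]]) -> tuple[dict[str, T], list[str]]:
    """Separate bands that produced data from bands that came back as None."""
    available = {band: crop for band, crop in crops.items() if crop is not None}
    missing = [band for band, crop in crops.items() if crop is None]
    return available, missing


# Stand-in for the dict returned by read_polygon; strings replace GeoData objects.
crops = {"cmf": "<cmf window>", "rgb": None, "artifact-mask": "<mask window>"}
available, missing = split_available(crops)

print(sorted(available))  # ['artifact-mask', 'cmf']
print(missing)            # ['rgb']
```

Note that, per the source above, a requested `uas` band whose asset exists gets no key at all in the result, so `crops.get("uas")` is safer than `crops["uas"]`.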

read_window(bounds_4326, *, bands=CM_L2B_BANDS)

Read a WGS-84 bbox window from the requested bands.

Source code in georeader/readers/carbonmapper/rasters.py
def read_window(
    self,
    bounds_4326: BBox,
    *,
    bands: Iterable[str] = CM_L2B_BANDS,
) -> dict[str, Optional[GeoData]]:
    """Read a WGS-84 bbox window from the requested bands."""
    return self.read_polygon(box(*bounds_4326), bands=bands)
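
`read_window` unpacks the bounds straight into Shapely's `box`, whose positional order is `(minx, miny, maxx, maxy)`. The sketch below validates a bounding box before it is passed in. It assumes `BBox` is a `(min_lon, min_lat, max_lon, max_lat)` tuple in degrees (its definition is not shown here); `check_bounds_4326` is a hypothetical helper, not part of georeader, and it deliberately rejects boxes that cross the antimeridian.

```python
def check_bounds_4326(
    bounds: tuple[float, float, float, float],
) -> tuple[float, float, float, float]:
    """Validate a (min_lon, min_lat, max_lon, max_lat) tuple in degrees.

    Hypothetical helper, not part of georeader.
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    if not (-180.0 <= min_lon < max_lon <= 180.0):
        raise ValueError(f"longitudes out of order or range: {min_lon}, {max_lon}")
    if not (-90.0 <= min_lat < max_lat <= 90.0):
        raise ValueError(f"latitudes out of order or range: {min_lat}, {max_lat}")
    return bounds


ok = check_bounds_4326((-103.5, 31.5, -103.0, 32.0))

# Swapping lon/lat is a common mistake; here it fails the latitude check.
try:
    check_bounds_4326((31.5, -103.5, 32.0, -103.0))
    swapped_rejected = False
except ValueError:
    swapped_rejected = True

print(ok, swapped_rejected)
```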

read_window_to_crs(bounds_4326, crs_dst, *, bands=CM_L2B_BANDS)

Read a window then reproject each band to crs_dst.

Reprojection materialises the data: values are GeoTensor, not lazy readers.

Source code in georeader/readers/carbonmapper/rasters.py
def read_window_to_crs(
    self,
    bounds_4326: BBox,
    crs_dst: str,
    *,
    bands: Iterable[str] = CM_L2B_BANDS,
) -> dict[str, Optional[GeoTensor]]:
    """Read a window then reproject each band to ``crs_dst``.

    Reprojection materialises the data: values are
    :class:`GeoTensor`, not lazy readers.
    """
    crops = self.read_window(bounds_4326, bands=bands)
    # `read_to_crs` returns ``GeoTensor | NDArray``; same narrowing
    # rationale as ``read_from_polygon`` above.
    return {
        band: (
            cast(GeoTensor, read.read_to_crs(geo, crs_dst))
            if geo is not None
            else None
        )
        for band, geo in crops.items()
    }

with_rgb(rgb_item)

Return a copy with rgb merged in from a sibling STAC item.

The CH4 (l2b-ch4-mfa-v3a) and RGB (l2b-rgb-v3a) L2B collections share scene_id and pixel grid, but each STAC item only exposes its own assets. Fetch both with api_queries.get_tile (passing collection=...) and compose them via this method:

    >>> ir = CMImageRaster.from_cm_tile_item(ch4_item)
    >>> ir = ir.with_rgb(rgb_item)
    >>> ir.rgb is not None
    True

Raises:

  • ValueError: If rgb_item.scene_id doesn't match self.scene_id (mismatched scenes don't share a grid, which usually indicates a programming error).

Source code in georeader/readers/carbonmapper/rasters.py
def with_rgb(self, rgb_item: CMTileItem) -> "CMImageRaster":
    """Return a copy with ``rgb`` merged in from a sibling STAC item.

    The CH4 (``l2b-ch4-mfa-v3a``) and RGB (``l2b-rgb-v3a``) L2B
    collections share ``scene_id`` and pixel grid, but each STAC
    item only exposes its own assets. Fetch both with
    :func:`api_queries.get_tile` (passing ``collection=...``) and
    compose them via this method:

    >>> ir = CMImageRaster.from_cm_tile_item(ch4_item)
    >>> ir = ir.with_rgb(rgb_item)
    >>> ir.rgb is not None
    True

    Raises:
        ValueError: If ``rgb_item.scene_id`` doesn't match
            ``self.scene_id`` (mismatched scenes don't share a grid;
            usually a programming error).
    """
    if rgb_item.scene_id != self.scene_id:
        raise ValueError(
            f"scene_id mismatch: {self.scene_id!r} vs {rgb_item.scene_id!r}"
        )
    # Pick the rgb GeoTIFF (with or without `.tif` extension);
    # ignore everything else on the rgb item.
    new_paths = dict(self.asset_paths)
    for key, url in rgb_item.asset_urls.items():
        if not url:
            continue
        stripped = key[:-4] if key.endswith(".tif") else key
        if stripped == "rgb":
            new_paths["rgb"] = url
            break
    return CMImageRaster(
        scene_id=self.scene_id,
        asset_paths=new_paths,
        overview_level=self.overview_level,
    )
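
The asset-selection step in `with_rgb` can be isolated as a small pure function. The sketch below mirrors the loop above (skip empty URLs, strip a trailing `.tif` from the key, take the first `rgb` match); `pick_rgb_url` is a hypothetical name and the URLs are placeholders rather than real endpoints.

```python
from typing import Optional


def pick_rgb_url(asset_urls: dict[str, Optional[str]]) -> Optional[str]:
    """Return the URL stored under ``rgb`` or ``rgb.tif``, skipping empty values."""
    for key, url in asset_urls.items():
        if not url:
            continue
        stripped = key[:-4] if key.endswith(".tif") else key
        if stripped == "rgb":
            return url
    return None


# Example URLs are placeholders, not real endpoints.
with_ext = pick_rgb_url({"rgb.tif": "https://example.invalid/scene/rgb.tif"})
empty = pick_rgb_url({"rgb": "", "thumbnail": "https://example.invalid/t.png"})

print(with_ext)  # https://example.invalid/scene/rgb.tif
print(empty)     # None
```

In the method above, an item without a usable `rgb` asset leaves `asset_paths` unchanged rather than raising, so checking `ir.rgb is not None` after the call is worthwhile.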

Rasterize Carbon Mapper sources (point clusters) onto a target grid.

Carbon Mapper sources are point geometries (DBSCAN-clustered plume locations). For training labels, QA overlays, and source-prior features it is useful to project them onto the same grid as an L2B scene as a binary mask. This module provides:

  • rasterize_sources: one-shot function that turns a list of points into a GeoTensor mask.
  • CMSourceRaster: lazy wrapper that mirrors the CMImageRaster interface (read_polygon / read_window / read_window_to_crs) so callers can compose the source mask with the L2B rasters.

Both delegate the actual burn-in to georeader.rasterize.rasterize_geopandas_like / georeader.rasterize.rasterize_from_geopandas; no custom rasterio.features call lives in this module.

The Carbon Mapper API does not publish a sources raster; these helpers build it client-side from list_sources (or any iterable of CMSource).

CMSourceRaster dataclass

Lazy binary-mask raster of Carbon Mapper sources on a target grid.

Mirrors the read-helper surface of CMImageRaster so callers can compose source masks with L2B reads.

Attributes

  • sources: Source points to rasterize.
  • transform, shape, crs: Target grid spec. Use from_cmtileitem or from_geodata to inherit the spec from an existing raster.
  • buffer_m: Per-point disk radius in metres. 0 means a single pixel.

Source code in georeader/readers/carbonmapper/sources_raster.py
@dataclass(repr=False)
class CMSourceRaster:
    """Lazy binary-mask raster of Carbon Mapper sources on a target grid.

    Mirrors the read-helper surface of
    :class:`~georeader.readers.carbonmapper.rasters.CMImageRaster` so
    callers can compose source masks with L2B reads.

    Attributes
    ----------
    sources:
        Source points to rasterize.
    transform, shape, crs:
        Target grid spec. Use :meth:`from_cmtileitem` or
        :meth:`from_geodata` to inherit the spec from an existing
        raster.
    buffer_m:
        Per-point disk radius in metres. ``0`` means a single pixel.
    """

    sources: Sequence[SourceLike]
    transform: rasterio.Affine
    shape: tuple[int, int]
    crs: rasterio.crs.CRS
    buffer_m: float = 0.0

    # ---- Constructors ----

    @classmethod
    def from_geodata(
        cls,
        sources: Sequence[SourceLike],
        template: GeoData,
        *,
        buffer_m: float = 0.0,
    ) -> "CMSourceRaster":
        """Build a source raster aligned to an existing :class:`GeoData`."""
        return cls(
            sources=sources,
            transform=template.transform,
            shape=(template.shape[-2], template.shape[-1]),
            crs=rasterio.crs.CRS.from_user_input(template.crs),
            buffer_m=buffer_m,
        )

    @classmethod
    def from_cmtileitem(
        cls,
        sources: Sequence[SourceLike],
        tile: CMTileItem,
        *,
        buffer_m: float = 0.0,
    ) -> "CMSourceRaster":
        """Build a source raster aligned to an L2B :class:`CMTileItem`.

        Resolves the tile's ``cmf`` GeoTIFF header to inherit
        ``(transform, shape, crs)``. Issues one HEAD/GET-range read.
        """
        cmf_url = tile.assets.get("cmf") or tile.assets.get("ch4-mfa")
        if cmf_url is None:
            raise ValueError(
                f"CMTileItem {tile.scene_id!r} has no 'cmf' asset to align to."
            )
        with rasterio.open(cmf_url) as ds:
            return cls(
                sources=sources,
                transform=ds.transform,
                shape=(ds.height, ds.width),
                crs=ds.crs,
                buffer_m=buffer_m,
            )

    # ---- Eager render ----

    def load(self) -> GeoTensor:
        """Rasterize all sources onto the full grid."""
        return rasterize_sources(
            self.sources,
            transform=self.transform,
            shape=self.shape,
            crs=self.crs,
            buffer_m=self.buffer_m,
        )

    # ---- Read helpers (mirror CMImageRaster) ----

    def read_polygon(
        self,
        polygon: BaseGeometry,
        *,
        crs_polygon: str = "EPSG:4326",
    ) -> GeoTensor:
        """Read a polygon clip of the source mask."""
        full = self.load()
        # `read_from_polygon` returns ``GeoData | NDArray``; with the
        # default ``return_only_data=False`` the GeoData arm is the one
        # we always hit.
        return cast(
            GeoTensor,
            read.read_from_polygon(
                cast(GeoData, full),
                polygon=polygon,
                crs_polygon=crs_polygon,
            ),
        )

    def read_window(self, bounds_4326: BBox) -> GeoTensor:
        """Read a WGS-84 bbox window of the source mask."""
        return self.read_polygon(box(*bounds_4326))

    def read_window_to_crs(
        self,
        bounds_4326: BBox,
        crs_dst: str,
    ) -> GeoTensor:
        """Read a window then reproject the mask to ``crs_dst``."""
        crop = self.read_window(bounds_4326)
        return cast(GeoTensor, read.read_to_crs(crop, crs_dst))

    # ---- Repr ----

    def __repr__(self) -> str:
        return (
            f"{type(self).__name__}(n_sources={len(self.sources)}, "
            f"shape={self.shape}, buffer_m={self.buffer_m}, crs={self.crs})"
        )

from_cmtileitem(sources, tile, *, buffer_m=0.0) classmethod

Build a source raster aligned to an L2B CMTileItem.

Resolves the tile's cmf GeoTIFF header to inherit (transform, shape, crs). Issues one HEAD/GET-range read.

Source code in georeader/readers/carbonmapper/sources_raster.py
@classmethod
def from_cmtileitem(
    cls,
    sources: Sequence[SourceLike],
    tile: CMTileItem,
    *,
    buffer_m: float = 0.0,
) -> "CMSourceRaster":
    """Build a source raster aligned to an L2B :class:`CMTileItem`.

    Resolves the tile's ``cmf`` GeoTIFF header to inherit
    ``(transform, shape, crs)``. Issues one HEAD/GET-range read.
    """
    cmf_url = tile.assets.get("cmf") or tile.assets.get("ch4-mfa")
    if cmf_url is None:
        raise ValueError(
            f"CMTileItem {tile.scene_id!r} has no 'cmf' asset to align to."
        )
    with rasterio.open(cmf_url) as ds:
        return cls(
            sources=sources,
            transform=ds.transform,
            shape=(ds.height, ds.width),
            crs=ds.crs,
            buffer_m=buffer_m,
        )

from_geodata(sources, template, *, buffer_m=0.0) classmethod

Build a source raster aligned to an existing GeoData.

Source code in georeader/readers/carbonmapper/sources_raster.py
@classmethod
def from_geodata(
    cls,
    sources: Sequence[SourceLike],
    template: GeoData,
    *,
    buffer_m: float = 0.0,
) -> "CMSourceRaster":
    """Build a source raster aligned to an existing :class:`GeoData`."""
    return cls(
        sources=sources,
        transform=template.transform,
        shape=(template.shape[-2], template.shape[-1]),
        crs=rasterio.crs.CRS.from_user_input(template.crs),
        buffer_m=buffer_m,
    )
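
`from_geodata` keeps only the last two axes of the template's shape, so a multi-band template still yields a 2D mask grid. A minimal sketch of that indexing, assuming a channels-first `(..., H, W)` layout as the `[-2]` / `[-1]` indices imply; `spatial_shape` is a hypothetical helper, and the explicit dimension check is an addition that the method above does not perform.

```python
def spatial_shape(shape: tuple[int, ...]) -> tuple[int, int]:
    """Return (height, width) from a (..., H, W) shape."""
    if len(shape) < 2:
        raise ValueError(f"need at least 2 dimensions, got {shape}")
    return (shape[-2], shape[-1])


print(spatial_shape((3, 512, 640)))  # (512, 640)
print(spatial_shape((512, 640)))     # (512, 640)
```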

load()

Rasterize all sources onto the full grid.

Source code in georeader/readers/carbonmapper/sources_raster.py
def load(self) -> GeoTensor:
    """Rasterize all sources onto the full grid."""
    return rasterize_sources(
        self.sources,
        transform=self.transform,
        shape=self.shape,
        crs=self.crs,
        buffer_m=self.buffer_m,
    )

read_polygon(polygon, *, crs_polygon='EPSG:4326')

Read a polygon clip of the source mask.

Source code in georeader/readers/carbonmapper/sources_raster.py
def read_polygon(
    self,
    polygon: BaseGeometry,
    *,
    crs_polygon: str = "EPSG:4326",
) -> GeoTensor:
    """Read a polygon clip of the source mask."""
    full = self.load()
    # `read_from_polygon` returns ``GeoData | NDArray``; with the
    # default ``return_only_data=False`` the GeoData arm is the one
    # we always hit.
    return cast(
        GeoTensor,
        read.read_from_polygon(
            cast(GeoData, full),
            polygon=polygon,
            crs_polygon=crs_polygon,
        ),
    )

read_window(bounds_4326)

Read a WGS-84 bbox window of the source mask.

Source code in georeader/readers/carbonmapper/sources_raster.py
def read_window(self, bounds_4326: BBox) -> GeoTensor:
    """Read a WGS-84 bbox window of the source mask."""
    return self.read_polygon(box(*bounds_4326))

read_window_to_crs(bounds_4326, crs_dst)

Read a window then reproject the mask to crs_dst.

Source code in georeader/readers/carbonmapper/sources_raster.py
def read_window_to_crs(
    self,
    bounds_4326: BBox,
    crs_dst: str,
) -> GeoTensor:
    """Read a window then reproject the mask to ``crs_dst``."""
    crop = self.read_window(bounds_4326)
    return cast(GeoTensor, read.read_to_crs(crop, crs_dst))

rasterize_sources(sources, *, transform, shape, crs, buffer_m=0.0)

Rasterize source points onto a target grid as a binary mask.

Each source contributes a value of 1 at its pixel; if buffer_m > 0 a disk of that radius (in metres) is stamped instead. Sources falling outside the grid are silently dropped.

Delegates to georeader.rasterize.rasterize_from_geopandas.

Parameters

  • sources: Iterable of CMSource, Shapely Point, or (lon, lat) tuples, all interpreted as WGS-84 lon/lat.
  • transform: Affine transform of the target grid.
  • shape: (height, width) of the target grid.
  • crs: CRS of the target grid. Must be projected when buffer_m > 0.
  • buffer_m: Buffer radius in metres applied around each source point. 0 (default) gives an all_touched single-pixel stamp per source.

Returns

  • GeoTensor: 2D mask of shape with values in {0, 1}.

Raises

  • ValueError: If buffer_m > 0 and crs is geographic, or if shape is not 2D.

Source code in georeader/readers/carbonmapper/sources_raster.py
def rasterize_sources(
    sources: Iterable[SourceLike],
    *,
    transform: rasterio.Affine,
    shape: tuple[int, int],
    crs: Union[str, rasterio.crs.CRS],
    buffer_m: float = 0.0,
) -> GeoTensor:
    """Rasterize source points onto a target grid as a binary mask.

    Each source contributes a value of ``1`` at its pixel; if
    ``buffer_m > 0`` a disk of that radius (in metres) is stamped
    instead. Sources falling outside the grid are silently dropped.

    Delegates to
    :func:`georeader.rasterize.rasterize_from_geopandas`.

    Parameters
    ----------
    sources:
        Iterable of :class:`CMSource`, Shapely :class:`Point`, or
        ``(lon, lat)`` tuples, all interpreted as WGS-84 lon/lat.
    transform:
        Affine transform of the target grid.
    shape:
        ``(height, width)`` of the target grid.
    crs:
        CRS of the target grid. Must be projected when
        ``buffer_m > 0``.
    buffer_m:
        Buffer radius in metres applied around each source point.
        ``0`` (default) gives an ``all_touched`` single-pixel stamp per source.

    Returns
    -------
    GeoTensor
        2D mask of ``shape`` with values in ``{0, 1}``.

    Raises
    ------
    ValueError
        If ``buffer_m > 0`` and ``crs`` is geographic, or if ``shape``
        is not 2D.
    """
    if len(shape) != 2:
        raise ValueError(f"Expected (H, W) shape, got {shape}")
    crs_obj = rasterio.crs.CRS.from_user_input(crs)

    gdf = _sources_gdf(sources)
    if len(gdf) == 0:
        return GeoTensor(
            np.zeros(shape, dtype=np.uint8),
            transform=transform, crs=crs_obj, fill_value_default=0,
        )

    if buffer_m > 0:
        gdf = _apply_buffer(gdf, crs_obj, buffer_m)
        all_touched = False
    else:
        gdf = gdf.to_crs(crs_obj)
        all_touched = True  # stamp the pixel containing each point

    height, width = shape
    window_out = rasterio.windows.Window(0, 0, width=width, height=height)
    return cast(
        GeoTensor,
        rasterize_from_geopandas(
            gdf,
            column="value",
            transform=transform,
            window_out=window_out,
            crs_out=crs_obj,
            fill=0,
            all_touched=all_touched,
        ),
    )
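
To make the two stamping modes concrete, here is a pure-Python sketch of what the burn-in amounts to on a simple north-up grid. It is a simplified stand-in, not the georeader or rasterio implementation: `stamp_points` is a hypothetical function, points are assumed to be in the grid's projected CRS already, and the disk case burns pixels whose centre falls inside the buffer (the behaviour rasterio documents for `all_touched=False`).

```python
import math


def stamp_points(points_xy, *, x0, y0, res, shape, buffer_m=0.0):
    """Burn points into a (H, W) 0/1 mask on a north-up grid.

    ``(x0, y0)`` is the top-left corner and ``res`` the pixel size in
    metres. Points outside the grid are silently dropped.
    """
    height, width = shape
    mask = [[0] * width for _ in range(height)]
    for x, y in points_xy:
        if buffer_m <= 0:
            # Single-pixel stamp: the pixel that contains the point.
            col = math.floor((x - x0) / res)
            row = math.floor((y0 - y) / res)
            if 0 <= row < height and 0 <= col < width:
                mask[row][col] = 1
            continue
        # Disk stamp: burn pixels whose centre lies within buffer_m of the point.
        for row in range(height):
            for col in range(width):
                cx = x0 + (col + 0.5) * res
                cy = y0 - (row + 0.5) * res
                if math.hypot(cx - x, cy - y) <= buffer_m:
                    mask[row][col] = 1
    return mask


# 4 x 4 grid of 10 m pixels with its top-left corner at (0, 40).
single = stamp_points([(15.0, 25.0), (999.0, 999.0)], x0=0, y0=40, res=10, shape=(4, 4))
disk = stamp_points([(20.0, 20.0)], x0=0, y0=40, res=10, shape=(4, 4), buffer_m=8.0)

print(single)               # only row 1, col 1 is set; the far-away point is dropped
print(sum(map(sum, disk)))  # 4
```

The full-grid scan in the disk branch is written for clarity only; the real function hands buffered geometries to the rasterizer instead of looping over pixels.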

rasterize_sources_like(sources, data_like, *, buffer_m=0.0)

Rasterize sources onto an existing GeoData grid.

Thin wrapper around georeader.rasterize.rasterize_geopandas_like.

Source code in georeader/readers/carbonmapper/sources_raster.py
def rasterize_sources_like(
    sources: Iterable[SourceLike],
    data_like: GeoData,
    *,
    buffer_m: float = 0.0,
) -> GeoTensor:
    """Rasterize sources onto an existing :class:`GeoData` grid.

    Thin wrapper around
    :func:`georeader.rasterize.rasterize_geopandas_like`.
    """
    crs_obj = rasterio.crs.CRS.from_user_input(data_like.crs)
    gdf = _sources_gdf(sources)
    if len(gdf) == 0:
        return GeoTensor(
            np.zeros(data_like.shape[-2:], dtype=np.uint8),
            transform=data_like.transform,
            crs=crs_obj,
            fill_value_default=0,
        )

    if buffer_m > 0:
        gdf = _apply_buffer(gdf, crs_obj, buffer_m)
        all_touched = False
    else:
        gdf = gdf.to_crs(crs_obj)
        all_touched = True

    return cast(
        GeoTensor,
        rasterize_geopandas_like(
            gdf, data_like=data_like, column="value",
            fill=0, all_touched=all_touched,
        ),
    )

config.py

Lightweight credentials and configuration handler for the Carbon Mapper Data Platform API.

Credentials can be supplied in three ways (checked in priority order):

  1. Environment variables: set CARBONMAPPER_TOKEN (access token), or CARBONMAPPER_EMAIL and CARBONMAPPER_PASSWORD (login credentials).
  2. Config file: a JSON file at one of the well-known paths listed in CONFIG_SEARCH_PATHS, or a custom path passed to CarbonMapperConfig.load. The canonical location matches the sibling readers (emit.py / S2_SAFE_reader.py): ~/.georeader/auth_carbonmapper.json.
  3. Explicit arguments: pass token= directly to API functions in download.py.
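
A minimal sketch of how the environment and config-file sources combine, assuming (as the `load` source further down does) that non-empty environment variables overwrite file values field by field. `overlay_env` and `ENV_KEYS` are hypothetical names; only the three variable names come from the list above.

```python
from typing import Mapping, Optional

ENV_KEYS = {
    "token": "CARBONMAPPER_TOKEN",
    "email": "CARBONMAPPER_EMAIL",
    "password": "CARBONMAPPER_PASSWORD",
}


def overlay_env(
    file_values: Mapping[str, Optional[str]],
    env: Mapping[str, str],
) -> dict[str, Optional[str]]:
    """Start from config-file values and let non-empty env vars overwrite them."""
    resolved = {field: file_values.get(field) for field in ENV_KEYS}
    for field, var in ENV_KEYS.items():
        if env.get(var):
            resolved[field] = env[var]
    return resolved


# A plain dict stands in for os.environ so the example has no side effects.
from_file = {"email": "file@example.com", "password": "from-file"}
fake_env = {"CARBONMAPPER_EMAIL": "env@example.com", "CARBONMAPPER_TOKEN": ""}

print(overlay_env(from_file, fake_env))
# {'token': None, 'email': 'env@example.com', 'password': 'from-file'}
```

An empty environment variable is treated as unset here, which matches the truthiness checks in `load`.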

If no config file exists when CarbonMapperConfig.load is called without an explicit path and no env-var credentials are set, a placeholder ~/.georeader/auth_carbonmapper.json is auto-created with stub values so users have a clear edit target.
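
An untouched placeholder file should not count as real credentials. The sketch below shows one way to read such a file and treat the stub values as unset. It assumes the stubs are the `SET-EMAIL` / `SET-PASSWORD` strings mentioned in the `load` docstring and that they sit under the `email` and `password` keys; the exact layout of the auto-created file is not shown here, and `read_credentials` and the two constants are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

# Stub strings named in the load() docstring; the constant names are hypothetical.
PLACEHOLDER_EMAIL = "SET-EMAIL"
PLACEHOLDER_PASSWORD = "SET-PASSWORD"


def read_credentials(path: Path) -> dict:
    """Read a JSON config, treating untouched placeholder values as unset."""
    data = json.loads(path.read_text())
    # Accept "username" as an alias for "email", as from_file does.
    email = data.get("email") or data.get("username")
    password = data.get("password")
    if email == PLACEHOLDER_EMAIL:
        email = None
    if password == PLACEHOLDER_PASSWORD:
        password = None
    return {"token": data.get("token"), "email": email, "password": password}


with tempfile.TemporaryDirectory() as tmp:
    stub = Path(tmp) / "auth_carbonmapper.json"
    stub.write_text(json.dumps({"email": "SET-EMAIL", "password": "SET-PASSWORD"}))
    creds = read_credentials(stub)

print(creds)  # {'token': None, 'email': None, 'password': None}
```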

Quick start

    from georeader.readers.carbonmapper.config import CarbonMapperConfig
    cfg = CarbonMapperConfig.load()
    token = cfg.get_token()  # resolves from env var or file

Or store credentials in the default config file:

    cfg.email = "user@example.com"
    cfg.password = "s3cret"
    cfg.save()  # writes to ~/.georeader/auth_carbonmapper.json

References

  • API docs : https://api.carbonmapper.org/api/v1/docs
  • Registration : https://data.carbonmapper.org

CarbonMapperConfig

Simple credentials and configuration container for the Carbon Mapper API.

Attributes

  • token: A pre-obtained JWT bearer token. If set, it takes precedence over email / password when get_token is called.
  • email: Registered Carbon Mapper account e-mail address.
  • password: Account password. Stored only in memory or in the config file on disk; never sent anywhere except the token endpoint.
  • extra: Any additional key/value pairs loaded from or saved to the config file (for forward compatibility).

Examples

Load from environment or disk and retrieve a usable token:

    >>> cfg = CarbonMapperConfig.load()
    >>> token = cfg.get_token()  # may return None if no credentials found
    >>> if token:
    ...     data = get_plumes_annotated(plume_gas="CH4", token=token)

Persist credentials to the default config file:

    >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
    >>> cfg.save()

Reset (delete) the stored config file:

    >>> CarbonMapperConfig.reset()

Source code in georeader/readers/carbonmapper/config.py
class CarbonMapperConfig:
    """Simple credentials and configuration container for the Carbon Mapper API.

    Attributes
    ----------
    token:
        A pre-obtained JWT bearer token.  If set, it takes precedence over
        *email* / *password* when :meth:`get_token` is called.
    email:
        Registered Carbon Mapper account e-mail address.
    password:
        Account password.  Stored only in memory or in the config file on
        disk; never sent anywhere except the token endpoint.
    extra:
        Any additional key/value pairs loaded from or saved to the config
        file (for forward compatibility).

    Examples
    --------
    Load from environment or disk and retrieve a usable token:

    >>> cfg = CarbonMapperConfig.load()
    >>> token = cfg.get_token()  # may return None if no credentials found
    >>> if token:
    ...     data = get_plumes_annotated(plume_gas="CH4", token=token)

    Persist credentials to the default config file:

    >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
    >>> cfg.save()

    Reset (delete) the stored config file:

    >>> CarbonMapperConfig.reset()
    """

    def __init__(
        self,
        *,
        token: str | None = None,
        email: str | None = None,
        password: str | None = None,
        **extra: Any,
    ) -> None:
        self.token = token
        self.email = email
        self.password = password
        self.extra: dict[str, Any] = extra

    # ------------------------------------------------------------------ #
    # Class-level factory / persistence methods                            #
    # ------------------------------------------------------------------ #

    @classmethod
    def from_env(cls) -> "CarbonMapperConfig":
        """Build a :class:`CarbonMapperConfig` purely from environment variables.

        Reads :envvar:`CARBONMAPPER_TOKEN`, :envvar:`CARBONMAPPER_EMAIL`,
        and :envvar:`CARBONMAPPER_PASSWORD`.  Fields that are absent from
        the environment are left as ``None``.

        Returns
        -------
        CarbonMapperConfig
            A new config object populated from the environment.

        Examples
        --------
        >>> import os
        >>> os.environ["CARBONMAPPER_TOKEN"] = "eyJ..."
        >>> cfg = CarbonMapperConfig.from_env()
        >>> cfg.token
        'eyJ...'
        """
        return cls(
            token=os.environ.get(_ENV_TOKEN),
            email=os.environ.get(_ENV_EMAIL),
            password=os.environ.get(_ENV_PASSWORD),
        )

    @classmethod
    def from_file(cls, path: Path | str) -> "CarbonMapperConfig":
        """Load a :class:`CarbonMapperConfig` from a specific JSON file.

        Parameters
        ----------
        path:
            Path to a JSON config file containing any combination of the
            keys ``"token"``, ``"email"``, ``"password"``, plus any extra
            fields.

        Returns
        -------
        CarbonMapperConfig
            Config populated from the file.

        Raises
        ------
        FileNotFoundError
            If *path* does not exist.
        json.JSONDecodeError
            If the file cannot be parsed as JSON.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.from_file("~/.georeader/auth_carbonmapper.json")
        """
        path = Path(path).expanduser().resolve()
        with path.open() as fh:
            data: dict[str, Any] = json.load(fh)
        token = data.pop("token", None)
        email = data.pop("email", None) or data.pop("username", None)
        password = data.pop("password", None)
        # Filter stub values: if the user hasn't yet edited a freshly
        # auto-created placeholder, treat the fields as un-set rather
        # than letting ``"SET-EMAIL"`` flow into has_credentials() as
        # if it were a real value.
        if email == _PLACEHOLDER_EMAIL:
            email = None
        if password == _PLACEHOLDER_PASSWORD:
            password = None
        return cls(token=token, email=email, password=password, **data)

    @classmethod
    def load(
        cls,
        path: Path | str | None = None,
        *,
        create_placeholder: bool = True,
    ) -> "CarbonMapperConfig":
        """Load config using the standard resolution order.

        Resolution order
        ~~~~~~~~~~~~~~~~
        1. If *path* is given and the file exists and parses, load it.
        2. Otherwise (no *path*, or it is missing or unreadable) search
           :data:`CONFIG_SEARCH_PATHS` for the first file that loads.
        3. Overlay environment variables; env values overwrite file values.
        4. If still nothing is configured (no file found, no env vars set)
           AND ``create_placeholder`` is True, write a stub config to
           :data:`DEFAULT_SAVE_PATH` with ``SET-EMAIL`` / ``SET-PASSWORD``
           placeholders so users have a clear edit target. Matches the
           ``emit.py`` / ``S2_SAFE_reader.py`` behaviour.

        Parameters
        ----------
        path:
            Optional explicit path to a config file.  The search is
            skipped when this file loads successfully.
        create_placeholder:
            When ``True`` (default), auto-create a stub config file at
            :data:`DEFAULT_SAVE_PATH` if no credentials could be
            resolved. Set to ``False`` in tests / non-interactive
            contexts to keep the filesystem untouched.

        Returns
        -------
        CarbonMapperConfig
            The resolved config.  Fields without a value (from file *and*
            env) are ``None``.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.load()
        >>> print(cfg.email)   # None if not configured

        >>> cfg = CarbonMapperConfig.load("~/my_project/.carbonmapper.json")
        """
        cfg: CarbonMapperConfig | None = None
        loaded_from_file = False

        # 1. Explicit path
        if path is not None:
            resolved = Path(path).expanduser().resolve()
            if resolved.exists():
                try:
                    cfg = cls.from_file(resolved)
                    loaded_from_file = True
                    logger.debug("Loaded Carbon Mapper config from %s", resolved)
                except Exception as exc:
                    logger.warning("Failed to load config from %s: %s", resolved, exc)
            else:
                logger.warning("Config path %s does not exist; ignoring.", resolved)

        # 2. Search well-known paths
        if cfg is None:
            for candidate in CONFIG_SEARCH_PATHS:
                resolved_candidate = candidate.expanduser().resolve()
                if resolved_candidate.exists():
                    try:
                        cfg = cls.from_file(resolved_candidate)
                        loaded_from_file = True
                        logger.debug("Loaded Carbon Mapper config from %s", resolved_candidate)
                        break
                    except Exception as exc:
                        logger.warning(
                            "Failed to load config from %s: %s",
                            resolved_candidate,
                            exc,
                        )

        if cfg is None:
            cfg = cls()

        # 3. Overlay environment variables (env takes priority over file)
        env_token = os.environ.get(_ENV_TOKEN)
        env_email = os.environ.get(_ENV_EMAIL)
        env_password = os.environ.get(_ENV_PASSWORD)
        if env_token:
            cfg.token = env_token
        if env_email:
            cfg.email = env_email
        if env_password:
            cfg.password = env_password

        # 4. Placeholder: only when the caller didn't pass an explicit path,
        #    no config file was found, and env vars didn't supply creds.
        if (
            create_placeholder
            and path is None
            and not loaded_from_file
            and not cfg.has_credentials()
        ):
            _create_placeholder_config()

        return cfg

    # ------------------------------------------------------------------ #
    # Persistence                                                          #
    # ------------------------------------------------------------------ #

    def save(self, path: Path | str | None = None) -> Path:
        """Persist the config to a JSON file.

        Parameters
        ----------
        path:
            Destination file path.  Defaults to
            :data:`DEFAULT_SAVE_PATH`
            (``~/.georeader/auth_carbonmapper.json``), matching the
            sibling-reader convention (emit, S2). User-level location
            outside the working tree so credentials are never
            accidentally committed.

        Returns
        -------
        Path
            The resolved path of the file that was written.

        Examples
        --------
        >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
        >>> saved_path = cfg.save()
        >>> print(saved_path)
        /home/user/.georeader/auth_carbonmapper.json
        """
        if path is not None:
            dest = Path(path).expanduser().resolve()
        else:
            dest = DEFAULT_SAVE_PATH.expanduser().resolve()
        dest.parent.mkdir(parents=True, exist_ok=True)
        data: dict[str, Any] = {**self.extra}
        if self.token is not None:
            data["token"] = self.token
        if self.email is not None:
            data["email"] = self.email
        if self.password is not None:
            data["password"] = self.password
        dest.write_text(json.dumps(data, indent=2))
        try:
            os.chmod(dest, 0o600)
        except PermissionError:
            logger.warning(
                "Carbon Mapper config saved to %s but restrictive permissions "
                "(0o600) could not be set due to insufficient permissions.",
                dest,
            )
        except OSError as exc:
            logger.warning(
                "Carbon Mapper config saved to %s but setting restrictive "
                "permissions (0o600) failed: %s",
                dest,
                exc,
            )
        logger.info("Carbon Mapper config saved to %s", dest)
        return dest

    @classmethod
    def reset(cls, path: Path | str | None = None) -> None:
        """Delete the stored config file, if it exists.

        Parameters
        ----------
        path:
            Path to the config file to remove.  Defaults to
            :data:`DEFAULT_SAVE_PATH`
            (``~/.georeader/auth_carbonmapper.json``).

        Examples
        --------
        >>> CarbonMapperConfig.reset()  # removes ~/.georeader/auth_carbonmapper.json
        """
        dest = (
            Path(path).expanduser().resolve()
            if path is not None
            else DEFAULT_SAVE_PATH.expanduser().resolve()
        )
        if dest.exists():
            dest.unlink()
            logger.info("Carbon Mapper config removed: %s", dest)
        else:
            logger.debug("No config file to remove at %s", dest)

    # ------------------------------------------------------------------ #
    # Token resolution                                                     #
    # ------------------------------------------------------------------ #

    def get_token(self) -> str | None:
        """Return the best available bearer token.

        If :attr:`token` is set, it is returned directly.  Otherwise
        ``None`` is returned; callers that need a fresh token should call
        :meth:`refresh_access_token` or
        :func:`~georeader.readers.carbonmapper.download.obtain_token`
        with :attr:`email` and :attr:`password`.

        Returns
        -------
        str or None
            A JWT bearer token string, or ``None`` if none is configured.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.load()
        >>> token = cfg.get_token()
        >>> if token is None:
        ...     token = cfg.refresh_access_token()
        """
        return self.token

    def refresh_access_token(self) -> str:
        """Obtain a fresh JWT access token using stored email/password.

        Calls :func:`~georeader.readers.carbonmapper.download.obtain_token` with the
        stored :attr:`email` and :attr:`password`, updates :attr:`token`
        in-place, and returns the new access token.

        Returns
        -------
        str
            The new JWT access token.

        Raises
        ------
        ValueError
            If *email* or *password* is not set.
        requests.HTTPError
            If the Carbon Mapper API rejects the credentials.

        Examples
        --------
        >>> cfg = CarbonMapperConfig.load()  # ~/.georeader/auth_carbonmapper.json
        >>> token = cfg.refresh_access_token()
        """
        if not self.email or not self.password:
            raise ValueError(
                "Cannot refresh token: email and password are required. "
                "Provide them via config file, environment variables, or "
                "constructor arguments."
            )
        from georeader.readers.carbonmapper.download import obtain_token

        tokens = obtain_token(self.email, self.password)
        self.token = tokens["access"]
        self.extra["refresh"] = tokens.get("refresh")
        logger.info("Carbon Mapper access token refreshed for %s", self.email)
        return self.token

    def has_credentials(self) -> bool:
        """Return ``True`` if any usable credentials are present.

        A config is considered to have credentials when at least one of the
        following is set: :attr:`token`, or both :attr:`email` *and*
        :attr:`password`.

        Examples
        --------
        >>> cfg = CarbonMapperConfig(email="u@example.com", password="pw")
        >>> cfg.has_credentials()
        True
        >>> CarbonMapperConfig().has_credentials()
        False
        """
        return bool(self.token) or bool(self.email and self.password)

    # ------------------------------------------------------------------ #
    # String representations                                               #
    # ------------------------------------------------------------------ #

    def __repr__(self) -> str:
        return (
            f"CarbonMapperConfig("
            f"email={self.email!r}, "
            f"has_token={self.token is not None}, "
            f"has_password={self.password is not None}"
            f")"
        )

from_env() classmethod

Build a :class:CarbonMapperConfig purely from environment variables.

Reads :envvar:CARBONMAPPER_TOKEN, :envvar:CARBONMAPPER_EMAIL, and :envvar:CARBONMAPPER_PASSWORD. Fields that are absent from the environment are left as None.

Returns

CarbonMapperConfig: A new config object populated from the environment.

Examples

>>> import os
>>> os.environ["CARBONMAPPER_TOKEN"] = "eyJ..."
>>> cfg = CarbonMapperConfig.from_env()
>>> cfg.token
'eyJ...'

Source code in georeader/readers/carbonmapper/config.py
@classmethod
def from_env(cls) -> "CarbonMapperConfig":
    """Build a :class:`CarbonMapperConfig` purely from environment variables.

    Reads :envvar:`CARBONMAPPER_TOKEN`, :envvar:`CARBONMAPPER_EMAIL`,
    and :envvar:`CARBONMAPPER_PASSWORD`.  Fields that are absent from
    the environment are left as ``None``.

    Returns
    -------
    CarbonMapperConfig
        A new config object populated from the environment.

    Examples
    --------
    >>> import os
    >>> os.environ["CARBONMAPPER_TOKEN"] = "eyJ..."
    >>> cfg = CarbonMapperConfig.from_env()
    >>> cfg.token
    'eyJ...'
    """
    return cls(
        token=os.environ.get(_ENV_TOKEN),
        email=os.environ.get(_ENV_EMAIL),
        password=os.environ.get(_ENV_PASSWORD),
    )
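
Because `from_env()` reads the process environment directly, tests and notebooks usually want the variables set only for the duration of one call. The sketch below does that with the standard library's `unittest.mock.patch.dict`. `EnvConfig` and `config_with_env` are hypothetical stand-ins that mirror the three documented variables; they are not part of georeader.

```python
import os
from dataclasses import dataclass
from typing import Optional
from unittest import mock


@dataclass
class EnvConfig:
    """Hypothetical stand-in mirroring the fields read by from_env()."""

    token: Optional[str] = None
    email: Optional[str] = None
    password: Optional[str] = None

    @classmethod
    def from_env(cls) -> "EnvConfig":
        # Variables that are absent stay None, as in the documented behaviour.
        return cls(
            token=os.environ.get("CARBONMAPPER_TOKEN"),
            email=os.environ.get("CARBONMAPPER_EMAIL"),
            password=os.environ.get("CARBONMAPPER_PASSWORD"),
        )


def config_with_env(overrides: dict) -> EnvConfig:
    """Build a config with `overrides` applied to os.environ only temporarily."""
    with mock.patch.dict(os.environ, overrides):
        return EnvConfig.from_env()
```

On leaving the `with` block, `patch.dict` restores `os.environ` to its previous state, so the override does not leak into later code.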

from_file(path) classmethod

Load a :class:CarbonMapperConfig from a specific JSON file.

Parameters

path: Path to a JSON config file containing any combination of the keys "token", "email", "password", plus any extra fields. A "username" key is accepted as a fallback when "email" is missing or empty.

Returns

CarbonMapperConfig: Config populated from the file.

Raises

FileNotFoundError: If path does not exist.
json.JSONDecodeError: If the file cannot be parsed as JSON.

Examples

>>> cfg = CarbonMapperConfig.from_file("~/.georeader/auth_carbonmapper.json")

Source code in georeader/readers/carbonmapper/config.py
@classmethod
def from_file(cls, path: Path | str) -> "CarbonMapperConfig":
    """Load a :class:`CarbonMapperConfig` from a specific JSON file.

    Parameters
    ----------
    path:
        Path to a JSON config file containing any combination of the
        keys ``"token"``, ``"email"``, ``"password"``, plus any extra
        fields.  A ``"username"`` key is accepted as a fallback when
        ``"email"`` is missing or empty.

    Returns
    -------
    CarbonMapperConfig
        Config populated from the file.

    Raises
    ------
    FileNotFoundError
        If *path* does not exist.
    json.JSONDecodeError
        If the file cannot be parsed as JSON.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.from_file("~/.georeader/auth_carbonmapper.json")
    """
    path = Path(path).expanduser().resolve()
    with path.open() as fh:
        data: dict[str, Any] = json.load(fh)
    token = data.pop("token", None)
    email = data.pop("email", None) or data.pop("username", None)
    password = data.pop("password", None)
    # Filter stub values β€” if the user hasn't yet edited a freshly
    # auto-created placeholder, treat the fields as un-set rather
    # than letting ``"SET-EMAIL"`` flow into has_credentials() as
    # if it were a real value.
    if email == _PLACEHOLDER_EMAIL:
        email = None
    if password == _PLACEHOLDER_PASSWORD:
        password = None
    return cls(token=token, email=email, password=password, **data)
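
The stub filtering above can be tried in isolation. The following sketch is a stand-alone approximation, not the library code: `read_credentials` and `demo` are hypothetical names, and the stub values `"SET-EMAIL"` / `"SET-PASSWORD"` are assumed from the wording used elsewhere on this page.

```python
import json
import tempfile
from pathlib import Path

# Assumed stub values, taken from the "SET-EMAIL" / "SET-PASSWORD" wording in the docs.
PLACEHOLDER_EMAIL = "SET-EMAIL"
PLACEHOLDER_PASSWORD = "SET-PASSWORD"


def read_credentials(path: Path) -> dict:
    """Parse a JSON config, dropping stub values and collecting extra keys."""
    data: dict = json.loads(Path(path).read_text())
    token = data.pop("token", None)
    # "username" is only consulted when "email" is missing or empty.
    email = data.pop("email", None) or data.pop("username", None)
    password = data.pop("password", None)
    if email == PLACEHOLDER_EMAIL:
        email = None
    if password == PLACEHOLDER_PASSWORD:
        password = None
    return {"token": token, "email": email, "password": password, "extra": data}


def demo() -> dict:
    """Write an unedited placeholder file to a temp dir and parse it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "auth_carbonmapper.json"
        stub = {"email": "SET-EMAIL", "password": "SET-PASSWORD", "region": "eu"}
        path.write_text(json.dumps(stub))
        return read_credentials(path)
```

An unedited placeholder file therefore yields no usable credentials, while unrelated keys (here the made-up `"region"`) are carried along as extras.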

get_token()

Return the best available bearer token.

If :attr:token is set, it is returned directly. Otherwise None is returned; callers that need a fresh token should call :meth:refresh_access_token or :func:~georeader.readers.carbonmapper.download.obtain_token with :attr:email and :attr:password.

Returns

str or None: A JWT bearer token string, or None if none is configured.

Examples

>>> cfg = CarbonMapperConfig.load()
>>> token = cfg.get_token()
>>> if token is None:
...     token = cfg.refresh_access_token()

Source code in georeader/readers/carbonmapper/config.py
def get_token(self) -> str | None:
    """Return the best available bearer token.

    If :attr:`token` is set, it is returned directly.  Otherwise
    ``None`` is returned; callers that need a fresh token should call
    :meth:`refresh_access_token` or
    :func:`~georeader.readers.carbonmapper.download.obtain_token`
    with :attr:`email` and :attr:`password`.

    Returns
    -------
    str or None
        A JWT bearer token string, or ``None`` if none is configured.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.load()
    >>> token = cfg.get_token()
    >>> if token is None:
    ...     token = cfg.refresh_access_token()
    """
    return self.token

has_credentials()

Return True if any usable credentials are present.

A config is considered to have credentials when at least one of the following is set: :attr:token, or both :attr:email and :attr:password.

Examples

>>> cfg = CarbonMapperConfig(email="u@example.com", password="pw")
>>> cfg.has_credentials()
True
>>> CarbonMapperConfig().has_credentials()
False

Source code in georeader/readers/carbonmapper/config.py
def has_credentials(self) -> bool:
    """Return ``True`` if any usable credentials are present.

    A config is considered to have credentials when at least one of the
    following is set: :attr:`token`, or both :attr:`email` *and*
    :attr:`password`.

    Examples
    --------
    >>> cfg = CarbonMapperConfig(email="u@example.com", password="pw")
    >>> cfg.has_credentials()
    True
    >>> CarbonMapperConfig().has_credentials()
    False
    """
    return bool(self.token) or bool(self.email and self.password)

load(path=None, *, create_placeholder=True) classmethod

Load config using the standard resolution order.

Resolution order

1. If path is given, load that file.
2. Otherwise, or if that file is missing or fails to load, search :data:CONFIG_SEARCH_PATHS for the first file that exists and parses.
3. Overlay environment variables; non-empty env values overwrite file values.
4. If still nothing is configured (no explicit path, no file loaded, no credentials from env vars) and create_placeholder is True, write a stub config to :data:DEFAULT_SAVE_PATH with SET-EMAIL / SET-PASSWORD placeholders so users have a clear edit target. This matches the emit.py / S2_SAFE_reader.py behaviour.

Parameters

path: Optional explicit path to a config file. When it loads successfully the search is skipped; if it is missing or cannot be parsed, a warning is logged and the search still runs.
create_placeholder: When True (default), auto-create a stub config file at :data:DEFAULT_SAVE_PATH if no path was passed, no config file was loaded, and no credentials could be resolved. Set to False in tests / non-interactive contexts to keep the filesystem untouched.

Returns

CarbonMapperConfig: The resolved config. Fields set by neither the file nor the environment are None.

Examples

>>> cfg = CarbonMapperConfig.load()
>>> print(cfg.email)   # None if not configured

>>> cfg = CarbonMapperConfig.load("~/my_project/.carbonmapper.json")

Source code in georeader/readers/carbonmapper/config.py
@classmethod
def load(
    cls,
    path: Path | str | None = None,
    *,
    create_placeholder: bool = True,
) -> "CarbonMapperConfig":
    """Load config using the standard resolution order.

    Resolution order
    ~~~~~~~~~~~~~~~~
    1. If *path* is given, load that file.
    2. Otherwise, or if that file is missing or fails to load, search
       :data:`CONFIG_SEARCH_PATHS` for the first file that exists and
       parses.
    3. Overlay environment variables; non-empty env values overwrite
       file values.
    4. If still nothing is configured (no explicit *path*, no file
       loaded, no credentials from env vars) AND ``create_placeholder``
       is True, write a stub config to :data:`DEFAULT_SAVE_PATH` with
       ``SET-EMAIL`` / ``SET-PASSWORD`` placeholders so users have a
       clear edit target. Matches the ``emit.py`` /
       ``S2_SAFE_reader.py`` behaviour.

    Parameters
    ----------
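    path:
        Optional explicit path to a config file.  When it loads
        successfully the search is skipped; if it is missing or
        cannot be parsed, a warning is logged and the search of
        :data:`CONFIG_SEARCH_PATHS` still runs.
    create_placeholder:
        When ``True`` (default), auto-create a stub config file at
        :data:`DEFAULT_SAVE_PATH` if no *path* was passed, no config
        file was loaded, and no credentials could be resolved. Set
        to ``False`` in tests / non-interactive contexts to keep the
        filesystem untouched.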

    Returns
    -------
    CarbonMapperConfig
        The resolved config.  Fields without a value (from file *and*
        env) are ``None``.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.load()
    >>> print(cfg.email)   # None if not configured

    >>> cfg = CarbonMapperConfig.load("~/my_project/.carbonmapper.json")
    """
    cfg: CarbonMapperConfig | None = None
    loaded_from_file = False

    # 1. Explicit path
    if path is not None:
        resolved = Path(path).expanduser().resolve()
        if resolved.exists():
            try:
                cfg = cls.from_file(resolved)
                loaded_from_file = True
                logger.debug("Loaded Carbon Mapper config from %s", resolved)
            except Exception as exc:
                logger.warning("Failed to load config from %s: %s", resolved, exc)
        else:
            logger.warning("Config path %s does not exist; ignoring.", resolved)

    # 2. Search well-known paths
    if cfg is None:
        for candidate in CONFIG_SEARCH_PATHS:
            resolved_candidate = candidate.expanduser().resolve()
            if resolved_candidate.exists():
                try:
                    cfg = cls.from_file(resolved_candidate)
                    loaded_from_file = True
                    logger.debug("Loaded Carbon Mapper config from %s", resolved_candidate)
                    break
                except Exception as exc:
                    logger.warning(
                        "Failed to load config from %s: %s",
                        resolved_candidate,
                        exc,
                    )

    if cfg is None:
        cfg = cls()

    # 3. Overlay environment variables (env takes priority over file)
    env_token = os.environ.get(_ENV_TOKEN)
    env_email = os.environ.get(_ENV_EMAIL)
    env_password = os.environ.get(_ENV_PASSWORD)
    if env_token:
        cfg.token = env_token
    if env_email:
        cfg.email = env_email
    if env_password:
        cfg.password = env_password

    # 4. Placeholder: only when the caller didn't pass an explicit path,
    #    no config file was found, and env vars didn't supply creds.
    if (
        create_placeholder
        and path is None
        and not loaded_from_file
        and not cfg.has_credentials()
    ):
        _create_placeholder_config()

    return cfg
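
Step 3 of the resolution order is easy to get subtly wrong, so here is a minimal sketch of the overlay rule on plain dictionaries. `overlay_env` and `ENV_NAMES` are hypothetical helpers written for this illustration; only the three environment variable names come from this page.

```python
from typing import Mapping, Optional

# Field name -> environment variable, as documented for from_env().
ENV_NAMES = {
    "token": "CARBONMAPPER_TOKEN",
    "email": "CARBONMAPPER_EMAIL",
    "password": "CARBONMAPPER_PASSWORD",
}


def overlay_env(
    file_values: Mapping[str, Optional[str]],
    environ: Mapping[str, str],
) -> dict:
    """Return file values with non-empty environment values layered on top."""
    resolved = {field: file_values.get(field) for field in ENV_NAMES}
    for field, env_name in ENV_NAMES.items():
        value = environ.get(env_name)
        if value:  # empty strings are falsy, so they do not override the file
            resolved[field] = value
    return resolved
```

Note the truthiness check: an environment variable that is set but empty leaves the file value in place, which matches the `if env_token:` style tests in the listing above.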

refresh_access_token()

Obtain a fresh JWT access token using stored email/password.

Calls :func:~georeader.readers.carbonmapper.download.obtain_token with the stored :attr:email and :attr:password, updates :attr:token in-place, and returns the new access token.

Returns

str: The new JWT access token.

Raises

ValueError: If email or password is not set.
requests.HTTPError: If the Carbon Mapper API rejects the credentials.

Examples

>>> cfg = CarbonMapperConfig.load()  # ~/.georeader/auth_carbonmapper.json
>>> token = cfg.refresh_access_token()

Source code in georeader/readers/carbonmapper/config.py
def refresh_access_token(self) -> str:
    """Obtain a fresh JWT access token using stored email/password.

    Calls :func:`~georeader.readers.carbonmapper.download.obtain_token` with the
    stored :attr:`email` and :attr:`password`, updates :attr:`token`
    in-place, and returns the new access token.

    Returns
    -------
    str
        The new JWT access token.

    Raises
    ------
    ValueError
        If *email* or *password* is not set.
    requests.HTTPError
        If the Carbon Mapper API rejects the credentials.

    Examples
    --------
    >>> cfg = CarbonMapperConfig.load()  # ~/.georeader/auth_carbonmapper.json
    >>> token = cfg.refresh_access_token()
    """
    if not self.email or not self.password:
        raise ValueError(
            "Cannot refresh token: email and password are required. "
            "Provide them via config file, environment variables, or "
            "constructor arguments."
        )
    from georeader.readers.carbonmapper.download import obtain_token

    tokens = obtain_token(self.email, self.password)
    self.token = tokens["access"]
    self.extra["refresh"] = tokens.get("refresh")
    logger.info("Carbon Mapper access token refreshed for %s", self.email)
    return self.token
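
A common calling pattern is "use the cached token if there is one, otherwise fetch and cache it". The sketch below shows that pattern offline. `TokenHolder`, `ensure_token`, and `fake_obtain` are hypothetical names; the token-fetching function is injected so the example runs without network access, and the response is assumed to have the `"access"` / `"refresh"` shape described for `obtain_token`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TokenHolder:
    """Hypothetical stand-in for the token-related part of the config."""

    email: Optional[str] = None
    password: Optional[str] = None
    token: Optional[str] = None
    extra: Dict[str, Optional[str]] = field(default_factory=dict)

    def ensure_token(self, obtain: Callable[[str, str], Dict[str, str]]) -> str:
        """Return the cached token, or fetch one with `obtain` and cache it."""
        if self.token:
            return self.token
        if not self.email or not self.password:
            raise ValueError("email and password are required to obtain a token")
        tokens = obtain(self.email, self.password)
        self.token = tokens["access"]
        self.extra["refresh"] = tokens.get("refresh")
        return self.token


def fake_obtain(email: str, password: str) -> Dict[str, str]:
    """Offline replacement for the real network call, for illustration only."""
    return {"access": f"access-for-{email}", "refresh": "refresh-123"}
```

Passing the fetcher as an argument keeps the credential check and caching logic testable without contacting the Carbon Mapper API.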

reset(path=None) classmethod

Delete the stored config file, if it exists.

Parameters

path: Path to the config file to remove. Defaults to :data:DEFAULT_SAVE_PATH (~/.georeader/auth_carbonmapper.json).

Examples

>>> CarbonMapperConfig.reset()  # removes ~/.georeader/auth_carbonmapper.json

Source code in georeader/readers/carbonmapper/config.py
@classmethod
def reset(cls, path: Path | str | None = None) -> None:
    """Delete the stored config file, if it exists.

    Parameters
    ----------
    path:
        Path to the config file to remove.  Defaults to
        :data:`DEFAULT_SAVE_PATH`
        (``~/.georeader/auth_carbonmapper.json``).

    Examples
    --------
    >>> CarbonMapperConfig.reset()  # removes ~/.georeader/auth_carbonmapper.json
    """
    dest = (
        Path(path).expanduser().resolve()
        if path is not None
        else DEFAULT_SAVE_PATH.expanduser().resolve()
    )
    if dest.exists():
        dest.unlink()
        logger.info("Carbon Mapper config removed: %s", dest)
    else:
        logger.debug("No config file to remove at %s", dest)

save(path=None)

Persist the config to a JSON file.

Parameters

path: Destination file path. Defaults to :data:DEFAULT_SAVE_PATH (~/.georeader/auth_carbonmapper.json), matching the sibling-reader convention (emit, S2). User-level location outside the working tree so credentials are never accidentally committed.

Returns

Path: The resolved path of the file that was written.

Examples

>>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
>>> saved_path = cfg.save()
>>> print(saved_path)
/home/user/.georeader/auth_carbonmapper.json

Source code in georeader/readers/carbonmapper/config.py
def save(self, path: Path | str | None = None) -> Path:
    """Persist the config to a JSON file.

    Parameters
    ----------
    path:
        Destination file path.  Defaults to
        :data:`DEFAULT_SAVE_PATH`
        (``~/.georeader/auth_carbonmapper.json``), matching the
        sibling-reader convention (emit, S2). User-level location
        outside the working tree so credentials are never
        accidentally committed.

    Returns
    -------
    Path
        The resolved path of the file that was written.

    Examples
    --------
    >>> cfg = CarbonMapperConfig(email="user@example.com", password="s3cret")
    >>> saved_path = cfg.save()
    >>> print(saved_path)
    /home/user/.georeader/auth_carbonmapper.json
    """
    if path is not None:
        dest = Path(path).expanduser().resolve()
    else:
        dest = DEFAULT_SAVE_PATH.expanduser().resolve()
    dest.parent.mkdir(parents=True, exist_ok=True)
    data: dict[str, Any] = {**self.extra}
    if self.token is not None:
        data["token"] = self.token
    if self.email is not None:
        data["email"] = self.email
    if self.password is not None:
        data["password"] = self.password
    dest.write_text(json.dumps(data, indent=2))
    try:
        os.chmod(dest, 0o600)
    except PermissionError:
        logger.warning(
            "Carbon Mapper config saved to %s but restrictive permissions "
            "(0o600) could not be set due to insufficient permissions.",
            dest,
        )
    except OSError as exc:
        logger.warning(
            "Carbon Mapper config saved to %s but setting restrictive "
            "permissions (0o600) failed: %s",
            dest,
            exc,
        )
    logger.info("Carbon Mapper config saved to %s", dest)
    return dest
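
The listing above writes the file first and restricts its permissions afterwards. As an alternative technique (not what `save()` does), the file can be created with mode `0o600` from the start using `os.open`, which avoids a brief window with default permissions. This is a minimal sketch; `write_private_json` and `demo` are hypothetical names.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_private_json(dest: Path, data: dict) -> Path:
    """Write `data` as JSON to `dest`, requesting mode 0o600 at creation time."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The mode argument only applies when the file is newly created.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    try:
        # Tighten permissions on a pre-existing file as well.
        os.chmod(dest, 0o600)
    except OSError:
        pass  # e.g. filesystems without POSIX permission support
    return dest


def demo() -> tuple:
    """Write into a temporary directory and report (content, permission bits)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "nested" / "auth.json"
        path = write_private_json(target, {"email": "user@example.com"})
        return json.loads(path.read_text()), stat.S_IMODE(path.stat().st_mode)
```

On non-POSIX platforms the permission bits reported by `stat` may not reflect `0o600`, so the mode should only be relied on where POSIX permissions apply.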

download.py

Carbon Mapper Data Platform API client for the marsml pipeline.

Provides typed wrappers around three Carbon Mapper APIs:

1. **REST Catalog API**: plumes, sources, scenes, plume CSV, assets
2. **STAC API**: spatiotemporal search across collections
3. **Asset Download**: GeoTIFF retrievals, RGB imagery, plume PNGs

Authentication

Most read endpoints work without a token, but some (scenes, related plumes, STAC tokens) require a Bearer token. Use :func:obtain_token or :meth:~georeader.readers.carbonmapper.config.CarbonMapperConfig.refresh_access_token to obtain one from credentials in ~/.georeader/auth_carbonmapper.json.
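
For readers unfamiliar with Bearer authentication, the sketch below shows how an optional token typically becomes an `Authorization` header. `bearer_headers` is a hypothetical helper written for this page; the module's own private header function is not shown here and may differ, for example in the other headers it sends.

```python
from typing import Dict, Optional


def bearer_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding Authorization only when a token is given."""
    headers = {"Accept": "application/json"}
    if token:
        # Standard "Bearer <token>" scheme for OAuth 2.0 style access tokens.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Omitting the header entirely when no token is available lets the same code path serve both the public and the authenticated endpoints.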

References

  • API Docs : https://api.carbonmapper.org/api/v1/docs
  • STAC Root : https://api.carbonmapper.org/api/v1/stac/
  • Registration : https://data.carbonmapper.org

obtain_token(email, password)

Exchange credentials for a JWT access/refresh token pair.

Parameters

email: Registered Carbon Mapper account e-mail address.
password: Account password.

Returns

dict: A mapping with at least two keys:

- ``"access"``: short-lived JWT bearer token (use in API calls).
- ``"refresh"``: long-lived refresh token (use with :func:`refresh_token`).

Examples

>>> tokens = obtain_token("user@example.com", "s3cret")
>>> access_token = tokens["access"]
>>> data = get_plumes_annotated(plume_gas="CH4", limit=5, token=access_token)

Source code in georeader/readers/carbonmapper/download.py
def obtain_token(email: str, password: str) -> dict:
    """
    Exchange credentials for a JWT access/refresh token pair.

    Parameters
    ----------
    email:
        Registered Carbon Mapper account e-mail address.
    password:
        Account password.

    Returns
    -------
    dict
        A mapping with at least two keys:

        - ``"access"``: short-lived JWT bearer token (use in API calls).
        - ``"refresh"``: long-lived refresh token (use with :func:`refresh_token`).

    Examples
    --------
    >>> tokens = obtain_token("user@example.com", "s3cret")
    >>> access_token = tokens["access"]
    >>> data = get_plumes_annotated(plume_gas="CH4", limit=5, token=access_token)
    """
    return _post(f"{BASE_URL}/token/pair", {"email": email, "password": password})

refresh_token(refresh)

Refresh an expired access token using a refresh token.

Parameters

refresh: The "refresh" value previously returned by :func:obtain_token.

Returns

dict: A mapping with a new "access" token (and optionally a new "refresh" token if the server rotates them).

Examples

>>> tokens = obtain_token("user@example.com", "s3cret")
>>> new_tokens = refresh_token(tokens["refresh"])
>>> access_token = new_tokens["access"]

Source code in georeader/readers/carbonmapper/download.py
def refresh_token(refresh: str) -> dict:
    """
    Refresh an expired access token using a refresh token.

    Parameters
    ----------
    refresh:
        The ``"refresh"`` value previously returned by :func:`obtain_token`.

    Returns
    -------
    dict
        A mapping with a new ``"access"`` token (and optionally a new
        ``"refresh"`` token if the server rotates them).

    Examples
    --------
    >>> tokens = obtain_token("user@example.com", "s3cret")
    >>> new_tokens = refresh_token(tokens["refresh"])
    >>> access_token = new_tokens["access"]
    """
    return _post(f"{BASE_URL}/token/refresh", {"refresh": refresh})
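
To decide when to call `refresh_token`, a client can peek at the expiry of the current access token. The sketch below decodes the JWT payload without verifying the signature and reads the standard `exp` claim. It assumes that Carbon Mapper access tokens carry an `exp` claim, which this page does not confirm; all function names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[float]:
    """Return the `exp` claim (seconds since the epoch) of a JWT, or None.

    The payload is decoded WITHOUT verifying the signature, so the result is
    only a hint for scheduling a refresh, never a security check.
    """
    try:
        payload_b64 = token.split(".")[1]
        # JWT segments drop base64 padding; restore it before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        payload = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return None
    exp = payload.get("exp") if isinstance(payload, dict) else None
    return float(exp) if isinstance(exp, (int, float)) else None


def needs_refresh(
    token: str, leeway_seconds: float = 60.0, now: Optional[float] = None
) -> bool:
    """True when the token is expired, about to expire, or has no readable expiry."""
    exp = jwt_expiry(token)
    current = time.time() if now is None else now
    return exp is None or exp - current <= leeway_seconds


def make_unsigned_jwt(exp: float) -> str:
    """Build a toy three-part token for demonstration; the signature is fake."""

    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{encode({'alg': 'none'})}.{encode({'exp': exp})}.sig"
```

Treating an unreadable expiry as "needs refresh" is the conservative choice: the worst case is one extra token request.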

download_asset(asset_key, dest, token=None)

Download a raster asset (GeoTIFF or PNG) by its storage key.

Parameters

asset_key: The path portion after /catalog/asset/. For example::

    l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif

Asset keys are available in STAC item ``assets[name]["href"]``
entries and can be derived from the plume ``plume_tif`` /
``con_tif`` / ``rgb_tif`` URLs.

dest: Local file path where the asset will be written. Parent directories are created automatically.
token: Optional Bearer token for authenticated access.

Returns

Path: The path of the downloaded file (dest converted to a Path).

Examples

>>> download_asset(
...     "l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif",
...     dest="./retrieval.tif",
... )

.. note:: For plume dicts returned by :func:get_plumes_annotated, prefer :func:download_plume_assets which handles all assets at once.

Source code in georeader/readers/carbonmapper/download.py
def download_asset(asset_key: str, dest: Path | str, token: str | None = None) -> Path:
    """
    Download a raster asset (GeoTIFF or PNG) by its storage key.

    Parameters
    ----------
    asset_key:
        The path portion after ``/catalog/asset/``.  For example::

            l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif

        Asset keys are available in STAC item ``assets[name]["href"]``
        entries and can be derived from the plume ``plume_tif`` /
        ``con_tif`` / ``rgb_tif`` URLs.
    dest:
        Local file path where the asset will be written.  Parent
        directories are created automatically.
    token:
        Optional Bearer token for authenticated access.

    Returns
    -------
    Path
        The path of the downloaded file (*dest* converted to a
        :class:`~pathlib.Path`).

    Examples
    --------
    >>> download_asset(
    ...     "l2b-ch4-mf-v1/2016/10/08/ang20161008t211637/ang20161008t211637_l2b-ch4-mf-v1_cmf.tif",
    ...     dest="./retrieval.tif",
    ... )

    .. note::
        For plume dicts returned by :func:`get_plumes_annotated`, prefer
        :func:`download_plume_assets` which handles all assets at once.
    """
    dest = Path(dest)
    url = f"{CATALOG_URL}/asset/{asset_key}"
    resp = requests.get(url, headers=_headers(token), timeout=120, stream=True)
    resp.raise_for_status()
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    logger.info("Downloaded %s → %s (%d bytes)", asset_key, dest, dest.stat().st_size)
    return dest
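
The docstring says an asset key can be derived from a plume asset URL. A minimal sketch of that derivation follows, assuming the URL path contains the `/catalog/asset/` segment mentioned above; `asset_key_from_url` is a hypothetical helper, and the sample URL in the usage note is constructed for illustration rather than taken from the API.

```python
from urllib.parse import unquote, urlparse

ASSET_MARKER = "/catalog/asset/"


def asset_key_from_url(url: str) -> str:
    """Return the part of the URL path that follows ``/catalog/asset/``."""
    path = urlparse(url).path  # drops the query string and fragment
    _, marker, key = path.partition(ASSET_MARKER)
    if not marker or not key:
        raise ValueError(f"URL does not contain an asset key: {url!r}")
    return unquote(key)
```

For example, `asset_key_from_url("https://api.carbonmapper.org/api/v1/catalog/asset/a/b/c.tif?x=1")` yields `"a/b/c.tif"`, which could then be passed as `asset_key`.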

download_plume_assets(plume, dest_dir)

Download all available raster assets for a single plume.

Given a plume dict returned by :func:get_plumes_annotated, download every non-null asset URL into dest_dir and return a mapping of asset type to local file path.

Parameters

plume: A single plume dict as returned in get_plumes_annotated()["items"]. The function inspects the "plume_png", "plume_tif", "con_tif", "rgb_png", "rgb_tif", and "plume_rgb_png" keys for download URLs.
dest_dir: Directory into which assets are downloaded. Created automatically if it does not already exist.

Returns

dict[str, Path]: Mapping of asset type → local :class:~pathlib.Path for each successfully downloaded asset. Assets that are missing (null) or whose download fails are omitted. Example::

    {
        "plume_png": Path("./plumes/emi20240420t101448p07050-A_plume.png"),
        "plume_tif": Path("./plumes/emi20240420t101448p07050-A_plume.tif"),
        "con_tif": Path("./plumes/emi20240420t101448p07050-A_con.tif"),
        "rgb_png": Path("./plumes/emi20240420t101448p07050-A_rgb.png"),
    }

Examples

    result = get_plumes_annotated(plume_gas="CH4", limit=1, qualities=["good"])
    plume = result["items"][0]
    downloaded = download_plume_assets(plume, "./plume_data/")
    for asset_type, path in downloaded.items():
        print(asset_type, "→", path)

Source code in georeader/readers/carbonmapper/download.py
def download_plume_assets(plume: dict, dest_dir: Path | str) -> dict[str, Path]:
    """
    Download all available raster assets for a single plume.

    Given a plume dict returned by :func:`get_plumes_annotated`, download
    every non-null asset URL into *dest_dir* and return a mapping of asset
    type to local file path.

    Parameters
    ----------
    plume:
        A single plume dict as returned in
        ``get_plumes_annotated()["items"]``.  The function inspects the
        ``"plume_png"``, ``"plume_tif"``, ``"con_tif"``, ``"rgb_png"``,
        ``"rgb_tif"``, and ``"plume_rgb_png"`` keys for download URLs.
    dest_dir:
        Directory into which assets are downloaded.  Created automatically
        if it does not already exist.

    Returns
    -------
    dict[str, Path]
        Mapping of asset type → local :class:`~pathlib.Path` for each
        successfully downloaded asset.  Assets that are missing (``null``)
        or whose download fails are omitted.  Example::

            {
                "plume_png": Path("./plumes/emi20240420t101448p07050-A_plume.png"),
                "plume_tif": Path("./plumes/emi20240420t101448p07050-A_plume.tif"),
                "con_tif": Path("./plumes/emi20240420t101448p07050-A_con.tif"),
                "rgb_png": Path("./plumes/emi20240420t101448p07050-A_rgb.png"),
            }

    Examples
    --------
    .. code-block:: python

        result = get_plumes_annotated(plume_gas="CH4", limit=1, qualities=["good"])
        plume = result["items"][0]
        downloaded = download_plume_assets(plume, "./plume_data/")
        for asset_type, path in downloaded.items():
            print(asset_type, "→", path)
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    plume_name = plume.get("plume_id", "unknown")
    downloaded: dict[str, Path] = {}

    asset_keys = ["plume_png", "plume_tif", "con_tif", "rgb_png", "rgb_tif", "plume_rgb_png"]
    for key in asset_keys:
        url = plume.get(key)
        if not url:
            continue
        suffix = ".tif" if key.endswith("tif") else ".png"
        short = key.replace("_tif", "").replace("_png", "")
        local = dest_dir / f"{plume_name}_{short}{suffix}"
        try:
            resp = requests.get(url, timeout=120, stream=True)
            resp.raise_for_status()
            with open(local, "wb") as f:
                for chunk in resp.iter_content(8192):
                    f.write(chunk)
            downloaded[key] = local
            logger.info("  %s → %s", key, local)
        except requests.RequestException as exc:
            logger.warning("  %s download failed: %s", key, exc)
    return downloaded
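The local file name for each asset comes from the two string operations inside the loop above. The sketch below isolates that rule so the resulting names can be inspected without any network access. `asset_filename` is a hypothetical helper introduced only for illustration, and the plume ID is the one used in the docstring example.

```python
def asset_filename(plume_name: str, key: str) -> str:
    """Mirror the naming rule used in the loop above for one asset key."""
    # Keys ending in "tif" become GeoTIFFs; everything else is treated as PNG.
    suffix = ".tif" if key.endswith("tif") else ".png"
    # Drop the format marker from the key to get the short asset label.
    short = key.replace("_tif", "").replace("_png", "")
    return f"{plume_name}_{short}{suffix}"


plume_name = "emi20240420t101448p07050-A"
for key in ["plume_png", "plume_tif", "con_tif", "rgb_png", "rgb_tif", "plume_rgb_png"]:
    print(key, "->", asset_filename(plume_name, key))
```

Each of the six keys maps to a distinct file name, so assets from the same plume do not overwrite one another in `dest_dir`.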

stac_search(*, collections=None, bbox=None, datetime_range=None, ids=None, limit=10, token=None)

Cross-collection STAC item search.

Searches across one or more STAC collections using spatial and temporal filters and returns matching items as a GeoJSON FeatureCollection.

Parameters

collections: List of STAC collection IDs to search. If None, all collections are searched. Example: ["l2b-ch4-mfa-v3", "l4a-combined-ch4-v3a"].

bbox: Bounding-box spatial filter as (west_lon, south_lat, east_lon, north_lat) in WGS 84.

datetime_range: RFC 3339 time interval string, e.g. "2024-01-01T00:00:00Z/2024-06-01T00:00:00Z".

ids: Optional list of STAC item IDs to restrict the search to. Sent as a comma-separated "ids" query parameter.

limit: Maximum number of items to return.

token: Optional Bearer token for authenticated requests.
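The bbox ordering (longitude first, then latitude) is easy to get wrong, and a swapped box usually just returns no items rather than an error. A small client-side check can catch that before the request is sent. The sketch below is an illustration only: `validate_bbox` is a hypothetical helper, not part of georeader, and the ranges are the usual WGS 84 degree limits.

```python
from typing import Sequence


def validate_bbox(bbox: Sequence[float]) -> tuple[float, float, float, float]:
    """Check a (west_lon, south_lat, east_lon, north_lat) box given in degrees."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    west, south, east, north = (float(v) for v in bbox)
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90.0 <= south <= north <= 90.0):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # west > east is deliberately allowed: GeoJSON uses it for boxes
    # that cross the antimeridian.
    return (west, south, east, north)


print(validate_bbox((-104.5, 31.0, -101.5, 33.5)))
```

Passing the Permian Basin box with latitude and longitude swapped, `(31.0, -104.5, 33.5, -101.5)`, raises `ValueError` because -104.5 is not a valid latitude.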

Returns

dict: A GeoJSON FeatureCollection mapping. Key fields:

- ``"type"``: ``"FeatureCollection"``.
- ``"features"``: list of STAC item GeoJSON Features. Each
  Feature has:

  - ``"id"``: item ID.
  - ``"geometry"``: GeoJSON geometry of the scene footprint.
  - ``"properties"``: item metadata (datetime, collection, etc.).
  - ``"assets"``: dict of named assets, each with an
    ``"href"`` download URL and media type.

- ``"context"``: pagination info (``matched``, ``returned``).
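Given the structure described above, collecting the download URLs is a matter of walking `features` and reading each asset's `"href"`. The sketch below does this for a hand-written sample. `asset_hrefs` is a hypothetical helper, and the item ID, asset name, and URL in the sample are made up for illustration rather than taken from the real catalog.

```python
def asset_hrefs(feature_collection: dict) -> dict[str, dict[str, str]]:
    """Map each item ID to {asset name: href} for a STAC FeatureCollection."""
    hrefs: dict[str, dict[str, str]] = {}
    for feature in feature_collection.get("features", []):
        assets = feature.get("assets", {})
        # Skip assets without an "href" rather than failing on them.
        hrefs[feature["id"]] = {
            name: asset["href"] for name, asset in assets.items() if "href" in asset
        }
    return hrefs


# Hand-written sample with the documented shape; all values are invented.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "id": "item-1",
            "geometry": None,
            "properties": {"datetime": "2024-01-15T18:00:00Z"},
            "assets": {
                "cmf": {"href": "https://example.com/item-1_cmf.tif", "type": "image/tiff"},
            },
        }
    ],
}
print(asset_hrefs(sample))
```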

Examples

Search CH4 retrievals in the Permian Basin:

    >>> result = stac_search(
    ...     collections=["l2b-ch4-mfa-v3"],
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     datetime_range="2024-01-01T00:00:00Z/2024-06-01T00:00:00Z",
    ...     limit=5,
    ... )
    >>> for feat in result["features"]:
    ...     print(feat["id"], list(feat["assets"].keys()))

Search across multiple collections simultaneously:

    >>> result = stac_search(
    ...     collections=["l4a-combined-ch4-v3a", "l2b-rgb-v3a"],
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     limit=3,
    ... )

Source code in georeader/readers/carbonmapper/download.py
def stac_search(
    *,
    collections: list[str] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    datetime_range: str | None = None,
    ids: list[str] | None = None,
    limit: int = 10,
    token: str | None = None,
) -> dict:
    """
    Cross-collection STAC item search.

    Searches across one or more STAC collections using spatial and
    temporal filters and returns matching items as a GeoJSON
    FeatureCollection.

    Parameters
    ----------
    collections:
        List of STAC collection IDs to search.  If ``None``, all
        collections are searched.  Example:
        ``["l2b-ch4-mfa-v3", "l4a-combined-ch4-v3a"]``.
    bbox:
        Bounding-box spatial filter as
        ``(west_lon, south_lat, east_lon, north_lat)`` in WGS 84.
    datetime_range:
        RFC 3339 time interval string, e.g.
        ``"2024-01-01T00:00:00Z/2024-06-01T00:00:00Z"``.
    ids:
        Optional list of STAC item IDs to restrict the search to.
    limit:
        Maximum number of items to return.
    token:
        Optional Bearer token for authenticated requests.

    Returns
    -------
    dict
        A GeoJSON FeatureCollection mapping.  Key fields:

        - ``"type"``: ``"FeatureCollection"``.
        - ``"features"``: list of STAC item GeoJSON Features.  Each
          Feature has:

          - ``"id"``: item ID.
          - ``"geometry"``: GeoJSON geometry of the scene footprint.
          - ``"properties"``: item metadata (datetime, collection, etc.).
          - ``"assets"``: dict of named assets, each with an
            ``"href"`` download URL and media type.

        - ``"context"``: pagination info (``matched``, ``returned``).

    Examples
    --------
    Search CH4 retrievals in the Permian Basin:

    >>> result = stac_search(
    ...     collections=["l2b-ch4-mfa-v3"],
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     datetime_range="2024-01-01T00:00:00Z/2024-06-01T00:00:00Z",
    ...     limit=5,
    ... )
    >>> for feat in result["features"]:
    ...     print(feat["id"], list(feat["assets"].keys()))

    Search across multiple collections simultaneously:

    >>> result = stac_search(
    ...     collections=["l4a-combined-ch4-v3a", "l2b-rgb-v3a"],
    ...     bbox=(-104.5, 31.0, -101.5, 33.5),
    ...     limit=3,
    ... )
    """
    params: dict[str, Any] = {"limit": limit}
    if collections:
        params["collections"] = ",".join(collections)
    if ids:
        params["ids"] = ",".join(ids)
    params.update(_stac_bbox_param(bbox))
    if datetime_range:
        params["datetime"] = datetime_range
    return cast(dict, _get(f"{STAC_URL}/search", params=params, token=token))
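To see which query parameters a given call would produce, the parameter assembly in the function body above can be reproduced without any HTTP request. The sketch below is an approximation under a stated assumption: the output of `_stac_bbox_param` is not shown in this listing, so `bbox_param` is a hypothetical stand-in that assumes a comma-joined `"bbox"` value. `build_search_params` is likewise an illustrative name, not part of georeader.

```python
from typing import Any, Optional, Sequence


def bbox_param(bbox: Optional[Sequence[float]]) -> dict[str, str]:
    """Hypothetical stand-in for _stac_bbox_param (its real output is not shown)."""
    if bbox is None:
        return {}
    # Assumption: the API expects "west,south,east,north" as one string.
    return {"bbox": ",".join(str(value) for value in bbox)}


def build_search_params(
    *,
    collections: Optional[list[str]] = None,
    bbox: Optional[Sequence[float]] = None,
    datetime_range: Optional[str] = None,
    ids: Optional[list[str]] = None,
    limit: int = 10,
) -> dict[str, Any]:
    """Assemble query parameters in the same order as the function body above."""
    params: dict[str, Any] = {"limit": limit}
    if collections:
        params["collections"] = ",".join(collections)
    if ids:
        params["ids"] = ",".join(ids)
    params.update(bbox_param(bbox))
    if datetime_range:
        params["datetime"] = datetime_range
    return params


print(build_search_params(collections=["l2b-ch4-mfa-v3"], bbox=(-104.5, 31.0, -101.5, 33.5), limit=5))
```

Note that empty lists are treated like `None`: `collections=[]` and `ids=[]` add no parameter, so the search is not narrowed.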

stac_get_items(collection_id, *, limit=10, bbox=None, datetime_range=None, token=None)

Get items from a STAC collection (OGC API Features compliant).

Parameters

collection_id: Identifier of the STAC collection to query, e.g. "l4a-combined-ch4-v3a" or "l2b-rgb-v3a".

limit: Maximum number of items to return.

bbox: Bounding-box spatial filter as (west_lon, south_lat, east_lon, north_lat) in WGS 84.

datetime_range: RFC 3339 time interval string.

token: Optional Bearer token for authenticated requests.

Returns

dict: A GeoJSON FeatureCollection. Each Feature has "assets" containing download links for GeoTIFFs, PNGs, and other raster products, with "href" and media-type annotations.

Examples

    >>> items = stac_get_items("l4a-combined-ch4-v3a", limit=5)
    >>> for feat in items["features"]:
    ...     print(feat["id"], feat.get("properties", {}).get("datetime"))

Source code in georeader/readers/carbonmapper/download.py
def stac_get_items(
    collection_id: str,
    *,
    limit: int = 10,
    bbox: tuple[float, float, float, float] | None = None,
    datetime_range: str | None = None,
    token: str | None = None,
) -> dict:
    """
    Get items from a STAC collection (OGC API Features compliant).

    Parameters
    ----------
    collection_id:
        Identifier of the STAC collection to query, e.g.
        ``"l4a-combined-ch4-v3a"`` or ``"l2b-rgb-v3a"``.
    limit:
        Maximum number of items to return.
    bbox:
        Bounding-box spatial filter as
        ``(west_lon, south_lat, east_lon, north_lat)`` in WGS 84.
    datetime_range:
        RFC 3339 time interval string.
    token:
        Optional Bearer token for authenticated requests.

    Returns
    -------
    dict
        A GeoJSON FeatureCollection.  Each Feature has ``"assets"``
        containing download links for GeoTIFFs, PNGs, and other raster
        products, with ``"href"`` and media-type annotations.

    Examples
    --------
    >>> items = stac_get_items("l4a-combined-ch4-v3a", limit=5)
    >>> for feat in items["features"]:
    ...     print(feat["id"], feat.get("properties", {}).get("datetime"))
    """
    params: dict[str, Any] = {"limit": limit}
    params.update(_stac_bbox_param(bbox))
    if datetime_range:
        params["datetime"] = datetime_range
    return cast(dict, _get(f"{STAC_URL}/collections/{collection_id}/items", params=params, token=token))
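The `datetime_range` argument is a plain string, so callers working with `datetime` objects need to format the interval themselves. The sketch below shows one way to build the `start/end` form used in the examples on this page. `rfc3339_interval` is a hypothetical helper, not part of georeader, and it only covers closed intervals with second precision.

```python
from datetime import datetime, timezone


def rfc3339_interval(start: datetime, end: datetime) -> str:
    """Format two timezone-aware datetimes as 'start/end' in UTC with a 'Z' suffix."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start > end:
        raise ValueError("start must not be later than end")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # Convert to UTC first so the literal "Z" suffix is correct.
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return f"{start_utc}/{end_utc}"


interval = rfc3339_interval(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 1, tzinfo=timezone.utc),
)
print(interval)
```

The resulting string can then be passed as `datetime_range=interval` to `stac_get_items` or `stac_search`.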