
Carbon Mapper

The Carbon Mapper reader is a typed client for Carbon Mapper's two API surfaces (REST catalog + STAC) that hides three real-world inconveniences:

  1. Two protocols, two bbox conventions – the REST catalog wants repeated ?bbox=W&bbox=S&... keys, STAC wants ?bbox=W,S,E,N. Mix them up and the server 422s with no useful error.
  2. Three resource types with hand-rolled joins – a plume is a single detection, a tile (a.k.a. scene) is the L2B raster it was detected in, and a source is the DBSCAN cluster of plumes at one physical site. The API exposes them via different endpoints with no FK-style links; this layer does the joins.
  3. Inconsistent error shapes – 404s look different per resource. We translate them into a small typed exception hierarchy so callers can except CMPlumeNotFound rather than string-match requests.HTTPError.

The catalog ships methane and CO₂ retrievals from Tanager-1, EMIT, AVIRIS-3, AVIRIS-NG, and GAO. Plume detection is operational on all platforms; published L2B scenes lag plume publication by weeks-to-months for Tanager (see Publication lag).

> CH4 only in this notebook. The reader is gas-agnostic internally, but query helpers are typed Literal["CH4"] for now; CO2 lands in a follow-up.

This notebook walks the layers bottom-up:

| Layer | Module | Use when |
|---|---|---|
| Raw HTTP | download.py | You need a field this layer doesn't expose, or you're prototyping a new endpoint wrapper. |
| Typed query | api_queries.py | Default. Returns CMRawPlume / CMTileItem / CMSource, never raw dicts. |
| Cross-resolution | api_queries.get_*_for_* / get_plume_context | One call → (plume, tile, source) – the typical ingestion shape. |
| Per-plume image | image.py / CMPlumeImage | Per-plume product bundle (mask, concentrations, IME, RGB, outline). Handles v3a (STAC) + v3c (CDN-only) via URL-pattern derivation. |
| L2B scene raster | rasters.py / CMImageRaster | Scene-level CMF retrieval and RGB sibling. |

Companion: products_explore.ipynb covers the raster wrappers in depth.

Install

The Carbon Mapper reader is gated behind the [carbonmapper] extra to keep georeader-spaceml's base install minimal. Install with:

pip install 'georeader-spaceml[carbonmapper]'

This pulls in pydantic (for CMRawPlume) and requests (for the HTTP client). No Azure or other cloud-vendor SDKs are required.

Authentication

Every cell below hits the live API and needs a Bearer token. CarbonMapperConfig.load() resolves credentials in this priority order:

  1. CARBONMAPPER_TOKEN environment variable – one-shot, no refresh.
  2. CARBONMAPPER_EMAIL + CARBONMAPPER_PASSWORD environment variables – refreshable via obtain_token.
  3. Config file at the canonical location ~/.georeader/auth_carbonmapper.json (matches sibling readers like emit.py / S2_SAFE_reader.py).

Legacy fallbacks (still honoured if present):

  • ./config/carbonmapper_token.json
  • ~/.config/carbonmapper/config.json
  • ~/.carbonmapper.json
  • ./.carbonmapper.json

On first run, if no config file is found and no env vars are set, a stub ~/.georeader/auth_carbonmapper.json is created with placeholder values for you to edit.
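The resolution order above can be pictured as one small standalone function. This is an illustrative sketch, not georeader API: resolve_credentials and CANONICAL are hypothetical names, and CarbonMapperConfig.load() remains the real entry point (it additionally handles token refresh via obtain_token).

```python
import json
import os
from pathlib import Path

# Hypothetical stand-in for CarbonMapperConfig.load()'s lookup order.
CANONICAL = Path.home() / ".georeader" / "auth_carbonmapper.json"

def resolve_credentials() -> dict:
    # 1. one-shot token from the environment
    if tok := os.environ.get("CARBONMAPPER_TOKEN"):
        return {"token": tok}
    # 2. refreshable email + password pair
    email = os.environ.get("CARBONMAPPER_EMAIL")
    pwd = os.environ.get("CARBONMAPPER_PASSWORD")
    if email and pwd:
        return {"email": email, "password": pwd}
    # 3. canonical config file
    if CANONICAL.exists():
        return json.loads(CANONICAL.read_text())
    raise FileNotFoundError("no Carbon Mapper credentials found")

os.environ["CARBONMAPPER_TOKEN"] = "demo-token"  # simulate priority 1
print(resolve_credentials())                     # {'token': 'demo-token'}
```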

Sign up for a developer account at api.carbonmapper.org – the free tier covers all the calls in this notebook.

Publication lag

Carbon Mapper's plume catalog and STAC catalog publish on different cadences. As of late 2025:

| Asset | Latency from acquisition |
|---|---|
| Plume (L4A, in /catalog/plumes/...) | hours to days |
| Tile / scene (L2B, in /stac/collections/l2b-ch4-mfa-v3a/...) | weeks to months (Tanager) |

Practical consequences:

  • api_queries.list_plumes(...) returns plumes whose parent L2B scene is not yet in STAC. Don't expect every plume to round-trip through get_tile_for_plume – it returns None when the parent is unpublished, and get_tile() raises CMSceneNotPublished.
  • For ingestion pipelines, treat CMSceneNotPublished as defer-and-retry, not an error.
  • The plume's geometry, emission_auto, and wind are authoritative without the L2B raster – you only need the L2B for visualisation / re-quantification / model retraining.
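The defer-and-retry pattern can be sketched in a few lines. The exception class and fetcher below are stand-ins so the sketch runs standalone; in real code you'd catch georeader's CMSceneNotPublished around api_queries.get_tile.

```python
from collections import deque

class CMSceneNotPublished(Exception):
    """Stand-in for georeader's exception, so this sketch runs standalone."""

def drain(queue: deque, fetch, deferred: list) -> list:
    """Process scene ids; unpublished scenes go to `deferred` for a later pass."""
    done = []
    while queue:
        scene_id = queue.popleft()
        try:
            done.append(fetch(scene_id))
        except CMSceneNotPublished:
            deferred.append(scene_id)  # defer-and-retry, not a failure
    return done

def fake_fetch(scene_id):
    # Pretend scenes ending in "new" haven't hit STAC yet.
    if scene_id.endswith("new"):
        raise CMSceneNotPublished(scene_id)
    return scene_id

deferred: list = []
done = drain(deque(["a", "b-new", "c"]), fake_fetch, deferred)
print(done, deferred)  # ['a', 'c'] ['b-new']
```

Re-running the deferred list on a later schedule (hours or days, per the lag table above) picks up scenes as they publish.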

Setup

from datetime import datetime, timezone

from georeader.readers.carbonmapper import (
    CMAPIError,
    CMPlumeNotFound,
    CMSceneNotPublished,
    CMSource,
    CMSourceNotFound,
    CMTileItem,
    CarbonMapperConfig,
    api_queries,
    download,
)

# --- 429-resilient HTTP -----------------------------------------------
# CarbonMapper rate-limits per account. §§ 5–6 below fire dozens of
# catalog probes back-to-back and can trip the per-minute cap. Mount
# a retry-aware adapter on a shared Session and re-bind the `requests`
# module shortcuts to route through it – so every HTTP call in this
# kernel (including the bare `requests.get(...)` calls in §§ 5–6 and
# the ones inside georeader's `.download` helpers) gets automatic 429
# backoff honouring `Retry-After`, plus exponential backoff for 5xx.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

_cm_session = requests.Session()
_cm_session.mount("https://", HTTPAdapter(max_retries=Retry(
    total=8,
    backoff_factor=2.0,           # 2, 4, 8, 16, 32, 64 s
    status_forcelist=(429, 500, 502, 503, 504),
    respect_retry_after_header=True,
    allowed_methods=frozenset(["GET", "POST"]),
)))
requests.get = _cm_session.get
requests.post = _cm_session.post
requests.request = _cm_session.request

# Resolve a Bearer token from env / config file (see "Authentication").
TOKEN = CarbonMapperConfig.load().refresh_access_token()

# Protagonist plume – Tanager-1 over the Permian basin, 2025-12-12.
PLUME_ID = "tan20251212t185057c20s4001-E"
SCENE_ID = PLUME_ID.rsplit("-", 1)[0]
PERMIAN_BBOX = (-104.5, 32.0, -103.5, 32.8)  # (W, S, E, N)
print(f"plume = {PLUME_ID}")
print(f"scene = {SCENE_ID}")
plume = tan20251212t185057c20s4001-E
scene = tan20251212t185057c20s4001

Domain model

Three resource types, with these relationships:

Relationships

| Parent | Cardinality | Child | Meaning |
|---|---|---|---|
| SOURCE | 1 – N | PLUME | DBSCAN clusters detections at one physical site |
| TILE | 1 – N | PLUME | L2B scene contains the detected plumes |

Entity properties

| Entity | Field | Type | Notes |
|---|---|---|---|
| PLUME | plume_id | string | tan20251212t185057...-E |
| | emission_auto | float | kg/h |
| | geometry | polygon | |
| | wind_u_v | float | from CM forecast |
| SOURCE | source_name | string | {gas}_{sector}_{m}m_{lon}_{lat} |
| | plume_count | int | across all scenes |
| | emission_auto | float | site-aggregate (kg/h) |
| TILE | scene_id | string | plume_id.rsplit('-',1)[0] |
| | platform | string | tan / emi / ang / av3 / gao |
| | acquired | datetime | L2B GeoTIFF, may lag publication |

  • Plume – one detection. Carries emission_auto (kg/h), wind, geometry, and a plume_id that encodes the source-instrument prefix and acquisition timestamp (e.g. tan20251212t185057c20s4001-E).
  • Tile (or scene) – the L2B GeoTIFF the plume was extracted from. One tile contains 0..N plumes. scene_id is plume_id.rsplit('-', 1)[0].
  • Source – DBSCAN cluster of plumes at the same physical site. One source contains 1..N plumes across many scenes / dates. Identified by the deterministic key {gas}_{sector}_{footprint_m}m_{lon}_{lat}.

A plume always has a parent scene (encoded in the id), but the parent L2B item may not be published in STAC yet (see Publication lag). A plume may not yet be clustered into a source if it's the first detection at that site.
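Because the parent scene is encoded in the id, it's recoverable offline with no API call – the same rsplit the table above documents:

```python
def scene_id_of(plume_id: str) -> str:
    # Drop the trailing per-scene plume letter ("-E") to get the parent scene id.
    return plume_id.rsplit("-", 1)[0]

print(scene_id_of("tan20251212t185057c20s4001-E"))
# tan20251212t185057c20s4001
```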

1 · Typed models

Two frozen dataclasses you'll see flowing through every other call. Worth a minute up front so you know what you're getting back.

1.1 CMSource – DBSCAN-clustered point source

Carbon Mapper aggregates plumes detected at the same physical location into a source – a deterministic point-source record addressed by {gas}_{sector}_{footprint_m}m_{lon}_{lat}. The /catalog/sources.geojson endpoint returns features whose source_name carries a stray ?plume_gas=... query suffix that must be stripped before using the value as a key into other endpoints. (The suffix is an accidental bleed from the geojson endpoint's filtering query string; Carbon Mapper plans to fix it upstream, but the strip is defensive in the meantime.) CMSource.from_geojson_feature does the strip unconditionally, so downstream code can treat source_name as canonical.

feature = {
    "properties": {
        "source_name": "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4&bbox=...",
        "sector": "1B2",
        "gas": "CH4",
        "plume_count": 12,
        "persistence": 0.42,
        "emission_auto": 250.0,
        "emission_uncertainty_auto": 35.0,
    },
    "geometry": {"type": "Point", "coordinates": [-104.17525, 32.49125]},
}
src = CMSource.from_geojson_feature(feature)
print(src.source_name)                        # suffix stripped
print((src.point.x, src.point.y))             # (-104.17525, 32.49125)
print(f"{src.plume_count} plumes · sector {src.sector} · gas {src.gas}")
CH4_1B2_100m_-104.17525_32.49125
(-104.17525, 32.49125)
12 plumes · sector 1B2 · gas CH4
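The strip itself is just dropping everything from the first '?'; a library-independent sketch (strip_source_suffix is an illustrative helper, not georeader API):

```python
def strip_source_suffix(source_name: str) -> str:
    # Everything from the first '?' onward is the accidental query-string bleed.
    return source_name.split("?", 1)[0]

raw = "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4&bbox=..."
print(strip_source_suffix(raw))
# CH4_1B2_100m_-104.17525_32.49125
```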

1.2 CMTileItem – typed wrapper over a STAC item

Frozen dataclass exposing the fields we use in practice (scene_id, collection, datetime, platform, bbox, geometry, asset_urls). The full properties dict and the raw STAC item stay attached for one-off field access.

stac_item = download.stac_get_item("l2b-ch4-mfa-v3a", SCENE_ID, token=TOKEN)
tile = CMTileItem.from_stac_item(stac_item)
print(f"{tile.scene_id} · {tile.platform} · {tile.datetime}")
print(f"bbox  : {tile.bbox}")
print(f"assets: {sorted(tile.asset_urls)[:5]}")
tan20251212t185057c20s4001 · tan · 2025-12-12 18:50:57+00:00
bbox  : (-104.5861937, 31.6662665, -103.9359726, 33.0604707)
assets: ['artifact-mask.tif', 'cmf-unortho.tif', 'cmf.tif', 'uas.txt', 'uncertainty-unortho.tif']

2 · download.py – raw HTTP wrappers

You usually shouldn't reach here – api_queries.py is the supported surface. We expose it because (a) the bbox encoding is non-obvious and worth understanding, and (b) the REST endpoints carry fields the typed layer doesn't yet model (e.g. raw CSV exports). These are thin endpoint wrappers – same return shape as the upstream JSON, but with bbox encoding, retries, and Bearer auth handled.

2.1 bbox encoding – REST vs STAC

Carbon Mapper's two API surfaces disagree on bbox shape:

  • REST Catalog (/catalog/...) wants repeated keys: ?bbox=W&bbox=S&bbox=E&bbox=N. Comma-joined returns 422.
  • STAC (/stac/...) wants the comma-joined form: ?bbox=W,S,E,N.

_rest_bbox_params returns a list-valued dict (requests serialises lists as repeated keys); _stac_bbox_param returns the comma-joined string.

from georeader.readers.carbonmapper.download import (
    _rest_bbox_params, _stac_bbox_param,
)

print(_rest_bbox_params(PERMIAN_BBOX))
# {'bbox': ['-104.5', '32.0', '-103.5', '32.8']}

print(_stac_bbox_param(PERMIAN_BBOX))
# {'bbox': '-104.5,32.0,-103.5,32.8'}
{'bbox': ['-104.5', '32.0', '-103.5', '32.8']}
{'bbox': '-104.5,32.0,-103.5,32.8'}

2.2 Endpoint wrappers

# stac_get_item – one STAC item by collection + scene_id
item = download.stac_get_item("l2b-ch4-mfa-v3a", SCENE_ID, token=TOKEN)
print(f"{item['id']} {item['properties']['datetime']}")
tan20251212t185057c20s4001 2025-12-12T18:50:57Z

# get_source_for_plume_name – find the source for our protagonist plume
src_dict = download.get_source_for_plume_name(PLUME_ID, token=TOKEN)
SOURCE_NAME = src_dict["source_name"]
print(SOURCE_NAME)
CH4_1B2_100m_-104.17525_32.49125

# get_source_by_name – REST source record (flat dict with properties)
src_dict = download.get_source_by_name(SOURCE_NAME, token=TOKEN)
print(f"plumes: {src_dict.get('plume_count')}  emission: {src_dict.get('emission_auto')} kg/h")
plumes: None  emission: None kg/h

# get_source_plumes_csv – every plume attributed to one source as CSV text
csv_text = download.get_source_plumes_csv(SOURCE_NAME, token=TOKEN)
print(csv_text[:200])
plume_id,plume_latitude,plume_longitude,datetime,country,state_province,ipcc_sector,gas,emission_cmf_type,plume_bounds,instrument,mission_phase,published_at,modified,emission_version,processing_softwa
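The CSV parses fine with the stdlib csv module; a standalone sketch using a synthetic two-row sample built from a subset of the real header columns shown above (the id values here are illustrative):

```python
import csv
import io

# Synthetic sample: a subset of the plumes.csv header columns shown above.
csv_text = (
    "plume_id,datetime,gas,instrument\n"
    "tan20251212t185057c20s4001-E,2025-12-12T18:50:57Z,CH4,tan\n"
    "tan20251210t183649c71s4001-A,2025-12-10T18:36:49Z,CH4,tan\n"
)
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(len(rows), rows[0]["plume_id"])
# 2 tan20251212t185057c20s4001-E
```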

# stac_search – accepts ids= for direct STAC item lookup
fc = download.stac_search(
    collections=["l2b-ch4-mfa-v3a"],
    ids=[SCENE_ID],
    limit=5,
    token=TOKEN,
)
print(f"{len(fc['features'])} feature(s) returned")
1 feature(s) returned

3 · api_queries.py – typed query layer

The default surface for downstream code. Three families:

  1. Single-resource fetchers – get_plume, get_tile, get_source. Translate 404s to typed exceptions.
  2. List helpers – list_plumes, list_tiles, list_sources. Take a bbox + datetime range + filters.
  3. Cross-resolution – given a plume, get its tile / source / full context. Given a source, get all its plumes / tiles. The join logic is hidden so callers don't reinvent scene_id derivation, dedup, etc.

3.1 Single-resource fetchers

plume = api_queries.get_plume(TOKEN, PLUME_ID)
print(f"{plume.plume_id}  gas={plume.gas}  emission_auto={plume.emission_auto} kg/h")
print(f"scene_id={plume.scene_id}")
tan20251212t185057c20s4001-E  gas=CH4  emission_auto=1007.6564374669618 kg/h
scene_id=tan20251212t185057c20s4001

tile = api_queries.get_tile(TOKEN, SCENE_ID)  # default collection l2b-ch4-mfa-v3a
print(f"{tile.scene_id} · {tile.platform}")
print(f"bbox: {tile.bbox}")
tan20251212t185057c20s4001 · tan
bbox: (-104.5861937, 31.6662665, -103.9359726, 33.0604707)

source = api_queries.get_source(TOKEN, SOURCE_NAME)
print(f"{source.source_name}")
print(f"plumes: {source.plume_count}  sector: {source.sector}  emission: {source.emission_auto} kg/h")
CH4_1B2_100m_-104.17525_32.49125
plumes: 0  sector:   emission: None kg/h

3.2 Typed exceptions

404s are translated to typed CMAPIError subclasses – easy to catch one resource type without swallowing real failures. All three inherit from CMAPIError so a single except CMAPIError catches every documented failure mode; anything else is an HTTP / network error and should be allowed to propagate.

try:
    # Well-formed but non-existent plume id (the API 422s on malformed input).
    api_queries.get_plume(TOKEN, "tan29991231t000000c00s4001-Z")
except CMPlumeNotFound as exc:
    print(f"caught CMPlumeNotFound: {exc}")

try:
    api_queries.get_source(TOKEN, "CH4_1B2_100m_0_0")
except CMSourceNotFound as exc:
    print(f"caught CMSourceNotFound: {exc}")

try:
    # A scene whose L2B item is not yet published in STAC.
    api_queries.get_tile(TOKEN, "tan29991231t000000c00s4001")
except CMSceneNotPublished as exc:
    print(f"caught CMSceneNotPublished: {exc}")

# All three inherit from CMAPIError if you want a single catch.
print(f"isinstance(CMPlumeNotFound(...), CMAPIError) -> "
      f"{isinstance(CMPlumeNotFound('x'), CMAPIError)}")
caught CMPlumeNotFound: Plume not found: tan29991231t000000c00s4001-Z

caught CMSourceNotFound: Source not found: CH4_1B2_100m_0_0
caught CMSceneNotPublished: L2B scene not published: tan29991231t000000c00s4001
isinstance(CMPlumeNotFound(...), CMAPIError) -> True

3.3 List helpers

dt_min = datetime(2025, 12, 1, tzinfo=timezone.utc)
dt_max = datetime(2025, 12, 31, tzinfo=timezone.utc)

plumes = api_queries.list_plumes(
    TOKEN,
    bbox=PERMIAN_BBOX,
    datetime_min=dt_min,
    datetime_max=dt_max,
    gas="CH4",
)
print(f"{len(plumes)} plumes")
for p in plumes[:3]:
    print(f"  {p.plume_id}  emission_auto={p.emission_auto}")
22 plumes
  tan20251212t185057c20s4001-C  emission_auto=876.1038259991867
  tan20251212t185057c20s4001-D  emission_auto=432.5738771282001
  tan20251212t185057c20s4001-E  emission_auto=1007.6564374669618

# NB: STAC search caps page size at 100 – pass an explicit limit until
# pagination lands.
tiles = api_queries.list_tiles(
    TOKEN,
    bbox=PERMIAN_BBOX,
    datetime_min=dt_min,
    datetime_max=dt_max,
    limit=50,
)
print(f"{len(tiles)} tiles")
for t in tiles[:3]:
    print(f"  {t.scene_id} {t.platform} {t.datetime}")
4 tiles
  tan20251212t185057c20s4001 tan 2025-12-12 18:50:57+00:00
  tan20251210t183649c71s4001 tan 2025-12-10 18:36:49+00:00
  tan20251210t183749c38s4001 tan 2025-12-10 18:37:49+00:00

sources = api_queries.list_sources(TOKEN, bbox=PERMIAN_BBOX, gas="CH4")
print(f"{len(sources)} sources")
for s in sources[:3]:
    print(f"  {s.source_name}  plumes={s.plume_count}  emission={s.emission_auto}")
10593 sources
  CH4_6A_500m_-117.26768_34.59375  plumes=3  emission=23.0636206375622
  CH4_6A_500m_-118.51707_34.32769  plumes=485  emission=1070.9700550816046
  CH4_6A_500m_-119.38080_36.39176  plumes=34  emission=174.91747754329324

3.4 Cross-resolution helpers

The headline value-add – these do the join work the upstream API leaves to the caller (scene-id derivation, source lookup-by-plume, dedup of scene_ids per source, etc.).

# plume → tile (returns None when the L2B scene isn't published yet)
tile = api_queries.get_tile_for_plume(TOKEN, PLUME_ID)
print(f"tile  : {tile and tile.scene_id}")
tile  : tan20251212t185057c20s4001

# plume → source (returns None when CM hasn't clustered this plume yet)
source = api_queries.get_source_for_plume(TOKEN, PLUME_ID)
print(f"source: {source and source.source_name}")
source: CH4_1B2_100m_-104.17525_32.49125

# One call → (plume, tile|None, source|None) – the typical ingestion shape.
plume, tile, source = api_queries.get_plume_context(TOKEN, PLUME_ID)
print(f"plume  : {plume.plume_id}")
print(f"  tile  : {tile and tile.scene_id}")
print(f"  source: {source and source.source_name}")
plume  : tan20251212t185057c20s4001-E
  tile  : tan20251212t185057c20s4001
  source: CH4_1B2_100m_-104.17525_32.49125

# tile → all plumes in that scene
plumes = api_queries.list_plumes_for_tile(TOKEN, SCENE_ID)
print(f"{len(plumes)} plumes in scene {SCENE_ID}")
0 plumes in scene tan20251212t185057c20s4001

# source → every plume attributed to it (parsed from CSV)
plumes = api_queries.list_plumes_for_source(TOKEN, SOURCE_NAME)
print(f"{len(plumes)} plumes for source {SOURCE_NAME}")
1 plumes for source CH4_1B2_100m_-104.17525_32.49125

# source → distinct parent tiles (dedups scene_ids before STAC ids= search)
tiles = api_queries.list_tiles_for_source(TOKEN, SOURCE_NAME)
print(f"{len(tiles)} unique parent tiles")
1 unique parent tiles

4 · End-to-end mini-workflow

Tie it together: take a bbox + date range, list plumes, expand each into full context, count how many have a published L2B parent and how many are clustered into a source.

plumes = api_queries.list_plumes(
    TOKEN, bbox=PERMIAN_BBOX, datetime_min=dt_min, datetime_max=dt_max, gas="CH4",
)

n_with_tile = n_with_source = 0
for p in plumes[:25]:  # cap to keep the request count modest
    _, tile, source = api_queries.get_plume_context(TOKEN, p.plume_id)
    n_with_tile += tile is not None
    n_with_source += source is not None

print(f"checked       : {min(len(plumes), 25)}")
print(f"L2B published : {n_with_tile}")
print(f"in a source   : {n_with_source}")
checked       : 22
L2B published : 22
in a source   : 22

The 22 / 22 / 22 saturation reads as: in this Permian month, every plume detection has both (a) a parent L2B published in STAC and (b) a clustered source. That's typical for archives older than ~3 months – the publication lag has caught up. Run the same query over the most recent 30 days and you'll see N / 0 / 0 (plumes exist, scenes pending) – which is what your ingestion pipeline needs to handle gracefully via CMSceneNotPublished.

5 · Plume catalog stats – what's in the live catalog right now

Cells below hit the live API at notebook-execution time, so numbers will drift between runs. Each cell calls one or more /catalog/plumes/annotated?limit=1 requests and reads the total_count field – no large data transfer. Total runtime ≈ 25 s.

If the API is rate-limiting at the moment you re-run, drop the section: the prose narrative stands on its own.

> CH4-only – plume_gas="CH4" is implicit on every call below.

5.1 Headline counts

Total CH4 plumes in the catalog, split by instrument. The instrument filter is case-sensitive upstream – tan / emi / ang / av3 are lowercase, GAO is uppercase. Unrecognised filter names are silently ignored: use plume_gas, not gas, and instrument, not platform.

import pandas as pd
import requests

BASE = "https://api.carbonmapper.org/api/v1"
H = {"Authorization": f"Bearer {TOKEN}"}


def plume_count(**filters) -> int | None:
    """`total_count` for the plume catalog under the given filters."""
    r = requests.get(
        f"{BASE}/catalog/plumes/annotated",
        headers=H, params={"limit": 1, "plume_gas": "CH4", **filters},
        timeout=30,
    )
    return r.json().get("total_count")


total = plume_count()
# Instrument codes are case-sensitive upstream – `gao` returns None,
# `GAO` works. Other codes are lowercase. Worth a comment because it
# bites everyone once.
by_inst = {code: plume_count(instrument=code) for code in
           ("tan", "emi", "ang", "av3", "GAO")}

print(f"TOTAL CH4: {total:,} plumes\n")
print("By instrument")
for k, v in by_inst.items():
    print(f"  {k:5s} {v:>8,}")
TOTAL CH4: 32,642 plumes

By instrument
  tan     11,970
  emi      4,297
  ang      4,693
  av3      2,343
  GAO      9,339

5.2 IPCC sector distribution

Carbon Mapper attributes most plumes to an IPCC sector code. Every sector other than 1B2 (oil & gas) is dwarfed by it – Tanager's operational targeting bias toward upstream O&G shows up clearly.

sector_codes = ["1A1", "1B1a", "1B2", "3A", "4B", "6A", "6B"]
by_sector = {s: plume_count(sectors=s) for s in sector_codes}

df_sector = pd.DataFrame({
    "sector": list(by_sector),
    "name": ["Energy generation", "Coal mining", "Oil & gas",
             "Enteric fermentation", "Livestock", "Solid waste",
             "Waste water"],
    "plumes": list(by_sector.values()),
}).sort_values("plumes", ascending=False, na_position="last")
df_sector["share"] = df_sector["plumes"] / df_sector["plumes"].sum()
df_sector
  sector                  name   plumes     share
2    1B2             Oil & gas  18657.0  0.580564
5     6A           Solid waste   8552.0  0.266119
1   1B1a           Coal mining   3574.0  0.111215
4     4B             Livestock   1051.0  0.032705
0    1A1     Energy generation    252.0  0.007842
6     6B           Waste water     50.0  0.001556
3     3A  Enteric fermentation      NaN       NaN

5.3 Monthly activity – last 12 months

Plumes-by-month using ISO datetime intervals. Tanager-1 went operational mid-2024, so the early months are sparse; recent months reflect both the fleet ramp and the publication lag (newer detections are still flowing into the catalog).

from datetime import datetime, timedelta, timezone


def month_count(year: int, month: int) -> int | None:
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    end = (datetime(year + (month == 12), (month % 12) + 1, 1,
                    tzinfo=timezone.utc)
           - timedelta(seconds=1))
    return plume_count(datetime=f"{start.isoformat()}/{end.isoformat()}")


now = datetime.now(timezone.utc)
months = []
y, m = now.year, now.month
for _ in range(12):
    months.append((y, m, month_count(y, m)))
    m -= 1
    if m == 0:
        m, y = 12, y - 1

df_months = pd.DataFrame(months, columns=["year", "month", "plumes"])
df_months["label"] = df_months.apply(
    lambda r: f"{r.year}-{r.month:02d}", axis=1,
)
df_months[["label", "plumes"]].iloc[::-1].reset_index(drop=True)
      label  plumes
0   2025-06     934
1   2025-07    1073
2   2025-08     932
3   2025-09     935
4   2025-10     892
5   2025-11    1011
6   2025-12     961
7   2026-01     669
8   2026-02     775
9   2026-03     949
10  2026-04     303
11  2026-05       0

5.4 Emission rate distribution

The headline metric per plume is emission_auto in kg/h. Pull one page (1,000 plumes) and summarise – the long tail is dramatic: the median CH4 plume sits around 7 kt/yr, while the p99 is in the tens-of-kt/yr super-emitter regime.

r = requests.get(
    f"{BASE}/catalog/plumes/annotated",
    headers=H, params={"limit": 1000, "plume_gas": "CH4"}, timeout=60,
)
emissions_kgh = pd.Series(
    [item.get("emission_auto") for item in r.json().get("items", [])],
    name="emission_kg_per_h",
).dropna()

# Convert to kt/yr (×24 h ×365.25 d ÷ 1e6 kg/kt) for context.
emissions_kt_yr = emissions_kgh * 24 * 365.25 / 1_000_000

print(f"Sample size: {len(emissions_kgh):,} CH4 plumes\n")
print("kg/h:")
print(emissions_kgh.describe(percentiles=[0.5, 0.75, 0.9, 0.99]).round(1))
print("\nkt/yr equivalent:")
print(emissions_kt_yr.describe(percentiles=[0.5, 0.75, 0.9, 0.99]).round(2))
Sample size: 754 CH4 plumes

kg/h:
count      754.0
mean      1228.9
std       1567.6
min         79.4
50%        802.1
75%       1464.6
90%       2540.7
99%       7265.3
max      20900.0
Name: emission_kg_per_h, dtype: float64

kt/yr equivalent:
count    754.00
mean      10.77
std       13.74
min        0.70
50%        7.03
75%       12.84
90%       22.27
99%       63.69
max      183.21
Name: emission_kg_per_h, dtype: float64

6 · STAC inventory – what's downloadable

The plume catalog (§ 5) is the detection index – what was spotted, where, when, how strong. The STAC catalogue is the download index – the actual GeoTIFFs and per-plume products you can pull bytes for. There are 86 STAC collections total, but most are superseded versions; only ~9 are actively published as of late 2025.

6.1 Collection counts by level

The l<n> prefix tells you what kind of product:

  • L2B – orthorectified scene-level retrievals (cmf, RGB, uncertainty, artifact-mask).
  • L2C – per-scene CH4/CO2 composites (less common downstream).
  • L3A – per-plume products (the small plume_tif clip + ime retrieval crop).
  • L4A – retrieval cubes; flat per-platform listings of L4 outputs.

v3a is the current canonical version family. Older versions (v1, v3, j001, jpl legacy) still exist for archival reads.

import re
from collections import defaultdict

r = requests.get(f"{BASE}/stac/collections", headers=H, timeout=30)
all_collections = r.json()["collections"]

groups: dict[str, list[str]] = defaultdict(list)
for c in all_collections:
    m = re.match(r"(l\d[a-z]?)-", c["id"])
    if m:
        groups[m.group(1)].append(c["id"])

df_levels = pd.DataFrame({
    "level": sorted(groups),
    "collections": [len(groups[k]) for k in sorted(groups)],
})
df_levels.loc[len(df_levels)] = ["TOTAL", len(all_collections)]
df_levels
   level  collections
0     l2            1
1    l2b           31
2    l2c            3
3    l3a           32
4    l3c            1
5    l4a           18
6  TOTAL           86

6.2 Active v3a collections – item counts

For each *-v3a collection, fetch numberMatched via a 1-result STAC search. Empty placeholder collections (e.g. l2b-ch4-mfma-v3a) show 0 – the algorithm variant isn't currently published. The pairs l2b-ch4-mfa-v3a / l2b-co2-mfa-v3a have identical item counts because they're the same Tanager scenes processed twice for different gases.

v3a = [c["id"] for c in all_collections
       if (c["id"].endswith("-v3a") or c["id"].endswith("-quick-v3a"))
       and ("ch4" in c["id"] or "rgb" in c["id"])]

rows = []
for cid in v3a:
    info = next(c for c in all_collections if c["id"] == cid)
    extent = info.get("extent", {}).get("temporal", {}) \
                  .get("interval", [[None, None]])[0]
    r = requests.get(
        f"{BASE}/stac/search",
        headers=H, params={"collections": cid, "limit": 1}, timeout=30,
    )
    matched = r.json().get("numberMatched")
    rows.append({
        "collection": cid,
        "items": matched,
        "start": (extent[0] or "")[:10],
        "end": (extent[1] or "")[:10],
    })

df_v3a = pd.DataFrame(rows).sort_values(
    by=["items", "collection"], ascending=[False, True],
).reset_index(drop=True)
df_v3a
                   collection  items       start         end
0             l2b-ch4-mfa-v3a   1675  2025-07-11  2025-12-16
1                 l2b-rgb-v3a   1672  2025-07-11  2025-12-16
2         l3a-vis-ch4-mfa-v3a   1451  2023-10-25  2025-12-16
3         l3a-ime-ch4-mfa-v3a   1450  2023-10-25  2025-12-16
4             l4a-ch4-mfa-v3a   1450  2023-10-25  2025-12-16
5            l2b-ch4-mfma-v3a      0  2025-07-11  2025-12-16
6  l4a-combined-ch4-quick-v3a      0  2025-11-06  2025-12-15
7        l4a-combined-ch4-v3a      0  2024-11-21  2025-12-15

> v3c is the live processing version, but isn't in STAC. The latest
> Tanager CH4 plumes (post-2025-12-16) live in l3a-vis-ch4-mfa-v3c /
> l3a-ime-ch4-mfa-v3c (and -v3d for the very newest) – but those
> collections are not exposed via /stac/collections or any item lookup.
> They're reachable only via direct asset URLs derived from
> /catalog/plume/{id}. Both wrappers handle this transparently via
> URL-pattern derivation: CMPlumeImage for the per-plume L3A bundle
> (see § 8 below), and CMImageRaster (via
> api_queries.get_image_raster_for_plume) for the parent L2B scene –
> which tries STAC first and falls back to the same URL-pattern trick
> when the scene isn't registered.

6.3 Asset shapes – what's actually inside an item

Sample one item per active CH4 collection and list the asset keys. This is the canonical map between collection (what you search for) and asset key (what RasterioReader actually opens).

| Collection | Headline asset | Use it for |
|---|---|---|
| l2b-ch4-mfa-v3a | cmf.tif | CH4 column-density retrieval |
| l2b-rgb-v3a | rgb.tif | True-colour overlay |
| l3a-ime-ch4-mfa-v3a | ime-cmf-concentrations.tif | Per-plume IME retrieval crop |
| l3a-vis-ch4-mfa-v3a | plume.tif (band-4 alpha) + plume-outline.geojson | Per-plume mask / polygon |

sample_collections = [
    "l2b-ch4-mfa-v3a", "l2b-rgb-v3a",
    "l3a-ime-ch4-mfa-v3a", "l3a-vis-ch4-mfa-v3a",
]

asset_rows = []
for cid in sample_collections:
    r = requests.get(
        f"{BASE}/stac/search",
        headers=H, params={"collections": cid, "limit": 1}, timeout=30,
    )
    feats = r.json().get("features", [])
    if not feats:
        asset_rows.append({"collection": cid, "assets": "(empty)"})
        continue
    keys = sorted((feats[0].get("assets") or {}).keys())
    asset_rows.append({"collection": cid, "assets": ", ".join(keys)})

pd.DataFrame(asset_rows)
            collection                                             assets
0      l2b-ch4-mfa-v3a  artifact-mask.tif, cmf-unortho.tif, cmf.tif, u...
1          l2b-rgb-v3a                                            rgb.tif
2  l3a-ime-ch4-mfa-v3a  ime-cmf-concentrations.png, ime-cmf-concentrat...
3  l3a-vis-ch4-mfa-v3a  plume-concentrations.tif, plume-outline.geojso...

7 · Reachable products reference

Static reference tables – the canonical map of what's reachable from the API, by resource type. Numbers in §§ 5–6 are live; the tables here are documentation and don't drift with each notebook run.

7.1 Plume-level products

Every detection ships with a small bundle of per-plume products keyed off plume_id. Source paths: most assets live on the /catalog/plume/{id} REST response (URLs); a handful additionally appear as STAC item assets under l3a-*-ch4-mfa-v3a collections.

| Asset key | Format | What it is | Where to find it |
|---|---|---|---|
| plume_tif | RGBA GeoTIFF | Per-plume binary mask – band 4 is the alpha channel | /catalog/plume/{id}.plume_tif and l3a-vis-ch4-mfa-v3a STAC item assets |
| plume_png | PNG | Plume mask viz | /catalog/plume/{id}.plume_png |
| plume_rgb_png | PNG | Plume mask overlaid on RGB | /catalog/plume/{id}.plume_rgb_png |
| con_tif | GeoTIFF | Per-plume CH4 retrieval crop (column density) | l3a-ime-ch4-mfa-v3a STAC item assets, asset key ime-cmf-concentrations.tif |
| rgb_png / rgb.png | PNG | Per-plume RGB context tile | /catalog/plume/{id}.rgb_png and l3a-vis-ch4-mfa-v3a |
| ime_outline_geojson / plume-outline.geojson | GeoJSON | Plume polygon – preferred over band-4 mask extraction | l3a-vis-ch4-mfa-v3a STAC item assets |
| plumes.csv | CSV | All plumes attributed to one source | /catalog/source/{source_name}/plumes.csv |

The georeader wrapper CMPlumeImage exposes the GeoTIFFs (plume_tif, plume-concentrations.tif, ime-cmf-concentrations.tif, rgb.tif) and the canonical outline GeoJSON. PNG-only assets aren't wrapped (no native georeferencing).
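The band-4-as-alpha convention is easy to exploit once the raster is in memory. A numpy-only sketch on a synthetic RGBA array (reading a real plume_tif would go through CMPlumeImage, or e.g. rasterio if you bypass the wrapper):

```python
import numpy as np

# Synthetic 4x4 RGBA plume clip: band 4 (alpha) carries the binary plume mask.
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[1:3, 1:3, 3] = 255      # a 2x2 plume footprint

mask = rgba[..., 3] > 0      # boolean plume mask from the alpha band
print(int(mask.sum()))       # 4 pixels inside the plume
```

For production use, prefer the outline GeoJSON over mask extraction, as the table above notes.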

7.2 STAC collections – current CH4 (v3a)

Carbon Mapper's "active" Tanager-1 CH4 product family. Everything older / superseded still resolves under /stac/collections (86 total), but new ingestion should target the v3a family below.

| Collection | Level | Items | Temporal | Description |
|---|---|---|---|---|
| l2b-ch4-mfa-v3a | L2B | 1,675 | 2025-07-11 → 2025-12-16 | CH4 retrieval scene – assets: cmf.tif, cmf-unortho.tif, uncertainty.tif, uncertainty-unortho.tif, artifact-mask.tif, uas.txt |
| l2b-rgb-v3a | L2B | 1,672 | same | True-colour sibling – rgb.tif. 3 short of the cmf collection (still being published) |
| l3a-ime-ch4-mfa-v3a | L3A | 1,450 | 2023-10-25 → 2025-12-16 | Per-plume CH4 IME retrieval crop – ime-cmf-concentrations.tif, ime-cmf-mask.tif, ime-cmf-outline.geojson |
| l3a-vis-ch4-mfa-v3a | L3A | 1,451 | same | Per-plume CH4 visualisation – plume.tif (band-4 alpha mask), plume-outline.geojson, plume-rgb.png, plume-concentrations.tif |
| l4a-ch4-mfa-v3a | L4A | 1,450 | same | CH4 retrieval cube – collection-level metadata, no per-item assets |

Empty `*-v3a` placeholders (`l2b-ch4-mfma-v3a`, `l4a-combined-ch4-{quick-,}v3a`) exist in the catalog but are not currently published. CO2 collections are trimmed from this table — this PR is CH4-only.
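When querying these collections directly, remember the bbox split from the intro: the REST catalog wants repeated `bbox` keys while STAC wants a single comma-joined value, and mixing them up yields an unhelpful 422. A small sketch of the two encodings (helper names are illustrative):

```python
from urllib.parse import urlencode

# The two API surfaces disagree on bbox encoding: the REST catalog
# wants repeated ?bbox=W&bbox=S&... keys, STAC wants ?bbox=W,S,E,N.
# These helpers are illustrative, not part of api_queries.py.


def rest_bbox_qs(w: float, s: float, e: float, n: float) -> str:
    # ?bbox=W&bbox=S&bbox=E&bbox=N  (one key per value)
    return urlencode([("bbox", v) for v in (w, s, e, n)])


def stac_bbox_qs(w: float, s: float, e: float, n: float) -> str:
    # ?bbox=W,S,E,N  (one key, comma-joined; comma is percent-encoded)
    return urlencode({"bbox": ",".join(str(v) for v in (w, s, e, n))})


print(rest_bbox_qs(-105, 32, -104, 33))  # bbox=-105&bbox=32&bbox=-104&bbox=33
print(stac_bbox_qs(-105, 32, -104, 33))  # bbox=-105%2C32%2C-104%2C33
```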

7.3 Source-level products

Sources are DBSCAN clusters of plumes at the same physical site. The full source list is small enough (~12 K rows) to fetch in one shot — the endpoint returns the entire FeatureCollection in ~1.5 s. CH4 sources: 10,569; CO2 sources: 2,140; total: 12,709 (as of probe time).

Endpoints

| Endpoint | Returns | Notes |
|---|---|---|
| `/catalog/sources.geojson` | FeatureCollection of `CMSource` | Strip the `?plume_gas=...` suffix from `source_name` before keying — see § 1.1 |
| `/catalog/source/{source_name}` | Single source dict | Flat REST shape |
| `/catalog/source/{source_name}/plumes.csv` | CSV of every plume attributed to the source | One row per plume, full metadata |
| `/catalog/source/by-plume/{plume_id}` | Single source dict | Resolve plume → source without scanning the GeoJSON |
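The `sources.geojson` caveat (strip the `?plume_gas=...` suffix before keying) is worth isolating in a tiny helper. A sketch — the helper name and the example `source_name` value are illustrative:

```python
# sources.geojson suffixes each source_name with ?plume_gas=...;
# strip that before using the name as a dict key or in
# /catalog/source/{source_name} paths. Helper name is illustrative.


def source_key(source_name: str) -> str:
    return source_name.split("?", 1)[0]


print(source_key("CH4_1B2_500m_-104.17_32.48?plume_gas=CH4"))
# CH4_1B2_500m_-104.17_32.48
```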

Properties on each source feature

| Field | Type | Description |
|---|---|---|
| `source_name` | str | Deterministic key `{gas}_{sector}_{footprint_m}m_{lon}_{lat}` |
| `gas` | str | `CH4` or `CO2` |
| `sector` | str | IPCC sector code (e.g. 1B2, 6A) |
| `plume_count` | int | Plumes in the cluster |
| `plume_ids` | list[str] | All `plume_id`s attributed to this source |
| `observation_scenes_names` | list[str] | Scenes that contributed |
| `persistence` | float | Cluster temporal stability (0–1) |
| `emission_auto` | float | Site-aggregate emission (kg/h) |
| `emission_uncertainty_auto` | float | |
| `published_at_min` / `_max` | datetime | First/last publication of any constituent plume |
| `timestamp_min` / `_max` | datetime | First/last acquisition time |
| `detection_date_count` / `observation_date_count` / `date_count` | int | Distinct-day counts (detection vs. all observations) |

The georeader wrapper CMSource exposes the headline fields as a frozen dataclass; the full properties dict is stashed on CMSource.raw for one-off access.
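For orientation, the shape described above can be sketched as a frozen dataclass. This is an illustrative stand-in (fields abridged), not georeader's actual `CMSource` definition:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the shape CMSource exposes: headline fields
# as frozen attributes, the full properties dict reachable via .raw.
# Not the library's actual definition.


@dataclass(frozen=True)
class SourceSketch:
    source_name: str
    gas: str
    sector: str
    plume_count: int
    emission_auto: float  # kg/h, site aggregate
    raw: dict = field(default_factory=dict, repr=False)


src = SourceSketch("CH4_1B2_500m_-104.17_32.48", "CH4", "1B2", 7, 1250.0,
                   raw={"persistence": 0.8})
print(src.gas, src.raw["persistence"])
try:
    src.gas = "CO2"  # frozen=True -> FrozenInstanceError
except Exception as exc:
    print(type(exc).__name__)
```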

8 · CMPlumeImage — per-plume product bundle

The headline of this PR. One `CMPlumeImage` is the cropped raster suite for one CH4 plume — binary mask, full column-density crop, IME-clipped retrieval, RGB context, plus the canonical outline polygon.

Highlights:

  • Five lazy properties β€” mask, concentrations, ime_concentrations, rgb, outline. Each opens its asset on first access, cached after.
  • Three constructors β€” from_plume_id (one HTTP, handles v3a and v3c), from_cmrawplume (zero HTTP if you have the typed plume), from_stac_item (driving STAC search directly; v3a only).
  • Outline canonical β€” fetches plume-outline.geojson via the derived URL; falls back to band-4 alpha vectorize on fetch failure (with a warning).
  • v3a + v3c handled transparently β€” URL-pattern derivation rewrites the host to the Bearer-aware api gateway, then builds every asset URL from a single seed (plume_tif).
```python
from georeader.readers.carbonmapper import CMPlumeImage

# 1. Build from a plume_id — one HTTP round-trip
img = CMPlumeImage.from_plume_id(PLUME_ID, token=TOKEN)
print(img)
```

```
CMPlumeImage
  plume_id:       tan20251212t185057c20s4001-E
  assets present: ['plume.tif', 'plume-concentrations.tif', 'plume-outline.geojson', 'rgb.tif', 'ime-cmf-concentrations.tif', 'ime-cmf-mask.tif', 'ime-cmf-outline.geojson']
  overview_level: full
```

8.1 Lazy properties

Each property opens its asset on first access. No I/O happens during from_plume_id beyond the catalog metadata fetch.

```python
from georeader.rasterio_reader import RasterioReader


def describe(name, reader):
    if reader is None:
        return f"{name:22s} (absent)"
    return f"{name:22s} {type(reader).__name__}  shape={reader.shape}"


print(describe("mask:",               img.mask))
print(describe("concentrations:",     img.concentrations))
print(describe("ime_concentrations:", img.ime_concentrations))
print(describe("rgb:",                img.rgb))
```

```
mask:                  RasterioReader  shape=(4, 101, 100)
concentrations:        RasterioReader  shape=(1, 85, 56)
ime_concentrations:    RasterioReader  shape=(1, 28, 28)
rgb:                   RasterioReader  shape=(3, 101, 100)
```
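The open-once-then-cache behaviour of these properties can be approximated with `functools.cached_property`. A minimal sketch with illustrative names — this is not georeader's internal implementation:

```python
from functools import cached_property

# Sketch of the lazy-open pattern: the asset is opened on first
# property access and cached on the instance thereafter.
# AssetSketch and _open are illustrative stand-ins.


class AssetSketch:
    def __init__(self) -> None:
        self.opens = 0

    def _open(self, name: str) -> str:
        self.opens += 1  # stands in for an actual rasterio open
        return f"reader<{name}>"

    @cached_property
    def mask(self) -> str:
        return self._open("plume.tif")


a = AssetSketch()
a.mask
a.mask  # second access hits the cache, no second open
print(a.opens)  # 1
```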

8.2 Outline (canonical GeoJSON)

outline returns a shapely geometry in EPSG:4326. The canonical source is plume-outline.geojson (fetched from the v3a STAC asset, or the URL-pattern equivalent for v3c). If that fetch fails, the property falls back to vectorizing the band-4 alpha of mask and logs a warning.

```python
outline = img.outline
print(f"type:   {type(outline).__name__}")
print(f"area:   {outline.area:.6f}  (degrees², EPSG:4326)")
print(f"bounds: {tuple(round(b, 4) for b in outline.bounds)}")
```

```
type:   Polygon
area:   0.000063  (degrees², EPSG:4326)
bounds: (-104.177, 32.4778, -104.1687, 32.4927)
```
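The band-4 fallback can be approximated by unioning one unit square per nonzero alpha pixel. This is a dependency-light sketch in pixel coordinates only; the real fallback presumably runs a proper raster-to-polygon pass and applies the geotransform:

```python
import numpy as np
from shapely.geometry import box
from shapely.ops import unary_union

# Naive stand-in for vectorizing a band-4 alpha mask: one unit
# square per nonzero pixel, dissolved into a single geometry.
# Works in pixel coordinates; no geotransform applied.


def vectorize_alpha(alpha: np.ndarray):
    rows, cols = np.nonzero(alpha)
    return unary_union([box(c, r, c + 1, r + 1) for r, c in zip(rows, cols)])


alpha = np.zeros((4, 4), dtype=np.uint8)
alpha[1:3, 1:3] = 255  # a 2x2 plume blob
geom = vectorize_alpha(alpha)
print(geom.geom_type, geom.area)  # Polygon 4.0
```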

See also