Carbon Mapper¶
The Carbon Mapper reader is a typed client for Carbon Mapper's two API surfaces (REST catalog + STAC) that hides three real-world inconveniences:

- Two protocols, two bbox conventions: the REST catalog wants repeated keys (`?bbox=W&bbox=S&...`), STAC wants `?bbox=W,S,E,N`. Mix them up and the server 422s with no useful error.
- Three resource types with hand-rolled joins: a plume is a single detection, a tile (a.k.a. scene) is the L2B raster it was detected in, and a source is the DBSCAN cluster of plumes at one physical site. The API exposes them via different endpoints with no FK-style links; this layer does the joins.
- Inconsistent error shapes: 404s look different per resource. We translate them to a small typed exception hierarchy so callers can `except CMPlumeNotFound` rather than string-match `requests.HTTPError`.
The catalog ships methane and CO₂ retrievals from Tanager-1, EMIT, AVIRIS-3, AVIRIS-NG, and GAO. Plume detection is operational on all platforms; published L2B scenes lag plume publication by weeks-to-months for Tanager (see Publication lag).
> CH4 only in this notebook. The reader is gas-agnostic
> internally, but query helpers are typed Literal["CH4"] for now;
> CO2 lands in a follow-up.
This notebook walks the layers bottom-up:
| Layer | Module | Use when |
|---|---|---|
| Raw HTTP | `download.py` | You need a field this layer doesn't expose, or you're prototyping a new endpoint wrapper. |
| Typed query | `api_queries.py` | Default. Returns `CMRawPlume` / `CMTileItem` / `CMSource`, never raw dicts. |
| Cross-resolution | `api_queries.get_*_for_*` / `get_plume_context` | One call → (plume, tile, source), the typical ingestion shape. |
| Per-plume image | `image.py` / `CMPlumeImage` | Per-plume product bundle (mask, concentrations, IME, RGB, outline). Handles v3a (STAC) + v3c (CDN-only) via URL-pattern derivation. |
| L2B scene raster | `rasters.py` / `CMImageRaster` | Scene-level CMF retrieval and RGB sibling. |
Companion: products_explore.ipynb covers the raster wrappers in depth.
Install¶
The Carbon Mapper reader is gated behind the [carbonmapper] extra
to keep georeader-spaceml's base install minimal. Install with:
pip install 'georeader-spaceml[carbonmapper]'
This pulls in pydantic (for CMRawPlume) and requests (for the
HTTP client). No Azure or other cloud-vendor SDKs are required.
Authentication¶
Every cell below hits the live API and needs a Bearer token.
CarbonMapperConfig.load() resolves credentials in this priority
order:
1. `CARBONMAPPER_TOKEN` environment variable: one-shot, no refresh.
2. `CARBONMAPPER_EMAIL` + `CARBONMAPPER_PASSWORD` environment variables: refreshable via `obtain_token`.
3. Config file at the canonical location (matches sibling readers like `emit.py` / `S2_SAFE_reader.py`): `~/.georeader/auth_carbonmapper.json`.

Legacy fallbacks (still honoured if present):

- `./config/carbonmapper_token.json`
- `~/.config/carbonmapper/config.json`
- `~/.carbonmapper.json`
- `./.carbonmapper.json`
On first run, if no config file is found and no env vars are
set, a stub ~/.georeader/auth_carbonmapper.json is created
with placeholder values for you to edit.
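The precedence can be sketched as a pure function. `resolve_credential_source` is a hypothetical helper, not part of georeader; only the ordering (token env var, then email+password env vars, then config file) comes from the list above.

```python
def resolve_credential_source(env: dict) -> str:
    """Mirror the documented credential-resolution order (sketch only)."""
    if env.get("CARBONMAPPER_TOKEN"):
        return "token-env"        # one-shot token, no refresh
    if env.get("CARBONMAPPER_EMAIL") and env.get("CARBONMAPPER_PASSWORD"):
        return "password-env"     # refreshable via obtain_token
    return "config-file"          # ~/.georeader/auth_carbonmapper.json etc.
```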
Sign up for a developer account at api.carbonmapper.org; the free tier covers all the calls in this notebook.
Publication lag¶
Carbon Mapper's plume catalog and STAC catalog publish on different cadences. As of late 2025:
| Asset | Latency from acquisition |
|---|---|
| Plume (L4A, in `/catalog/plumes/...`) | hours to days |
| Tile / scene (L2B, in `/stac/collections/l2b-ch4-mfa-v3a/...`) | weeks to months (Tanager) |
Practical consequences:
- `api_queries.list_plumes(...)` returns plumes whose parent L2B scene is not yet in STAC. Don't expect every plume to round-trip through `get_tile_for_plume`: it returns `None` when the parent is unpublished, and `get_tile()` raises `CMSceneNotPublished`.
- For ingestion pipelines, treat `CMSceneNotPublished` as defer-and-retry, not an error.
- The plume's `geometry`, `emission_auto`, and wind are authoritative without the L2B raster; you only need the L2B for visualisation / re-quantification / model retraining.
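The defer-and-retry pattern can be sketched as follows. The `CMSceneNotPublished` class here is a local stand-in so the snippet is self-contained; in real code you would import it from `georeader.readers.carbonmapper` and call `api_queries.get_tile` inside `fetch()`.

```python
import time

class CMSceneNotPublished(Exception):
    """Local stand-in for the library's exception (sketch only)."""

def fetch_with_deferral(fetch, max_attempts: int = 3, wait_s: float = 0.0):
    """Treat CMSceneNotPublished as retryable; None means 'try next run'."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except CMSceneNotPublished:
            if attempt == max_attempts - 1:
                return None  # still unpublished: defer to the next pipeline run
            time.sleep(wait_s)
```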
Setup¶
from datetime import datetime, timezone
from georeader.readers.carbonmapper import (
CMAPIError,
CMPlumeNotFound,
CMSceneNotPublished,
CMSource,
CMSourceNotFound,
CMTileItem,
CarbonMapperConfig,
api_queries,
download,
)
# --- 429-resilient HTTP -----------------------------------------------
# CarbonMapper rate-limits per account. §§ 5-6 below fire dozens of
# catalog probes back-to-back and can trip the per-minute cap. Mount
# a retry-aware adapter on a shared Session and re-bind the `requests`
# module shortcuts to route through it, so every HTTP call in this
# kernel (including the bare `requests.get(...)` calls in §§ 5-6 and
# the ones inside georeader's `.download` helpers) gets automatic 429
# backoff honouring `Retry-After`, plus exponential backoff for 5xx.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
_cm_session = requests.Session()
_cm_session.mount("https://", HTTPAdapter(max_retries=Retry(
total=8,
backoff_factor=2.0, # 2, 4, 8, 16, 32, 64 s
status_forcelist=(429, 500, 502, 503, 504),
respect_retry_after_header=True,
allowed_methods=frozenset(["GET", "POST"]),
)))
requests.get = _cm_session.get
requests.post = _cm_session.post
requests.request = _cm_session.request
# Resolve a Bearer token from env / config file (see "Authentication").
TOKEN = CarbonMapperConfig.load().refresh_access_token()
# Protagonist plume: Tanager-1 over the Permian basin, 2025-12-12.
PLUME_ID = "tan20251212t185057c20s4001-E"
SCENE_ID = PLUME_ID.rsplit("-", 1)[0]
PERMIAN_BBOX = (-104.5, 32.0, -103.5, 32.8) # (W, S, E, N)
print(f"plume = {PLUME_ID}")
print(f"scene = {SCENE_ID}")
Domain model¶
Three resource types, with these relationships:
| Parent | Cardinality | Child | Meaning |
|---|---|---|---|
| SOURCE | 1 → N | PLUME | DBSCAN clusters detections at one physical site |
| TILE | 1 → N | PLUME | L2B scene contains the detected plumes |

| Entity | Field | Type | Notes |
|---|---|---|---|
| PLUME | `plume_id` | string | `tan20251212t185057...-E` |
| | `emission_auto` | float | kg/h |
| | `geometry` | polygon | |
| | `wind_u_v` | float | from CM forecast |
| SOURCE | `source_name` | string | `{gas}_{sector}_{m}m_{lon}_{lat}` |
| | `plume_count` | int | across all scenes |
| | `emission_auto` | float | site-aggregate (kg/h) |
| TILE | `scene_id` | string | `plume_id.rsplit('-',1)[0]` |
| | `platform` | string | `tan` / `emi` / `ang` / `av3` / `gao` |
| | `acquired` | datetime | L2B GeoTIFF, may lag publication |

- Plume: one detection. Carries `emission_auto` (kg/h), wind, geometry, and a `plume_id` that encodes the source-instrument prefix and acquisition timestamp (e.g. `tan20251212t185057c20s4001-E`).
- Tile (or scene): the L2B GeoTIFF the plume was extracted from. One tile contains 0..N plumes. `scene_id` is `plume_id.rsplit('-', 1)[0]`.
- Source: DBSCAN cluster of plumes at the same physical site. One source contains 1..N plumes across many scenes / dates. Identified by the deterministic key `{gas}_{sector}_{footprint_m}m_{lon}_{lat}`.
A plume always has a parent scene (encoded in the id), but the parent L2B item may not be published in STAC yet (see Publication lag). A plume may not yet be clustered into a source if it's the first detection at that site.
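Because the source key is deterministic, it can be unpacked without an API call. The parser below is a hypothetical sketch for illustration, not library code (the reader ships `CMSource` instead):

```python
def parse_source_name(name: str) -> dict:
    """Split the deterministic key {gas}_{sector}_{footprint_m}m_{lon}_{lat}."""
    gas, sector, footprint, lon, lat = name.split("_")
    return {
        "gas": gas,
        "sector": sector,
        "footprint_m": int(footprint.rstrip("m")),
        "lon": float(lon),
        "lat": float(lat),
    }

parse_source_name("CH4_1B2_100m_-104.17525_32.49125")
```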
1 · Typed models¶
Two frozen dataclasses you'll see flowing through every other call. Worth a minute up front so you know what you're getting back.
1.1 CMSource: DBSCAN-clustered point source¶
Carbon Mapper aggregates plumes detected at the same physical
location into a source, a deterministic point-source record
addressed by {gas}_{sector}_{footprint_m}m_{lon}_{lat}. The
/catalog/sources.geojson endpoint returns features whose
source_name carries a stray ?plume_gas=... query suffix that
must be stripped before using the value as a key into other
endpoints (the suffix is an accidental bleed from the geojson
endpoint's filtering query string; Carbon Mapper plans to fix
it upstream, but the strip is defensive in the meantime).
CMSource.from_geojson_feature does the strip unconditionally so
downstream code can treat source_name as canonical.
feature = {
"properties": {
"source_name": "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4&bbox=...",
"sector": "1B2",
"gas": "CH4",
"plume_count": 12,
"persistence": 0.42,
"emission_auto": 250.0,
"emission_uncertainty_auto": 35.0,
},
"geometry": {"type": "Point", "coordinates": [-104.17525, 32.49125]},
}
src = CMSource.from_geojson_feature(feature)
print(src.source_name) # suffix stripped
print((src.point.x, src.point.y)) # (-104.17525, 32.49125)
print(f"{src.plume_count} plumes · sector {src.sector} · gas {src.gas}")
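For readers rolling their own client rather than using `CMSource`, the strip itself is one line (reusing the example string above):

```python
# Defensive strip of the accidental query-string suffix on source_name.
raw = "CH4_1B2_100m_-104.17525_32.49125?plume_gas=CH4&bbox=..."
canonical = raw.split("?", 1)[0]
print(canonical)  # CH4_1B2_100m_-104.17525_32.49125
```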
1.2 CMTileItem: typed wrapper over a STAC item¶
Frozen dataclass exposing the fields we use in practice
(scene_id, collection, datetime, platform, bbox,
geometry, asset_urls). The full properties dict and the
raw STAC item stay attached for one-off field access.
stac_item = download.stac_get_item("l2b-ch4-mfa-v3a", SCENE_ID, token=TOKEN)
tile = CMTileItem.from_stac_item(stac_item)
print(f"{tile.scene_id} · {tile.platform} · {tile.datetime}")
print(f"bbox : {tile.bbox}")
print(f"assets: {sorted(tile.asset_urls)[:5]}")
2 · download.py: raw HTTP wrappers¶
You usually shouldn't reach here β api_queries.py is the
supported surface. We expose it because (a) the bbox encoding is
non-obvious and worth understanding, and (b) the REST endpoints
carry fields the typed layer doesn't yet model (e.g. raw CSV
exports). Thin endpoint wrappers: same return shape as the
upstream JSON, but with bbox-encoding, retries, and Bearer auth
handled.
2.1 bbox encoding: REST vs STAC¶
Carbon Mapper's two API surfaces disagree on bbox shape:
- REST Catalog (`/catalog/...`) wants repeated keys: `?bbox=W&bbox=S&bbox=E&bbox=N`. Comma-joined returns 422.
- STAC (`/stac/...`) wants the comma-joined form: `?bbox=W,S,E,N`.
_rest_bbox_params returns a list-valued dict (requests
serialises lists as repeated keys); _stac_bbox_param returns the
comma-joined string.
from georeader.readers.carbonmapper.download import (
_rest_bbox_params, _stac_bbox_param,
)
print(_rest_bbox_params(PERMIAN_BBOX))
# {'bbox': ['-104.5', '32.0', '-103.5', '32.8']}
print(_stac_bbox_param(PERMIAN_BBOX))
# {'bbox': '-104.5,32.0,-103.5,32.8'}
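Outside the library, the two encoders reduce to a few lines. This sketch mirrors the behaviour printed above (the real helpers live in `georeader.readers.carbonmapper.download`):

```python
def rest_bbox_params(bbox) -> dict:
    # requests serialises a list value as repeated keys: ?bbox=W&bbox=S&...
    return {"bbox": [str(v) for v in bbox]}

def stac_bbox_param(bbox) -> dict:
    # STAC wants a single comma-joined value: ?bbox=W,S,E,N
    return {"bbox": ",".join(str(v) for v in bbox)}

bbox = (-104.5, 32.0, -103.5, 32.8)
print(rest_bbox_params(bbox))  # {'bbox': ['-104.5', '32.0', '-103.5', '32.8']}
print(stac_bbox_param(bbox))   # {'bbox': '-104.5,32.0,-103.5,32.8'}
```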
2.2 Endpoint wrappers¶
# stac_get_item: one STAC item by collection + scene_id
item = download.stac_get_item("l2b-ch4-mfa-v3a", SCENE_ID, token=TOKEN)
print(f"{item['id']} {item['properties']['datetime']}")
# get_source_for_plume_name: find the source for our protagonist plume
src_dict = download.get_source_for_plume_name(PLUME_ID, token=TOKEN)
SOURCE_NAME = src_dict["source_name"]
print(SOURCE_NAME)
# get_source_by_name: REST source record (flat dict with properties)
src_dict = download.get_source_by_name(SOURCE_NAME, token=TOKEN)
print(f"plumes: {src_dict.get('plume_count')} emission: {src_dict.get('emission_auto')} kg/h")
# get_source_plumes_csv: every plume attributed to one source as CSV text
csv_text = download.get_source_plumes_csv(SOURCE_NAME, token=TOKEN)
print(csv_text[:200])
# stac_search: accepts ids= for direct STAC item lookup
fc = download.stac_search(
collections=["l2b-ch4-mfa-v3a"],
ids=[SCENE_ID],
limit=5,
token=TOKEN,
)
print(f"{len(fc['features'])} feature(s) returned")
3 · api_queries.py: typed query layer¶
The default surface for downstream code. Three families:
- Single-resource fetchers: `get_plume`, `get_tile`, `get_source`. Translate 404s to typed exceptions.
- List helpers: `list_plumes`, `list_tiles`, `list_sources`. Take a bbox + datetime range + filters.
- Cross-resolution: given a plume, get its tile / source / full context. Given a source, get all its plumes / tiles. The join logic is hidden so callers don't reinvent `scene_id` derivation, dedup, etc.
3.1 Single-resource fetchers¶
plume = api_queries.get_plume(TOKEN, PLUME_ID)
print(f"{plume.plume_id} gas={plume.gas} emission_auto={plume.emission_auto} kg/h")
print(f"scene_id={plume.scene_id}")
tile = api_queries.get_tile(TOKEN, SCENE_ID) # default collection l2b-ch4-mfa-v3a
print(f"{tile.scene_id} · {tile.platform}")
print(f"bbox: {tile.bbox}")
source = api_queries.get_source(TOKEN, SOURCE_NAME)
print(f"{source.source_name}")
print(f"plumes: {source.plume_count} sector: {source.sector} emission: {source.emission_auto} kg/h")
3.2 Typed exceptions¶
404s are translated to typed CMAPIError subclasses, making it easy to
catch one resource type without swallowing real failures. All three
inherit from CMAPIError so a single except CMAPIError catches
every documented failure mode; anything else is an HTTP / network
error and should be allowed to propagate.
try:
# Well-formed but non-existent plume id (the API 422s on malformed input).
api_queries.get_plume(TOKEN, "tan29991231t000000c00s4001-Z")
except CMPlumeNotFound as exc:
print(f"caught CMPlumeNotFound: {exc}")
try:
api_queries.get_source(TOKEN, "CH4_1B2_100m_0_0")
except CMSourceNotFound as exc:
print(f"caught CMSourceNotFound: {exc}")
try:
# A scene whose L2B item is not yet published in STAC.
api_queries.get_tile(TOKEN, "tan29991231t000000c00s4001")
except CMSceneNotPublished as exc:
print(f"caught CMSceneNotPublished: {exc}")
# All three inherit from CMAPIError if you want a single catch.
print(f"isinstance(CMPlumeNotFound(...), CMAPIError) -> "
f"{isinstance(CMPlumeNotFound('x'), CMAPIError)}")
3.3 List helpers¶
dt_min = datetime(2025, 12, 1, tzinfo=timezone.utc)
dt_max = datetime(2025, 12, 31, tzinfo=timezone.utc)
plumes = api_queries.list_plumes(
TOKEN,
bbox=PERMIAN_BBOX,
datetime_min=dt_min,
datetime_max=dt_max,
gas="CH4",
)
print(f"{len(plumes)} plumes")
for p in plumes[:3]:
print(f" {p.plume_id} emission_auto={p.emission_auto}")
# NB: STAC search caps page size at 100; pass an explicit limit until
# pagination lands.
tiles = api_queries.list_tiles(
TOKEN,
bbox=PERMIAN_BBOX,
datetime_min=dt_min,
datetime_max=dt_max,
limit=50,
)
print(f"{len(tiles)} tiles")
for t in tiles[:3]:
print(f" {t.scene_id} {t.platform} {t.datetime}")
sources = api_queries.list_sources(TOKEN, bbox=PERMIAN_BBOX, gas="CH4")
print(f"{len(sources)} sources")
for s in sources[:3]:
print(f" {s.source_name} plumes={s.plume_count} emission={s.emission_auto}")
3.4 Cross-resolution helpers¶
The headline value-add: these do the join work the upstream API
leaves to the caller (scene-id derivation, source lookup-by-plume,
dedup of scene_ids per source, etc.).
# plume -> tile (returns None when the L2B scene isn't published yet)
tile = api_queries.get_tile_for_plume(TOKEN, PLUME_ID)
print(f"tile : {tile and tile.scene_id}")
# plume -> source (returns None when CM hasn't clustered this plume yet)
source = api_queries.get_source_for_plume(TOKEN, PLUME_ID)
print(f"source: {source and source.source_name}")
# One call -> (plume, tile|None, source|None): the typical ingestion shape.
plume, tile, source = api_queries.get_plume_context(TOKEN, PLUME_ID)
print(f"plume : {plume.plume_id}")
print(f" tile : {tile and tile.scene_id}")
print(f" source: {source and source.source_name}")
# tile -> all plumes in that scene
plumes = api_queries.list_plumes_for_tile(TOKEN, SCENE_ID)
print(f"{len(plumes)} plumes in scene {SCENE_ID}")
# source -> every plume attributed to it (parsed from CSV)
plumes = api_queries.list_plumes_for_source(TOKEN, SOURCE_NAME)
print(f"{len(plumes)} plumes for source {SOURCE_NAME}")
# source -> distinct parent tiles (dedups scene_ids before STAC ids= search)
tiles = api_queries.list_tiles_for_source(TOKEN, SOURCE_NAME)
print(f"{len(tiles)} unique parent tiles")
4 · End-to-end mini-workflow¶
Tie it together: take a bbox + date range, list plumes, expand each into full context, count how many have a published L2B parent and how many are clustered into a source.
plumes = api_queries.list_plumes(
TOKEN, bbox=PERMIAN_BBOX, datetime_min=dt_min, datetime_max=dt_max, gas="CH4",
)
n_with_tile = n_with_source = 0
for p in plumes[:25]: # cap to keep the request count modest
_, tile, source = api_queries.get_plume_context(TOKEN, p.plume_id)
n_with_tile += tile is not None
n_with_source += source is not None
print(f"checked : {min(len(plumes), 25)}")
print(f"L2B published : {n_with_tile}")
print(f"in a source : {n_with_source}")
The 22 / 22 / 22 saturation reads as: in this Permian month,
every plume detection has both (a) a parent L2B published in STAC
and (b) a clustered source. That's typical for archives older than
~3 months, where the publication lag has caught up. Run the same
query over the most recent 30 days and you'll see N / 0 / 0
(plumes exist, scenes pending), which is what your ingestion
pipeline needs to handle gracefully via CMSceneNotPublished.
5 · Plume catalog stats: what's in the live catalog right now¶
Cells below hit the live API at notebook-execution time, so numbers
will drift between runs. Each cell calls one or more
/catalog/plumes/annotated?limit=1 requests and reads the
total_count field; no large data transfer. Total runtime ~25 s.
If the API is rate-limiting at the moment you re-run, drop the section: the prose narrative stands on its own.
> CH4-only: plume_gas="CH4" is implicit on every call below.
5.1 Headline counts¶
Total CH4 plumes in the catalog, split by instrument. The
instrument filter is case-sensitive upstream: `tan` / `emi` /
`ang` / `av3` are lowercase, `GAO` is uppercase. Unrecognised
filter names are silently ignored: use `plume_gas` not `gas`, and
`instrument` not `platform`.
import pandas as pd
import requests
BASE = "https://api.carbonmapper.org/api/v1"
H = {"Authorization": f"Bearer {TOKEN}"}
def plume_count(**filters) -> int | None:
"""`total_count` for the plume catalog under the given filters."""
r = requests.get(
f"{BASE}/catalog/plumes/annotated",
headers=H, params={"limit": 1, "plume_gas": "CH4", **filters},
timeout=30,
)
return r.json().get("total_count")
total = plume_count()
# Instrument codes are case-sensitive upstream: `gao` returns None,
# `GAO` works. Other codes are lowercase. Worth a comment because it
# bites everyone once.
by_inst = {code: plume_count(instrument=code) for code in
("tan", "emi", "ang", "av3", "GAO")}
print(f"TOTAL CH4: {total:,} plumes\n")
print("By instrument")
for k, v in by_inst.items():
print(f" {k:5s} {v:>8,}")
5.2 IPCC sector distribution¶
Carbon Mapper attributes most plumes to an IPCC sector code.
Every other sector is dwarfed by 1B2 (oil & gas): Tanager's
operational targeting bias toward upstream O&G shows up clearly.
sector_codes = ["1A1", "1B1a", "1B2", "3A", "4B", "6A", "6B"]
by_sector = {s: plume_count(sectors=s) for s in sector_codes}
df_sector = pd.DataFrame({
"sector": list(by_sector),
"name": ["Energy generation", "Coal mining", "Oil & gas",
"Enteric fermentation", "Livestock", "Solid waste",
"Waste water"],
"plumes": list(by_sector.values()),
}).sort_values("plumes", ascending=False, na_position="last")
df_sector["share"] = df_sector["plumes"] / df_sector["plumes"].sum()
df_sector
5.3 Monthly activity: last 12 months¶
Plumes-by-month using ISO datetime intervals. Tanager-1 went operational mid-2024, so the early months are sparse; recent months reflect both the fleet ramp and the publication lag (newer detections are still flowing into the catalog).
from datetime import datetime, timedelta, timezone
def month_count(year: int, month: int) -> int | None:
start = datetime(year, month, 1, tzinfo=timezone.utc)
end = (datetime(year + (month == 12), (month % 12) + 1, 1,
tzinfo=timezone.utc)
- timedelta(seconds=1))
return plume_count(datetime=f"{start.isoformat()}/{end.isoformat()}")
now = datetime.now(timezone.utc)
months = []
y, m = now.year, now.month
for _ in range(12):
months.append((y, m, month_count(y, m)))
m -= 1
if m == 0:
m, y = 12, y - 1
df_months = pd.DataFrame(months, columns=["year", "month", "plumes"])
df_months["label"] = df_months.apply(
lambda r: f"{r.year}-{r.month:02d}", axis=1,
)
df_months[["label", "plumes"]].iloc[::-1].reset_index(drop=True)
5.4 Emission rate distribution¶
The headline metric per plume is emission_auto in kg/h. Pull
one page (1,000 plumes) and summarise. The long tail is dramatic:
the median CH4 plume is sub-kt/yr, while the p99 is in the
multi-kt/yr super-emitter regime.
r = requests.get(
f"{BASE}/catalog/plumes/annotated",
headers=H, params={"limit": 1000, "plume_gas": "CH4"}, timeout=60,
)
emissions_kgh = pd.Series(
[item.get("emission_auto") for item in r.json().get("items", [])],
name="emission_kg_per_h",
).dropna()
# Convert to kt/yr (× 24 h × 365.25 d ÷ 1e6 kg/kt) for context.
emissions_kt_yr = emissions_kgh * 24 * 365.25 / 1_000_000
print(f"Sample size: {len(emissions_kgh):,} CH4 plumes\n")
print("kg/h:")
print(emissions_kgh.describe(percentiles=[0.5, 0.75, 0.9, 0.99]).round(1))
print("\nkt/yr equivalent:")
print(emissions_kt_yr.describe(percentiles=[0.5, 0.75, 0.9, 0.99]).round(2))
6 · STAC inventory: what's downloadable¶
The plume catalog (§ 5) is the detection index: what was spotted, where, when, how strong. The STAC catalogue is the download index: the actual GeoTIFFs and per-plume products you can pull bytes for. There are 86 STAC collections total, but most are superseded versions; only ~9 are actively published as of late 2025.
6.1 Collection counts by level¶
The l<n> prefix tells you what kind of product:

- L2B: orthorectified scene-level retrievals (cmf, RGB, uncertainty, artifact-mask).
- L2C: per-scene CH4/CO2 composites (less common downstream).
- L3A: per-plume products (the small `plume_tif` clip + `ime` retrieval crop).
- L4A: retrieval cubes; flat per-platform listings of L4 outputs.

v3a is the current canonical version family. Older versions
(v1, v3, j001, jpl legacy) still exist for archival reads.
import re
from collections import defaultdict
r = requests.get(f"{BASE}/stac/collections", headers=H, timeout=30)
all_collections = r.json()["collections"]
groups: dict[str, list[str]] = defaultdict(list)
for c in all_collections:
m = re.match(r"(l\d[a-z]?)-", c["id"])
if m:
groups[m.group(1)].append(c["id"])
df_levels = pd.DataFrame({
"level": sorted(groups),
"collections": [len(groups[k]) for k in sorted(groups)],
})
df_levels.loc[len(df_levels)] = ["TOTAL", len(all_collections)]
df_levels
6.2 Active v3a collections: item counts¶
For each *-v3a collection, fetch numberMatched via a
1-result STAC search. Empty placeholder collections (e.g.
l2b-ch4-mfma-v3a) show 0: the algorithm variant isn't
currently published. The pairs l2b-ch4-mfa-v3a /
l2b-co2-mfa-v3a have identical item counts because they're the
same Tanager scenes processed twice for different gases.
v3a = [c["id"] for c in all_collections
if (c["id"].endswith("-v3a") or c["id"].endswith("-quick-v3a"))
and ("ch4" in c["id"] or "rgb" in c["id"])]
rows = []
for cid in v3a:
info = next(c for c in all_collections if c["id"] == cid)
extent = info.get("extent", {}).get("temporal", {}) \
.get("interval", [[None, None]])[0]
r = requests.get(
f"{BASE}/stac/search",
headers=H, params={"collections": cid, "limit": 1}, timeout=30,
)
matched = r.json().get("numberMatched")
rows.append({
"collection": cid,
"items": matched,
"start": (extent[0] or "")[:10],
"end": (extent[1] or "")[:10],
})
df_v3a = pd.DataFrame(rows).sort_values(
by=["items", "collection"], ascending=[False, True],
).reset_index(drop=True)
df_v3a
> v3c is the live processing version, but isn't in STAC. The
> latest Tanager CH4 plumes (post-2025-12-16) live in
> l3a-vis-ch4-mfa-v3c / l3a-ime-ch4-mfa-v3c (and -v3d for the
> very newest), but those collections are not exposed via
> /stac/collections or any item lookup. They're reachable only
> via direct asset URLs derived from /catalog/plume/{id}. Both
> wrappers handle this transparently via URL-pattern derivation:
> CMPlumeImage for the per-plume L3A bundle (see § 8 below), and
> CMImageRaster (via api_queries.get_image_raster_for_plume)
> for the parent L2B scene, which tries STAC first and falls back
> to the same URL-pattern trick when the scene isn't registered.
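The derivation trick can be illustrated generically: given one known asset URL, siblings in the same bundle share its directory and differ only in filename. The URL and filenames below are made up for illustration; the real patterns live inside CMPlumeImage / CMImageRaster.

```python
from urllib.parse import urlsplit, urlunsplit

def sibling_asset_url(seed_url: str, filename: str) -> str:
    """Swap the trailing filename of one known asset URL to reach a sibling."""
    parts = urlsplit(seed_url)
    directory = parts.path.rsplit("/", 1)[0]
    return urlunsplit(parts._replace(path=f"{directory}/{filename}"))

seed = "https://example.org/l3a-vis/bundle123/plume.tif"  # hypothetical seed
print(sibling_asset_url(seed, "plume-outline.geojson"))
# https://example.org/l3a-vis/bundle123/plume-outline.geojson
```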
6.3 Asset shapes: what's actually inside an item¶
Sample one item per active CH4 collection and list the asset keys.
This is the canonical map between collection (what you search
for) and asset key (what RasterioReader actually opens).
| Collection | Headline asset | Use it for |
|---|---|---|
| `l2b-ch4-mfa-v3a` | `cmf.tif` | CH4 column-density retrieval |
| `l2b-rgb-v3a` | `rgb.tif` | True-colour overlay |
| `l3a-ime-ch4-mfa-v3a` | `ime-cmf-concentrations.tif` | Per-plume IME retrieval crop |
| `l3a-vis-ch4-mfa-v3a` | `plume.tif` (band-4 alpha) + `plume-outline.geojson` | Per-plume mask / polygon |
sample_collections = [
"l2b-ch4-mfa-v3a", "l2b-rgb-v3a",
"l3a-ime-ch4-mfa-v3a", "l3a-vis-ch4-mfa-v3a",
]
asset_rows = []
for cid in sample_collections:
r = requests.get(
f"{BASE}/stac/search",
headers=H, params={"collections": cid, "limit": 1}, timeout=30,
)
feats = r.json().get("features", [])
if not feats:
asset_rows.append({"collection": cid, "assets": "(empty)"})
continue
keys = sorted((feats[0].get("assets") or {}).keys())
asset_rows.append({"collection": cid, "assets": ", ".join(keys)})
pd.DataFrame(asset_rows)
7 · Reachable products reference¶
Static reference tables: the canonical map of what's reachable from the API, by resource type. Numbers in §§ 5-6 are live; the tables here are documentation and don't drift with each notebook run.
7.1 Plume-level products¶
Every detection ships with a small bundle of per-plume products
keyed off plume_id. Source paths: most assets live on the
/catalog/plume/{id} REST response (URLs); a handful additionally
appear as STAC item assets under l3a-*-ch4-mfa-v3a collections.
| Asset key | Format | What it is | Where to find it |
|---|---|---|---|
| `plume_tif` | RGBA GeoTIFF | Per-plume binary mask; band 4 is the alpha channel | `/catalog/plume/{id}.plume_tif` and `l3a-vis-ch4-mfa-v3a` STAC item assets |
| `plume_png` | PNG | Plume mask viz | `/catalog/plume/{id}.plume_png` |
| `plume_rgb_png` | PNG | Plume mask overlaid on RGB | `/catalog/plume/{id}.plume_rgb_png` |
| `con_tif` | GeoTIFF | Per-plume CH4 retrieval crop (column density) | `l3a-ime-ch4-mfa-v3a` STAC item assets, asset key `ime-cmf-concentrations.tif` |
| `rgb_png` / `rgb.png` | PNG | Per-plume RGB context tile | `/catalog/plume/{id}.rgb_png` and `l3a-vis-ch4-mfa-v3a` |
| `ime_outline_geojson` / `plume-outline.geojson` | GeoJSON | Plume polygon; preferred over band-4 mask extraction | `l3a-vis-ch4-mfa-v3a` STAC item assets |
| `plumes.csv` | CSV | All plumes attributed to one source | `/catalog/source/{source_name}/plumes.csv` |
The georeader wrapper CMPlumeImage
exposes the GeoTIFFs (plume_tif, plume-concentrations.tif,
ime-cmf-concentrations.tif, rgb.tif) and the canonical outline
GeoJSON. PNG-only assets aren't wrapped (no native georeferencing).
7.2 STAC collections: current CH4 (v3a)¶
Carbon Mapper's "active" Tanager-1 CH4 product family. Everything
older / superseded still resolves under /stac/collections
(86 total), but new ingestion should target the v3a family below.
| Collection | Level | Items | Temporal | Description |
|---|---|---|---|---|
| `l2b-ch4-mfa-v3a` | L2B | 1,675 | 2025-07-11 to 2025-12-16 | CH4 retrieval scene; assets: `cmf.tif`, `cmf-unortho.tif`, `uncertainty.tif`, `uncertainty-unortho.tif`, `artifact-mask.tif`, `uas.txt` |
| `l2b-rgb-v3a` | L2B | 1,672 | same | True-colour sibling; `rgb.tif`. 3 short of the cmf collection (still being published) |
| `l3a-ime-ch4-mfa-v3a` | L3A | 1,450 | 2023-10-25 to 2025-12-16 | Per-plume CH4 IME retrieval crop; `ime-cmf-concentrations.tif`, `ime-cmf-mask.tif`, `ime-cmf-outline.geojson` |
| `l3a-vis-ch4-mfa-v3a` | L3A | 1,451 | same | Per-plume CH4 visualisation; `plume.tif` (band-4 alpha mask), `plume-outline.geojson`, `plume-rgb.png`, `plume-concentrations.tif` |
| `l4a-ch4-mfa-v3a` | L4A | 1,450 | same | CH4 retrieval cube; collection-level metadata, no per-item assets |

Empty *-v3a placeholders (`l2b-ch4-mfma-v3a`,
`l4a-combined-ch4-{quick-,}v3a`) exist in the catalog but are not
currently published. CO2 collections are trimmed from this table:
this PR is CH4-only.
7.3 Source-level products¶
Sources are DBSCAN clusters of plumes at the same physical site. The full source list is small enough (~12 K rows) to fetch in one shot: the endpoint returns the entire FeatureCollection in ~1.5 s. CH4 sources: 10,569; CO2 sources: 2,140; total: 12,709 (as of probe time).
Endpoints¶
| Endpoint | Returns | Notes |
|---|---|---|
| `/catalog/sources.geojson` | FeatureCollection of CMSource | Strip the `?plume_gas=...` suffix from `source_name` before keying; see § 1.1 |
| `/catalog/source/{source_name}` | Single source dict (flat REST) | |
| `/catalog/source/{source_name}/plumes.csv` | CSV of every plume attributed to the source | One row per plume, full metadata |
| `/catalog/source/by-plume/{plume_id}` | Single source dict | Resolve plume → source without scanning the GeoJSON |
Properties on each source feature¶
| Field | Type | Description |
|---|---|---|
| `source_name` | str | Deterministic key `{gas}_{sector}_{footprint_m}m_{lon}_{lat}` |
| `gas` | str | CH4 or CO2 |
| `sector` | str | IPCC sector code (e.g. 1B2, 6A) |
| `plume_count` | int | Plumes in the cluster |
| `plume_ids` | list[str] | All plume_ids attributed to this source |
| `observation_scenes_names` | list[str] | Scenes that contributed |
| `persistence` | float | Cluster temporal stability (0-1) |
| `emission_auto` | float | Site-aggregate emission (kg/h) |
| `emission_uncertainty_auto` | float | |
| `published_at_min` / `_max` | datetime | First/last publication of any constituent plume |
| `timestamp_min` / `_max` | datetime | First/last acquisition time |
| `detection_date_count` / `observation_date_count` / `date_count` | int | Distinct-day counts (detection vs. all observations) |
The georeader wrapper CMSource
exposes the headline fields as a frozen dataclass; the full
properties dict is stashed on CMSource.raw for one-off access.
8 · CMPlumeImage: per-plume product bundle¶
The headline of this PR. One CMPlumeImage is the cropped
raster suite for one CH4 plume: binary mask, full column-density
crop, IME-clipped retrieval, RGB context, plus the canonical
outline polygon.
Highlights:
- Five lazy properties: `mask`, `concentrations`, `ime_concentrations`, `rgb`, `outline`. Each opens its asset on first access, cached after.
- Three constructors: `from_plume_id` (one HTTP round-trip, handles v3a and v3c), `from_cmrawplume` (zero HTTP if you have the typed plume), `from_stac_item` (driving STAC search directly; v3a only).
- Outline canonical: fetches `plume-outline.geojson` via the derived URL; falls back to band-4 alpha vectorization on fetch failure (with a warning).
- v3a + v3c handled transparently: URL-pattern derivation rewrites the host to the Bearer-aware API gateway, then builds every asset URL from a single seed (`plume_tif`).
from georeader.readers.carbonmapper import CMPlumeImage
# 1. Build from a plume_id: one HTTP round-trip
img = CMPlumeImage.from_plume_id(PLUME_ID, token=TOKEN)
print(img)
8.1 Lazy properties¶
Each property opens its asset on first access. No I/O happens
during from_plume_id beyond the catalog metadata fetch.
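The lazy-access pattern is essentially `functools.cached_property`. A generic sketch (not the library's actual implementation; the counter stands in for the HTTP fetch + raster open):

```python
from functools import cached_property

class LazyBundle:
    """Each property 'opens' its asset on first access, then caches it."""
    def __init__(self):
        self.opens = 0

    @cached_property
    def mask(self):
        self.opens += 1        # stands in for the HTTP fetch + raster open
        return "mask-reader"

bundle = LazyBundle()
bundle.mask; bundle.mask       # second access hits the cache
print(bundle.opens)            # 1
```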
from georeader.rasterio_reader import RasterioReader
def describe(name, reader):
if reader is None:
return f"{name:22s} (absent)"
return f"{name:22s} {type(reader).__name__} shape={reader.shape}"
print(describe("mask:", img.mask))
print(describe("concentrations:", img.concentrations))
print(describe("ime_concentrations:", img.ime_concentrations))
print(describe("rgb:", img.rgb))
8.2 Outline (canonical GeoJSON)¶
outline returns a shapely geometry in EPSG:4326. The canonical
source is plume-outline.geojson (fetched from the v3a STAC
asset, or the URL-pattern equivalent for v3c). If that fetch
fails, the property falls back to vectorizing the band-4 alpha of
mask and logs a warning.
outline = img.outline
print(f"type: {type(outline).__name__}")
print(f"area: {outline.area:.6f} (degrees², EPSG:4326)")
print(f"bounds: {tuple(round(b, 4) for b in outline.bounds)}")
See also¶
- products_explore.ipynb: `CMPlumeImage` / `CMImageRaster` / `CMPlumeRaster`, georeader-backed lazy raster wrappers that consume the typed items returned here.
- Carbon Mapper Reader API reference: full module / class / function listing rendered from source.
- Carbon Mapper API docs: upstream OpenAPI schema and endpoint inventory.