Carbon Mapper products¶
How to consume Carbon Mapper data at every product level via the georeader reader subpackage. One section per resource type, each showing the typed model, the asset wrapper, and a worked example with inline plots:
| Β§ | Resource | Wrapper | Headline asset |
|---|---|---|---|
| 1 | Plume β one detection | CMRawPlume (typed metadata) |
catalog dict + asset URLs |
| 2 | Image β per-plume products | CMPlumeImage (this PR's headline) |
mask, concentrations, ime, rgb, outline |
| 3 | Tile β L2B scene | CMImageRaster |
cmf retrieval + RGB sibling |
| 4 | Source β DBSCAN cluster | CMSource |
source-aggregated plume stats |
CH4 only in this notebook. The reader is gas-agnostic
internally, but query helpers are typed Literal["CH4"] for now;
CO2 lands in a follow-up.
Companion: api_explore.ipynb β the typed
query layer in depth (REST + STAC surfaces, filter map, exception
hierarchy).
Install¶
pip install 'georeader-spaceml[carbonmapper]'
Includes pydantic, requests, and the standard rasterio /
shapely dependencies. The wrappers go through
georeader.RasterioReader for the actual GeoTIFF I/O β no
extras beyond the base [carbonmapper] extra.
Setup¶
CarbonMapperConfig.load() resolves a Bearer token from env
(CARBONMAPPER_TOKEN, or CARBONMAPPER_EMAIL+PASSWORD) or one
of the standard config-file paths β see
api_explore.ipynb#authentication.
Carbon Mapper's STAC asset URLs are also gated by the same token,
so we push it into GDAL's curl session here. RasterioReader
goes through rasterio β GDAL β libcurl, which reads the env var.
import os
from georeader.readers.carbonmapper import (
CarbonMapperConfig,
Collection,
Gas,
Instrument,
api_queries,
)
# --- 429-resilient HTTP -----------------------------------------------
# CarbonMapper rate-limits per account. Notebook re-executions fire
# dozens of catalog probes back-to-back and can trip the per-minute cap.
# Mount a retry-aware adapter on a shared Session and re-bind the
# `requests` module shortcuts to route through it β so every HTTP call
# in this kernel (including ones inside georeader's own .download
# helpers) gets automatic 429 backoff honouring `Retry-After`, plus
# exponential backoff for transient 5xx.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
_cm_session = requests.Session()
_cm_session.mount("https://", HTTPAdapter(max_retries=Retry(
total=8,
backoff_factor=2.0, # 2, 4, 8, 16, 32, 64 s
status_forcelist=(429, 500, 502, 503, 504),
respect_retry_after_header=True,
allowed_methods=frozenset(["GET", "POST"]),
)))
requests.get = _cm_session.get
requests.post = _cm_session.post
requests.request = _cm_session.request
TOKEN = CarbonMapperConfig.load().refresh_access_token()
# RasterioReader streams over GDAL/libcurl. Push the token in once.
os.environ["GDAL_HTTP_HEADERS"] = f"Authorization: Bearer {TOKEN}"
os.environ["CPL_VSIL_CURL_ALLOWED_EXTENSIONS"] = ".tif,.TIF"
# Retry on transient HTTP failures (5xx + 429) β the CarbonMapper
# CDN occasionally rate-limits tight read sequences. GDAL's built-in
# retry hides those from rasterio.
os.environ["GDAL_HTTP_MAX_RETRY"] = "5"
os.environ["GDAL_HTTP_RETRY_DELAY"] = "3" # seconds (exponential)
# Protagonist plume β March 31, 2026. L3A is processed as v3d
# (newest), L2B parent is v3c (no v3d L2B exists yet). 1954 kg/h
# emission β the strongest plume of this scene. Showcases the
# Phase 1 + 2 canonical API across the version split.
PLUME_ID = "tan20260331t181625c77s4001-D"
print(f"plume_id = {PLUME_ID}")
print(f"Collection.L3A_VIS_V3A.is_stac_resident = {Collection.L3A_VIS_V3A.is_stac_resident}")
print(f"Collection.L3A_VIS_V3C.is_stac_resident = {Collection.L3A_VIS_V3C.is_stac_resident}")
Products inventory β what's reachable per plume¶
Static reference β no API calls. The same content lives in the PR plan; mirrored here so the notebook is self-contained.
Per-plume assets (L3A vis + ime collections)¶
12 reachable assets per CH4 plume. CMPlumeImage exposes 7 of them as lazy properties; the 5 PNG variants are decimated thumbnails (strictly smaller than the matching GeoTIFFs) and intentionally not wrapped β there's no native georeferencing on a PNG.
| Asset | Format | What it is | CMPlumeImage property |
|---|---|---|---|
plume.tif |
GeoTIFF (RGBA) | Mask raster β band 4 alpha = broader plume mask | mask |
plume-concentrations.tif |
GeoTIFF (1-band) | CH4 column density crop (full per-plume window) | concentrations |
ime-cmf-concentrations.tif |
GeoTIFF (1-band) | IME-clipped column density (only mask-significant pixels) | ime_concentrations |
ime-cmf-mask.tif |
GeoTIFF (1-band) | IME-significance binary mask (subset of mask) |
ime_mask |
rgb.tif |
GeoTIFF (3-band) | True-colour context (widest extent of all per-plume assets) | rgb |
plume-outline.geojson |
GeoJSON | Broader plume polygon (canonical) | outline |
ime-cmf-outline.geojson |
GeoJSON | Tighter IME-significant polygon | ime_outline |
plume.png |
PNG | Decimated viz of plume.tif |
(not wrapped) |
plume-rgb.png |
PNG | Plume mask overlaid on RGB | (not wrapped) |
rgb.png |
PNG | Decimated viz of rgb.tif |
(not wrapped) |
ime-cmf-concentrations.png |
PNG | Decimated viz | (not wrapped) |
ime-cmf-mask.png |
PNG | Decimated viz | (not wrapped) |
For a wider scene-level concentration view, use the L2B
CMImageRaster.cmf from the parent tile (Β§ 3) β there is no
wider concentration product at the plume level.
Native extents (verified for one v3a plume)¶
Each asset has a different native size β Carbon Mapper crops them at different scales:
| Asset | Pixels | Approx. ground span |
|---|---|---|
plume.tif / rgb.tif |
101 Γ 100 | 3030m Γ 3000m (widest β full context window) |
plume-concentrations.tif |
41 Γ 48 | 1230m Γ 1440m (per-plume crop) |
ime-cmf-concentrations.tif |
11 Γ 11 | 330m Γ 330m (IME-significant only β tightest) |
| PNG variants | 6Γ3 to 115Γ115 | n/a (decimated thumbnails) |
This asymmetry is why the Β§2.4 visualization locks all panels to a common viewport (rgb dominates).
Version timeline (active CH4 product families)¶
Carbon Mapper bumps emission_version per processing-software
release. v3a is the canonical STAC-exposed version family;
v3c is the live processing version of newer plumes (post
2026-01) but is not registered in /stac/collections.
CMPlumeImage.from_plume_id handles both transparently via
URL-pattern derivation.
| Version | In STAC? | Item count (cmf-mfa) | Earliest | Latest | Notes |
|---|---|---|---|---|---|
mf-v1 |
β | 8,487 (L2B) | 2016-09-10 | 2024-06-12 | First public release; legacy schema |
mfa-v1 |
β | 24,786 (L2B) | 2017-06-18 | 2025-03-04 | Mass-Flux-Adjusted retrieval |
mfa-v3 |
β | 35,900 (L2B) | 2014-10-05 | 2025-11-11 | Largest single version; current STAC schema |
mfa-v3a |
β | 1,675 (L2B) | 2025-07-11 | 2025-12-16 | Canonical current β STAC-exposed |
v3b |
β | (REST URLs only) | early 2026 | early 2026 | Intermediate, not in STAC |
v3c |
β | (REST URLs only) | 2026-01 | live | Newest β REST URLs only, NOT in STAC |
v3a and v3c cover different plumes (not different versions
of the same plumes). v3a is a curated 2025-07 β 2025-12
snapshot; v3c is the rolling forward processing of plumes from
2026-01 onward.
1 Β· Plumes β typed metadata¶
CMRawPlume is the typed view of one Carbon Mapper detection. The
headline metric is emission_auto (kg/h); everything else
(emission uncertainty, wind, geometry, sector, instrument) is
ancillary.
Two derived properties bridge plumes to the other entities below:
scene_idβplume_id.rsplit('-', 1)[0]. The L2B item id inCollection.L2B_V3A. Use this to navigate from a plume to its parent tile (Β§3) without an HTTP round-trip.versionβ re-exposesemission_version("v3a"/"v3b"/"v3c"). Branch on this to choose between STAC item lookup (v3a) and URL-pattern derivation (v3c, see Β§2).
plume = api_queries.get_plume(TOKEN, PLUME_ID)
print(f"plume_id : {plume.plume_id}")
print(f"gas : {plume.gas}")
print(f"instrument : {plume.instrument} β {plume.instrument_name}")
print(f"version : {plume.version}")
print(f"scene_id : {plume.scene_id} β derived (rsplit)")
print(f"scene_uuid : {plume.scene_uuid} β API's internal UUID")
print(f"emission_auto : {plume.emission_auto:>9.1f} kg/h")
print(f"wind (u, v) : ({plume.wind_u:.2f}, {plume.wind_v:.2f}) m/s")
print(f"sector : {plume.sector}")
1.1 Enums for constrained value sets¶
Three StrEnums expose Carbon Mapper's closed value sets. They
satisfy str so they're drop-in for any string parameter, but
typed checking + IDE autocomplete come for free.
print(f"Gas: {[g.value for g in Gas]}")
print(f"Instrument: {[i.value for i in Instrument]}")
print()
# Instrument is case-insensitive on construction (the upstream API
# is mixed-case β `tan` lowercase, `GAO` uppercase β and `_missing_`
# normalises).
for variant in ("tan", "TAN", "Tan", "gao", "GAO"):
print(f" Instrument({variant!r:>5}) β {Instrument(variant)}")
1.2 list_plumes β typed, filterable, paginated¶
The default consumer surface for plume discovery. gas is typed
Gas | Literal["CH4"] so plain gas="CH4" calls keep
type-checking.
from datetime import datetime, timezone
# March 2026 Tanager CH4 detections across CONUS β same time
# window as the protagonist plume. Shows the L3A version split
# (v3c through 2026-02, v3d from 2026-03) on the `version` field.
plumes = api_queries.list_plumes(
TOKEN,
instruments=["tan"],
datetime_min=datetime(2026, 3, 1, tzinfo=timezone.utc),
datetime_max=datetime(2026, 3, 31, tzinfo=timezone.utc),
gas=Gas.CH4,
limit=20,
)
print(f"{len(plumes)} CH4 Tanager plumes (March 2026)\n")
for p in plumes[:5]:
em = f"{p.emission_auto:>7.0f}" if p.emission_auto is not None else " β"
print(
f" {p.plume_id} emission={em} kg/h "
f"sector={p.sector} version={p.version}"
)
2 Β· Images β per-plume product bundle¶
CMPlumeImage wraps every reachable per-plume L3A asset as a
lazy property:
| Property | Asset | Type | Notes |
|---|---|---|---|
mask |
plume.tif |
RasterioReader (RGBA) |
Band 4 alpha = broader plume mask |
concentrations |
plume-concentrations.tif |
RasterioReader |
Per-plume column-density crop (full) |
ime_concentrations |
ime-cmf-concentrations.tif |
RasterioReader |
IME-clipped column density β the data CM integrated for emission_auto |
ime_mask |
ime-cmf-mask.tif |
RasterioReader |
IME-significant binary mask β tighter subset of mask |
rgb |
rgb.tif |
RasterioReader |
True-colour context (3-band) |
outline |
plume-outline.geojson |
shapely.MultiPolygon |
Broader plume polygon (canonical) |
ime_outline |
ime-cmf-outline.geojson |
shapely.MultiPolygon |
Tighter IME-significant polygon |
PNG variants (plume.png, plume-rgb.png, rgb.png,
ime-cmf-concentrations.png, ime-cmf-mask.png) are
intentionally not wrapped β Carbon Mapper publishes them as
decimated thumbnails for visualisation, not georeferenced data.
The matching .tif is always strictly larger and
georeferenced.
Key design choices:
- Outline canonical β
outlinefetches the GeoJSON; falls back to vectorising band-4 alpha (with a warning) only on fetch failure.ime_outlinereturnsNonerather than vectorising β the broaderoutlineis the safety net. - v3a + v3c handled transparently β
from_plume_idderives every asset URL from the catalogplume_tifURL via a version-agnostic regex swap. Works for v3a (STAC-resident), v3b (intermediate), and v3c (CDN-only). - Seven lazy properties β opening one doesn't open the others. Each is cached after first access.
No other concentration GeoTIFF exists at the plume level. The
two above (concentrations + ime_concentrations) are it. For
a wider scene-level concentration view, use CMImageRaster.cmf
from the parent tile (see Β§ 3).
2.1 Construction β three entry points¶
from_plume_id (one HTTP round-trip), from_cmrawplume (zero
HTTP if you already have the typed plume), and from_stac_item
(driving STAC search directly; v3a only).
from georeader.readers.carbonmapper import CMPlumeImage
# Option A β from a plume_id (one HTTP, handles v3a / v3b / v3c)
img = CMPlumeImage.from_plume_id(PLUME_ID, token=TOKEN)
print(img)
2.2 Properties β what's reachable¶
URL derivation already happened in from_plume_id (one HTTP for
the catalog metadata, then 7 URLs derived locally β no further
network I/O until you access a property). The actual rasterio
reads happen in Β§2.4 below.
# Inventory the URLs CMPlumeImage built. Properties are lazy β
# none of these have opened a raster yet.
from georeader.readers.carbonmapper.image import CM_PLUME_IMAGE_ASSETS
print(f"{'asset':32s} URL?")
print("-" * 38)
for asset in CM_PLUME_IMAGE_ASSETS:
print(f" {asset:30s} {'yes' if asset in img.urls else 'no'}")
2.3 Outlines β broader vs IME-tight¶
Two polygon products in EPSG:4326:
outlineβ broader plume polygon, canonical fromplume-outline.geojson. Falls back to vectorising the band-4 alpha ofmask(with a warning) on fetch failure.ime_outlineβ tighter, the polygon Carbon Mapper actually integrated over foremission_auto. Excludes pixels below the IME detection threshold. ReturnsNoneon fetch failure rather than vectorising β the broader outline is the safety net.
def show(name, geom):
if geom is None:
print(f"{name:14s} (absent)")
return
print(f"{name:14s} area={geom.area:.6e} degΒ² bounds={tuple(round(b, 4) for b in geom.bounds)}")
show("outline:", img.outline)
show("ime_outline:", img.ime_outline)
2.4 Visualise β every per-plume product with the outline overlaid¶
A 4-panel figure with the canonical plume-outline.geojson
overlaid on every panel:
- mask β
plume.tifband-4 alpha (the binary plume mask). - concentrations β full per-plume crop from
plume-concentrations.tif. - ime_concentrations β IME-clipped retrieval from
ime-cmf-concentrations.tif(only mask-significant pixels β the data used to computeemission_auto). - rgb β true-colour context from
rgb.tif.
Carbon Mapper publishes each asset with a different native
extent β plume.tif and plume-concentrations.tif are tightly
cropped to the plume; ime-cmf-concentrations.tif is slightly
wider; rgb.tif is the full context window. To read panels
side-by-side, we lock matplotlib's xlim / ylim across all
four to the widest available extent (rgb's bounds), so the plume
sits inside each panel rather than filling it.
import matplotlib.pyplot as plt
import numpy as np
from pyproj import Transformer
from shapely.ops import transform as shp_transform
def _plot_outline(ax, geom_4326, dst_crs):
"""Reproject outline into dst_crs and stroke it."""
if geom_4326 is None:
return
tr = Transformer.from_crs("EPSG:4326", dst_crs, always_xy=True)
g = shp_transform(tr.transform, geom_4326)
polys = list(g.geoms) if g.geom_type == "MultiPolygon" else [g]
for p in polys:
x, y = p.exterior.xy
ax.plot(x, y, color="red", lw=1.4, solid_capstyle="round")
# Pick the widest available extent (rgb is widest by CM convention)
# and lock all panels' xlim/ylim to it for visual consistency.
target_crs = None
xmin = ymin = float("inf")
xmax = ymax = float("-inf")
for reader in (img.mask, img.concentrations, img.ime_concentrations, img.rgb):
if reader is None:
continue
b = reader.bounds
target_crs = target_crs or str(reader.crs)
xmin, ymin = min(xmin, b[0]), min(ymin, b[1])
xmax, ymax = max(xmax, b[2]), max(ymax, b[3])
common_xlim = (xmin, xmax)
common_ylim = (ymin, ymax)
import time
# Refresh token + light throttling before the multi-raster reads
# below β the api gateway sometimes rate-limits tight read bursts.
fresh = CarbonMapperConfig.load().refresh_access_token()
import os
os.environ["GDAL_HTTP_HEADERS"] = f"Authorization: Bearer {fresh}"
fig, axes = plt.subplots(1, 4, figsize=(18, 4.6), constrained_layout=True)
# (a) mask β band 4 alpha
time.sleep(1); mask_geo = img.mask.load()
mask_arr = np.asarray(mask_geo.values)
alpha = (mask_arr[3] > 0).astype("uint8") if mask_arr.shape[0] >= 4 else mask_arr.squeeze()
bx = mask_geo.bounds
axes[0].imshow(
alpha,
extent=(bx[0], bx[2], bx[1], bx[3]),
origin="upper", cmap="Greys", interpolation="nearest",
)
_plot_outline(axes[0], img.outline, str(mask_geo.crs))
axes[0].set_title(f"mask (alpha) {mask_geo.crs}")
# (b) concentrations β full per-plume crop from plume-concentrations.tif
time.sleep(1); con_geo = img.concentrations.load()
con_arr = np.asarray(con_geo.values).squeeze().astype("float32")
nd = img.concentrations.nodata
if nd is not None:
con_arr = np.where(con_arr == nd, np.nan, con_arr)
bx = con_geo.bounds
im = axes[1].imshow(
con_arr,
extent=(bx[0], bx[2], bx[1], bx[3]),
origin="upper", cmap="plasma",
)
fig.colorbar(im, ax=axes[1], fraction=0.046, pad=0.04, label="ppmΒ·m")
_plot_outline(axes[1], img.outline, str(con_geo.crs))
axes[1].set_title("concentrations (per-plume crop)")
# (c) IME concentrations (mask-clipped, per-plume)
if img.ime_concentrations is not None:
time.sleep(1); ime_geo = img.ime_concentrations.load()
ime_arr = np.asarray(ime_geo.values).squeeze().astype("float32")
nd = img.ime_concentrations.nodata
if nd is not None:
ime_arr = np.where(ime_arr == nd, np.nan, ime_arr)
bx = ime_geo.bounds
im = axes[2].imshow(
ime_arr,
extent=(bx[0], bx[2], bx[1], bx[3]),
origin="upper", cmap="plasma",
)
fig.colorbar(im, ax=axes[2], fraction=0.046, pad=0.04, label="ppmΒ·m")
_plot_outline(axes[2], img.outline, str(ime_geo.crs))
axes[2].set_title("ime_concentrations (clipped)")
else:
axes[2].set_axis_off()
axes[2].set_title("ime_concentrations (absent)")
# (d) RGB
if img.rgb is not None:
time.sleep(1); rgb_geo = img.rgb.load()
rgb_arr = np.asarray(rgb_geo.values)
rgb_img = np.moveaxis(rgb_arr[:3], 0, -1).astype("float32")
lo, hi = np.nanpercentile(rgb_img, [2, 98])
rgb_img = np.clip((rgb_img - lo) / max(hi - lo, 1e-9), 0, 1)
bx = rgb_geo.bounds
axes[3].imshow(rgb_img, extent=(bx[0], bx[2], bx[1], bx[3]), origin="upper")
_plot_outline(axes[3], img.outline, str(rgb_geo.crs))
axes[3].set_title(f"rgb {rgb_geo.crs}")
else:
axes[3].set_axis_off()
axes[3].set_title("rgb (absent)")
# Lock all four panels to the same viewport for direct comparison.
for ax in axes:
ax.set_xlim(common_xlim)
ax.set_ylim(common_ylim)
ax.set_aspect("equal")
fig.suptitle(f"CMPlumeImage Β· {PLUME_ID}", fontsize=11)
plt.show()
3 Β· Tiles β L2B scene rasters¶
CMImageRaster wraps the L2B retrieval scene β full Tanager swath
with cmf / cmf-unortho / uncertainty / uncertainty-unortho
/ artifact-mask / uas (text sidecar) plus an rgb sibling.
Use api_queries.get_image_raster_for_plume(token, plume_id) β
the canonical resolver. It tries STAC first (cheap when the
scene's L2B is registered there) and falls back to URL-pattern
derivation for 2026 v3c/v3d scenes that aren't in
/stac/collections. The protagonist is a March 2026 v3d L3A
plume; its L2B parent is v3c, reachable only via the fallback
path.
from georeader.readers.carbonmapper import CMImageRaster
# One resolver for every version. Returns a CMImageRaster with
# all 7 L2B assets (cmf, cmf-unortho, uncertainty,
# uncertainty-unortho, artifact-mask, uas, rgb sibling). Lazy:
# no rasters are opened until a property is accessed.
ir = api_queries.get_image_raster_for_plume(TOKEN, PLUME_ID)
print(ir)
print()
print(f"L2B collection used: {ir.asset_paths['cmf'].split('/')[-4]}")
3.1.5 Cache L2B rasters locally¶
The v3c L2B CDN can rate-limit tight rasterio read bursts β the
demos in Β§Β§ 3.2 / 3.3 / 3.4 open cmf / rgb / uncertainty
multiple times each. Pre-download those three GeoTIFFs via Python
requests (whose 429-retry adapter we configured in the setup
cell), then rebind ir.asset_paths to the local files. Idempotent:
re-runs skip files already in /tmp/cm_notebooks/<plume_id>/.
This is purely a notebook-cell cache for repeatable demos β
CMImageRaster reads from whatever paths are in asset_paths, so
local file paths work transparently. Production code goes
straight at the CDN.
from pathlib import Path
import requests # uses the retry-equipped session from the setup cell
CACHE_DIR = Path("/tmp/cm_notebooks") / PLUME_ID
CACHE_DIR.mkdir(parents=True, exist_ok=True)
for band in ("cmf", "rgb", "uncertainty"):
url = ir.asset_paths[band]
local = CACHE_DIR / f"{band}.tif"
if not local.exists():
print(f" fetching {band:13s} -> {local}")
r = requests.get(
url,
headers={"Authorization": f"Bearer {TOKEN}"},
stream=True,
timeout=120,
)
r.raise_for_status()
with local.open("wb") as fh:
for chunk in r.iter_content(chunk_size=1 << 20):
fh.write(chunk)
else:
print(f" cached {band:13s} ({local.stat().st_size // 1024:>5d} KB)")
# Rebind so downstream rasterio reads (read_polygon, tile_*) hit
# the local file instead of the CDN.
ir.asset_paths[band] = str(local)
# Share the cached `ir` with CMPlumeImage's `.tile` cached_property
# so img.tile_cmf() / .tile_rgb() / .tile_uncertainty() in Β§3.4 also
# read from the local cache. Pre-seeding __dict__ short-circuits the
# get_image_raster_for_plume call the descriptor would otherwise make.
img.__dict__["tile"] = ir
3.1 Lazy band properties¶
Every loadable L2B asset has a property β including the
*-unortho raw-frame variants and the uas text sidecar that
older versions of the wrapper dropped at construction. Inventory
the URLs without opening any raster (actual reads happen in
Β§3.3 below).
# Inventory the L2B assets resolved on this CMImageRaster β no
# rasterio opens here (avoids hammering the api gateway with a
# burst of HTTP heads).
for band in ("cmf", "cmf-unortho", "uncertainty", "uncertainty-unortho",
"artifact-mask", "rgb", "uas"):
have = band in ir.asset_paths
print(f" {band:24s} {'yes' if have else 'no'}")
# Scene-level metadata is stored on the wrapper itself (no IO).
print()
print(f"scene_id : {ir.scene_id}")
3.2 read_polygon β clip the scene to a polygon¶
Returns {band: <geodata> | None} per requested band. Use
img.outline (from Β§2) as the clip geometry to fetch the
column-density crop straight from the parent scene β same data
Carbon Mapper duplicates as con_tif per plume.
# `read_polygon` interprets the geometry's CRS via crs_polygon.
# img.outline is in EPSG:4326 β the default.
crops = ir.read_polygon(
polygon=img.outline,
bands=("cmf", "uncertainty"),
)
for band, reader in crops.items():
if reader is None:
print(f"{band:14s}: no overlap")
else:
geo = reader.load()
print(f"{band:14s}: GeoTensor shape={tuple(geo.values.shape)} crs={geo.crs}")
3.3 Visualise β scene crop with plume polygon overlay¶
Pad the plume bounds 4Γ for context, crop each band, then overlay the plume polygon (reprojected EPSG:4326 β scene CRS) on each panel.
import shapely.affinity as saff
from shapely.geometry import box as shp_box
# 4Γ padded window for context.
context_window = saff.scale(shp_box(*img.outline.bounds), xfact=4.0, yfact=4.0)
context_crops = ir.read_polygon(
polygon=context_window,
bands=("cmf", "rgb", "uncertainty"),
)
panels = [
("cmf", "magma", "cmf (ppmΒ·m)"),
("rgb", None, None),
("uncertainty", "viridis", "uncertainty"),
]
fig, axes = plt.subplots(1, 3, figsize=(15, 5.2), constrained_layout=True)
for ax, (band, cmap, cbar_label) in zip(axes, panels):
reader = context_crops[band]
if reader is None:
ax.set_axis_off()
ax.set_title(f"{band} (no overlap)")
continue
geo = reader.load()
arr = np.asarray(geo.values)
bx = geo.bounds
if band == "rgb" and arr.ndim == 3 and arr.shape[0] >= 3:
rgb_img = np.moveaxis(arr[:3], 0, -1).astype("float32")
lo, hi = np.nanpercentile(rgb_img, [2, 98])
rgb_img = np.clip((rgb_img - lo) / max(hi - lo, 1e-9), 0, 1)
ax.imshow(rgb_img, extent=(bx[0], bx[2], bx[1], bx[3]), origin="upper")
else:
a = arr.squeeze().astype("float32")
nd = reader.nodata
if nd is not None:
a = np.where(a == nd, np.nan, a)
im = ax.imshow(
a, extent=(bx[0], bx[2], bx[1], bx[3]),
origin="upper", cmap=cmap,
)
fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04, label=cbar_label)
_plot_outline(ax, img.outline, str(geo.crs))
ax.set_title(f"{band} {geo.crs}")
ax.set_xlabel("E (m)")
ax.set_ylabel("N (m)")
ax.set_aspect("equal")
fig.suptitle("L2B products with plume polygon overlay")
plt.show()
3.4 L3A thumbnail vs L2B tile-crop β side-by-side¶
The L3A pre-cropped products (plume-concentrations.tif,
ime-cmf-concentrations.tif, etc.) are thumbnail-grade β
the IME concentration raster is typically ~11 Γ 11 px and
pixelates into circle-like blobs. The Phase 2 tile_*()
methods crop the L2B parent at full native resolution
(~150 Γ 150 px with the default pad_px=64), giving a
recognisable plume image you can actually overlay against an
RGB context.
Same plume, same outline polygon overlay β left column is the
L3A thumbnail, right column is img.tile_* at L2B native
resolution.
# Phase 2: one method call per band. .tile is lazy (one HTTP
# the first time, cached thereafter); the crop methods do
# read.read_from_polygon(...) + .load() internally.
# `img` was already constructed in Β§2 from the protagonist PLUME_ID.
crop_cmf = img.tile_cmf(pad_px=64) # GeoTensor
crop_rgb = img.tile_rgb(pad_px=64) # GeoTensor
crop_unc = img.tile_uncertainty(pad_px=64) # GeoTensor
outline_overlay = img.outline
fig, axes = plt.subplots(2, 2, figsize=(11, 9.5), constrained_layout=True)
# Top-left: L3A pre-cropped thumbnail (small, pixelated).
ax = axes[0, 0]
l3a_arr = img.concentrations.load().values.squeeze()
l3a_bounds = img.concentrations.bounds
l3a_nd = img.concentrations.nodata
if l3a_nd is not None:
l3a_arr = np.where(l3a_arr == l3a_nd, np.nan, l3a_arr)
im0 = ax.imshow(
l3a_arr,
extent=(l3a_bounds[0], l3a_bounds[2], l3a_bounds[1], l3a_bounds[3]),
origin="upper", cmap="magma",
)
fig.colorbar(im0, ax=ax, fraction=0.046, pad=0.04, label="L3A ppmΒ·m")
ax.set_title(
f"L3A plume-concentrations.tif (thumbnail)\n"
f"{l3a_arr.shape[1]} Γ {l3a_arr.shape[0]} px (pre-cropped)",
fontsize=10,
)
_plot_outline(ax, outline_overlay, str(img.concentrations.crs))
ax.set_aspect("equal")
# Top-right: Phase 2 .tile_cmf β L2B native resolution.
ax = axes[0, 1]
cmf_arr = np.asarray(crop_cmf.values).squeeze().astype("float32")
# Mask nodata from the cropped tile (-9999 sentinel).
cmf_arr = np.where(cmf_arr == -9999, np.nan, cmf_arr)
bx = crop_cmf.bounds
im1 = ax.imshow(
cmf_arr, extent=(bx[0], bx[2], bx[1], bx[3]),
origin="upper", cmap="magma",
)
fig.colorbar(im1, ax=ax, fraction=0.046, pad=0.04, label="L2B ppmΒ·m")
ax.set_title(
f"img.tile_cmf(pad_px=64)\n"
f"{cmf_arr.shape[1]} Γ {cmf_arr.shape[0]} px @ L2B native res",
fontsize=10,
)
_plot_outline(ax, outline_overlay, str(crop_cmf.crs))
ax.set_aspect("equal")
# Bottom-left: .tile_rgb β proper context image.
ax = axes[1, 0]
rgb_img = np.moveaxis(np.asarray(crop_rgb.values)[:3], 0, -1).astype("float32")
lo, hi = np.nanpercentile(rgb_img, [2, 98])
rgb_img = np.clip((rgb_img - lo) / max(hi - lo, 1e-9), 0, 1)
rb = crop_rgb.bounds
ax.imshow(rgb_img, extent=(rb[0], rb[2], rb[1], rb[3]), origin="upper")
ax.set_title(
f"img.tile_rgb(pad_px=64)\n"
f"{rgb_img.shape[1]} Γ {rgb_img.shape[0]} px true-colour",
fontsize=10,
)
_plot_outline(ax, outline_overlay, str(crop_rgb.crs))
ax.set_aspect("equal")
# Bottom-right: .tile_uncertainty.
ax = axes[1, 1]
unc_arr = np.asarray(crop_unc.values).squeeze().astype("float32")
unc_arr = np.where(unc_arr == -9999, np.nan, unc_arr)
ub = crop_unc.bounds
im3 = ax.imshow(
unc_arr, extent=(ub[0], ub[2], ub[1], ub[3]),
origin="upper", cmap="viridis",
)
fig.colorbar(im3, ax=ax, fraction=0.046, pad=0.04, label="ppmΒ·m")
ax.set_title(
f"img.tile_uncertainty(pad_px=64)\n"
f"{unc_arr.shape[1]} Γ {unc_arr.shape[0]} px",
fontsize=10,
)
_plot_outline(ax, outline_overlay, str(crop_unc.crs))
ax.set_aspect("equal")
fig.suptitle(
f"{PLUME_ID}: L3A thumbnail (left) vs L2B tile-crop (right)",
fontsize=11,
)
plt.show()
4 Β· Sources β DBSCAN clusters with stats¶
CMSource is one DBSCAN-clustered point source, addressed by the
deterministic key {gas}_{sector}_{footprint_m}m_{lon}_{lat}. One
source aggregates many plumes detected at the same physical site
across many scenes / dates.
Three patterns to know:
list_sourcesβ paginated source listing within a bbox.get_source_for_plumeβ resolve plume β source.list_plumes_for_sourceβ fan out to every detection attributed to one source (CSV-backed; full materialisation).
Below we use pattern 2 to find our protagonist plume's source, then pattern 3 to compute per-source detection statistics.
# Plume β its source (one HTTP, returns None if the plume isn't
# clustered yet β first-detection sites haven't passed DBSCAN).
src = api_queries.get_source_for_plume(TOKEN, PLUME_ID)
def _fmt(v, fmt=""):
return f"{v:{fmt}}" if v is not None else "β"
if src is None:
print("plume not yet clustered into a source")
else:
print(f"source_name : {src.source_name}")
print(f"sector : {src.sector or 'β'}")
print(f"plume_count : {_fmt(src.plume_count)}")
print(f"persistence : {_fmt(src.persistence, '.2f')}")
print(f"emission_auto: {_fmt(src.emission_auto, '.0f')} kg/h (site-aggregate)")
print(f"point : ({src.point.x:.5f}, {src.point.y:.5f})")
4.1 All plumes attributed to a persistent source¶
Our protagonist plume above may or may not have been clustered into a source yet β fresh detections usually haven't passed DBSCAN at the time of acquisition. For the stats walkthrough switch to a known-persistent Permian oil-and-gas source β many detections across multiple Tanager passes.
list_plumes_for_source materialises every detection into a
list of typed CMRawPlumes. The endpoint is single-shot (no
pagination); for a typical Permian site this returns a few
hundred records in seconds.
# Switch to a persistent Permian oil-and-gas source for the stats
# demo (the protagonist plume's source has only 1 detection).
DEMO_SOURCE = "CH4_1B2_500m_-103.93835_32.06406"
demo_src = api_queries.get_source(TOKEN, DEMO_SOURCE)
plumes_at_source = api_queries.list_plumes_for_source(TOKEN, DEMO_SOURCE)
print(f"{len(plumes_at_source)} detections at {DEMO_SOURCE}")
print()
print(f"{'plume_id':40s} {'date':10s} {'inst':5s} {'kg/h':>8s}")
print("-" * 70)
for p in plumes_at_source[:6]:
obs = p.observation_datetime
date = obs.date().isoformat() if obs else "β"
em = f"{p.emission_auto:.0f}" if p.emission_auto is not None else "β"
print(f"{p.plume_id:40s} {date:10s} {p.instrument or 'β':5s} {em:>8s}")
if len(plumes_at_source) > 6:
print(f"... and {len(plumes_at_source) - 6} more")
4.2 Source statistics¶
Compute a quick site profile from the attributed-plume list:
- detection count + distinct days
- emission distribution (median / p90 / max)
- sensor mix
- date span (first β last detection)
These are the same kinds of stats Carbon Mapper's own source UI shows; rolling them up locally lets you filter / compare sources yourself without the UI.
import pandas as pd
# Emissions: filter out the unset / hidden cases
emissions = pd.Series(
[p.emission_auto for p in plumes_at_source if p.emission_auto is not None],
name="kg_per_h",
)
# Distinct detection days
detection_dates = sorted({
p.observation_datetime.date()
for p in plumes_at_source
if p.observation_datetime is not None
})
# Sensor mix
sensor_counts = pd.Series(
[p.instrument for p in plumes_at_source if p.instrument]
).value_counts()
def _fmt(v, fmt=""):
return f"{v:{fmt}}" if v is not None else "β"
# Pretty print a one-page profile
print(f"=== {demo_src.source_name} ===")
print(f"sector : {demo_src.sector or 'β'}")
print(f"persistence : {_fmt(demo_src.persistence, '.2f')}")
print(f"site emission_auto : {_fmt(demo_src.emission_auto, '>8.0f')} kg/h (CM aggregate)")
print()
print(f"detections : {len(plumes_at_source)}")
print(f"with emission : {len(emissions)}")
print(f"distinct days : {len(detection_dates)}")
if detection_dates:
print(f"date range : {detection_dates[0]} β {detection_dates[-1]}")
print()
if len(emissions):
print(f"emission (kg/h):")
print(emissions.describe(percentiles=[0.5, 0.9, 0.99]).round(0).to_string())
print()
print(f"sensor mix:")
print(sensor_counts.to_string())
4.3 Visualise β emission timeline¶
A scatter of every detection's emission rate over time, with the
site-aggregate emission_auto from the CMSource overlaid.
Helps you eyeball whether the source's headline emission rate is
representative of typical detections, or skewed by a few
super-emitter days.
detections_df = pd.DataFrame([
{
"datetime": p.observation_datetime,
"emission_auto": p.emission_auto,
"instrument": p.instrument,
}
for p in plumes_at_source
if p.observation_datetime is not None
and p.emission_auto is not None
]).sort_values("datetime")
if len(detections_df) == 0:
print("no detections with emissions to plot")
else:
fig, ax = plt.subplots(figsize=(11, 4.2), constrained_layout=True)
for inst, subset in detections_df.groupby("instrument"):
ax.scatter(
subset["datetime"], subset["emission_auto"],
label=f"{inst} ({len(subset)})", alpha=0.7, s=22,
)
if demo_src.emission_auto is not None:
ax.axhline(
demo_src.emission_auto, color="black", linestyle="--", linewidth=1,
label=f"site emission_auto = {demo_src.emission_auto:.0f} kg/h",
)
ax.set_xlabel("observation_datetime (UTC)")
ax.set_ylabel("emission_auto (kg/h)")
ax.set_title(f"Emission timeline Β· {demo_src.source_name}")
ax.legend(fontsize=8)
ax.grid(True, alpha=0.3)
plt.show()
See also¶
api_explore.ipynbβ the typed query layer in depth (REST + STAC surfaces, filter map, exception hierarchy, plume catalog stats).- Carbon Mapper Reader API reference β full module / class / function listing.
- GeoReader concepts β the abstract
GeoDataprotocol bothRasterioReaderandGeoTensorsatisfy.