Skip to content

Geospatial Data Reading and Manipulation

Protocols

The georeader package uses two main protocols to define interfaces for geospatial data:

GeoDataBase Protocol

This is the minimal interface required for geospatial operations. Window methods require objects implementing this protocol as input. Any class implementing this protocol provides basic spatial information:

  • transform: A rasterio.Affine object defining the spatial transform
  • crs: Coordinate reference system
  • shape: The shape of the data array
  • width: Width of the data (shape[-1])
  • height: Height of the data (shape[-2])

GeoData Protocol

This extends the GeoDataBase protocol with methods for data access. Read methods require objects implementing this protocol as inputs. Classes implementing the GeoData protocol must have the following methods and properties:

  • All properties from GeoDataBase
  • load(boundless: bool = True) -> GeoTensor: Loads data into memory
  • read_from_window(window, boundless) -> Union[Self, GeoTensor]: Reads data from a window
  • values: Returns the data array
  • res: Resolution (tuple of x and y resolution)
  • dtype: Data type
  • dims: Dimension names
  • fill_value_default: Fill value for missing data
  • bounds: Data bounds
  • footprint(crs: Optional[str] = None) -> Polygon: Returns the footprint as a polygon

Implementations

The library provides the following implementations of the GeoData protocol:

  1. GeoTensor: A numpy-based implementation for in-memory operations.
  2. RasterioReader: An implementation for lazy-loading with rasterio.
  3. readers.*: Custom readers for official data formats of several satellite missions (Sentinel-2, Proba-V, SpotVGT, EMIT, PRISMA or EnMAP).

Window and Read Methods

The API provides two types of methods:

Window Methods

These methods work with any object implementing the GeoDataBase protocol. They calculate rasterio.windows objects without reading any data:

Read Methods

These methods require objects implementing the GeoData protocol. They load and transform data:

API Reference

Read Methods

Read Module: Window-based raster reading with reprojection and resampling.

This module provides functions to read raster data from various sources using window-based access patterns. It handles coordinate transformations, reprojection, and resampling - the core I/O operations for geospatial raster processing.

Reading Workflow Overview

The module supports multiple ways to specify the area of interest::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    READING WORKFLOW: AREA SPECIFICATION                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Input Specification          Function                     Output       β”‚
β”‚  ────────────────────         ─────────────────────        ──────────   β”‚
β”‚                                                                          β”‚
β”‚  Polygon (geometry)     ───►  read_from_polygon()    ───►  GeoTensor   β”‚
β”‚                                                                          β”‚
β”‚  Bounds (minx,miny,     ───►  read_from_bounds()     ───►  GeoTensor   β”‚
β”‚          maxx,maxy)                                                      β”‚
β”‚                                                                          β”‚
β”‚  Center + Shape         ───►  read_from_center_coords() ─► GeoTensor   β”‚
β”‚  (x, y) + (H, W)                                                         β”‚
β”‚                                                                          β”‚
β”‚  Window (row_off,       ───►  read_from_window()     ───►  GeoTensor   β”‚
β”‚          col_off, H, W)                                                  β”‚
β”‚                                                                          β”‚
β”‚  Web Tile (x, y, z)     ───►  read_from_tile()       ───►  GeoTensor   β”‚
β”‚                                                                          β”‚
β”‚  Match another raster   ───►  read_reproject_like()  ───►  GeoTensor   β”‚
β”‚                                                                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Window vs Bounds Coordinates

Understanding the difference between pixel windows and geographic bounds::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              WINDOW (PIXELS) vs BOUNDS (GEOGRAPHIC COORDINATES)          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  WINDOW (pixel space)                BOUNDS (CRS units)                 β”‚
β”‚  ─────────────────────               ──────────────────                 β”‚
β”‚                                                                          β”‚
β”‚  (col_off, row_off)                  (minx, maxy)  ← upper-left         β”‚
β”‚       ↓                                   ↓                              β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚    β”‚ width pixels β”‚                  β”‚              β”‚ geographic        β”‚
β”‚    β”‚              β”‚   ◄═══════►      β”‚              β”‚ extent in         β”‚
β”‚    β”‚ height pixelsβ”‚    transform     β”‚              β”‚ CRS units         β”‚
β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β”‚                                           ↑                              β”‚
β”‚                                      (maxx, miny)  ← lower-right        β”‚
β”‚                                                                          β”‚
β”‚  Window: rasterio.windows.Window(col_off, row_off, width, height)       β”‚
β”‚  Bounds: (minx, miny, maxx, maxy) - order matches shapely/rasterio      β”‚
β”‚                                                                          β”‚
β”‚  Conversion:                                                             β”‚
β”‚    bounds = window_utils.window_bounds(window, transform)               β”‚
β”‚    window = window_from_bounds(data, bounds, crs_bounds)                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Reprojection & Resampling

When reading data into a different CRS or resolution::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     REPROJECTION WORKFLOW                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Source CRS (e.g., EPSG:4326)         Target CRS (e.g., EPSG:32633)    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  β”‚  β•±β•²    β•±β•²    β•±β•²    β”‚              β”‚ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β”‚           β”‚
β”‚  β”‚ β•±  β•²  β•±  β•²  β•±  β•²   β”‚    ═════►    β”‚ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β”‚           β”‚
β”‚  β”‚β•±    β•²β•±    β•²β•±    β•²  β”‚   Reproject  β”‚ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β”‚           β”‚
β”‚  β”‚ Irregular grid     β”‚   + Resample β”‚ Regular UTM grid   β”‚           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                                                          β”‚
β”‚  Resampling Methods (rasterio.warp.Resampling):                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ Method         β”‚ Best for                                       β”‚    β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€    β”‚
β”‚  β”‚ nearest        β”‚ Categorical data, masks, classification        β”‚    β”‚
β”‚  β”‚ bilinear       β”‚ Continuous data, fast                          β”‚    β”‚
β”‚  β”‚ cubic          β”‚ Continuous data, smooth                        β”‚    β”‚
β”‚  β”‚ cubic_spline   β”‚ Continuous data, very smooth (DEFAULT)         β”‚    β”‚
β”‚  β”‚ lanczos        β”‚ Downsampling, sharp edges                      β”‚    β”‚
β”‚  β”‚ average        β”‚ Downsampling, area-weighted mean               β”‚    β”‚
β”‚  β”‚ mode           β”‚ Downsampling categorical data                  β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                                                                          β”‚
β”‚  Anti-aliasing: Automatic Gaussian blur before downsampling to          β”‚
β”‚                 prevent aliasing artifacts. Controlled by:              β”‚
β”‚                 - anti_aliasing=True (default in resize)                β”‚
β”‚                 - anti_aliasing_sigma (auto-calculated or manual)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Boundless Reading

Reading outside raster bounds returns fill values::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BOUNDLESS READING (boundless=True)                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Requested Window              Result with boundless=True               β”‚
β”‚  ─────────────────              ─────────────────────────               β”‚
β”‚                                                                          β”‚
β”‚       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                         β”‚
β”‚       β”‚ fill β”‚ data β”‚           β”‚  0  β”‚ data β”‚   fill_value_default    β”‚
β”‚       β”‚ ─────┼───── β”‚           β”‚ ────┼───── β”‚   fills out-of-bounds   β”‚
β”‚       β”‚ fill β”‚ data β”‚           β”‚  0  β”‚ data β”‚   pixels                β”‚
β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                         β”‚
β”‚            ↑                                                             β”‚
β”‚     Request extends                                                      β”‚
β”‚     beyond raster bounds                                                 β”‚
β”‚                                                                          β”‚
β”‚  boundless=False: Raises error or clips to valid region                 β”‚
β”‚  boundless=True:  Pads with fill_value_default (default behavior)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Module Functions Overview

Window Creation
  • :func:window_from_polygon: Polygon geometry β†’ pixel window
  • :func:window_from_bounds: Geographic bounds β†’ pixel window
  • :func:window_from_center_coords: Center point + shape β†’ pixel window
  • :func:window_from_tile: Web mercator tile (x,y,z) β†’ pixel window
Reading Functions
  • :func:read_from_window: Read using pixel window
  • :func:read_from_polygon: Read area within polygon
  • :func:read_from_bounds: Read area within bounds
  • :func:read_from_center_coords: Read centered on point
  • :func:read_from_tile: Read web mercator tile
Reprojection
  • :func:read_reproject: Read with CRS transformation
  • :func:read_reproject_like: Match another raster's grid
  • :func:read_to_crs: Simple CRS conversion
  • :func:resize: Change resolution with anti-aliasing

Quick Start

Read a region by polygon::

from georeader import read
from shapely.geometry import box

# Define area of interest in WGS84
aoi = box(-122.5, 37.5, -122.0, 38.0)

# Read from raster (auto-transforms polygon to raster CRS)
gt = read.read_from_polygon(reader, aoi, crs_polygon="EPSG:4326")

Read and reproject to match another raster::

# Make data_in match data_like's grid exactly
gt_aligned = read.read_reproject_like(data_in, data_like)

Read a web map tile::

# Read tile at zoom 15, coordinates (x=5242, y=12661)
gt_tile = read.read_from_tile(reader, x=5242, y=12661, z=15)

See Also

georeader.geotensor : GeoTensor class returned by read functions georeader.window_utils : Lower-level window manipulation utilities georeader.rasterio_reader : RasterioReader for lazy file access

References

  • Rasterio windowed reading: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html
  • Rasterio reprojection: https://rasterio.readthedocs.io/en/latest/topics/reproject.html
  • Web Mercator tiles: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames

read_from_center_coords(data_in, center_coords, shape, crs_center_coords=None, return_only_data=False, trigger_load=False, boundless=True)

Extract a rectangular chip from raster data centered on geographic coordinates.

This function combines window calculation and data reading in one step. It's particularly useful for creating training chips for machine learning, extracting regions around points of interest, or generating thumbnails centered on specific locations.

Parameters:

Name Type Description Default
data_in GeoData

Input raster data with spatial reference (crs, transform). Must implement the GeoData protocol.

required
center_coords Tuple[float, float]

Center point as (x, y) in geographic coordinates. For WGS84, this would be (longitude, latitude). For projected CRS, (easting, northing).

required
shape Tuple[int, int]

Desired output size as (height, width) in pixels. The chip will have exactly this shape if boundless=True.

required
crs_center_coords Optional[Any]

Coordinate reference system of center_coords. If None, assumes coords are in the same CRS as data_in. Can be EPSG code string, CRS object, or WKT. Defaults to None.

None
return_only_data bool

If True, returns numpy array without georeferencing. If False, returns GeoData object with spatial metadata. Defaults to False.

False
trigger_load bool

If True, forces loading data into memory (for lazy readers). Defaults to False.

False
boundless bool

If True, output always matches shape, padding with fill_value_default for out-of-bounds areas. If False, clips to actual data extent. Defaults to True.

True

Returns:

Type Description
Union[GeoData, ndarray]

Union[GeoData, np.ndarray]: - If return_only_data=True: numpy array with shape (bands, height, width) or (height, width) - If return_only_data=False: GeoData object with transform adjusted to chip location

Examples:

>>> import rasterio
>>> from georeader import RasterioReader
>>>
>>> # Extract 512x512 chip centered on a location
>>> with rasterio.open('sentinel2.tif') as src:
...     reader = RasterioReader(src)
...     center = (-3.7038, 40.4168)  # Madrid (lon, lat)
...     chip = read_from_center_coords(reader, center, (512, 512),
...                                     crs_center_coords='EPSG:4326')
...     print(chip.shape)  # (bands, 512, 512)
...     print(chip.bounds)  # Geographic bounds of the chip
>>> # Get just the numpy array without georeference
>>> data_array = read_from_center_coords(reader, center, (256, 256),
...                                       crs_center_coords='EPSG:4326',
...                                       return_only_data=True)
>>> # Extract chip with different aspect ratio
>>> chip_rect = read_from_center_coords(reader, center, (256, 512))  # height=256, width=512
Note
  • The center coordinate refers to the geographic center, which maps to the pixel at (height/2, width/2) in the output chip.
  • For chips near image boundaries, boundless=True pads with fill_value_default.
  • The output transform is adjusted so the chip maintains correct georeferencing.
Source code in georeader/read.py
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
def read_from_center_coords(
    data_in: GeoData,
    center_coords: Tuple[float, float],
    shape: Tuple[int, int],
    crs_center_coords: Optional[Any] = None,
    return_only_data: bool = False,
    trigger_load: bool = False,
    boundless: bool = True,
) -> Union[GeoData, np.ndarray]:
    """
    Extract a rectangular chip from raster data centered on geographic coordinates.

    This function combines window calculation and data reading in one step. It's particularly
    useful for creating training chips for machine learning, extracting regions around points
    of interest, or generating thumbnails centered on specific locations.

    Args:
        data_in (GeoData): Input raster data with spatial reference (crs, transform).
            Must implement the GeoData protocol.
        center_coords (Tuple[float, float]): Center point as (x, y) in geographic coordinates.
            For WGS84, this would be (longitude, latitude). For projected CRS, (easting, northing).
        shape (Tuple[int, int]): Desired output size as (height, width) in pixels.
            The chip will have exactly this shape if boundless=True.
        crs_center_coords (Optional[Any], optional): Coordinate reference system of center_coords.
            If None, assumes coords are in the same CRS as `data_in`. Can be EPSG code string,
            CRS object, or WKT. Defaults to None.
        return_only_data (bool, optional): If True, returns numpy array without georeferencing.
            If False, returns GeoData object with spatial metadata. Defaults to False.
        trigger_load (bool, optional): If True, forces loading data into memory (for lazy readers).
            Defaults to False.
        boundless (bool, optional): If True, output always matches shape, padding with
            fill_value_default for out-of-bounds areas. If False, clips to actual data extent.
            Defaults to True.

    Returns:
        Union[GeoData, np.ndarray]:
            - If return_only_data=True: numpy array with shape (bands, height, width) or (height, width)
            - If return_only_data=False: GeoData object with transform adjusted to chip location

    Examples:
        >>> import rasterio
        >>> from georeader import RasterioReader
        >>>
        >>> # Extract 512x512 chip centered on a location
        >>> with rasterio.open('sentinel2.tif') as src:
        ...     reader = RasterioReader(src)
        ...     center = (-3.7038, 40.4168)  # Madrid (lon, lat)
        ...     chip = read_from_center_coords(reader, center, (512, 512),
        ...                                     crs_center_coords='EPSG:4326')
        ...     print(chip.shape)  # (bands, 512, 512)
        ...     print(chip.bounds)  # Geographic bounds of the chip

        >>> # Get just the numpy array without georeference
        >>> data_array = read_from_center_coords(reader, center, (256, 256),
        ...                                       crs_center_coords='EPSG:4326',
        ...                                       return_only_data=True)

        >>> # Extract chip with different aspect ratio
        >>> chip_rect = read_from_center_coords(reader, center, (256, 512))  # height=256, width=512

    Note:
        - The center coordinate refers to the geographic center, which maps to the pixel at
          (height/2, width/2) in the output chip.
        - For chips near image boundaries, boundless=True pads with fill_value_default.
        - The output transform is adjusted so the chip maintains correct georeferencing.
    """
    # Calculate the window that encompasses the desired chip area
    window = window_from_center_coords(data_in, center_coords, shape, crs_center_coords)

    # Read data from the calculated window
    return read_from_window(
        data_in, window=window, return_only_data=return_only_data, trigger_load=trigger_load, boundless=boundless
    )

read_from_bounds(data_in, bounds, crs_bounds=None, pad_add=(0, 0), return_only_data=False, trigger_load=False, boundless=True)

Extract raster data within a geographic bounding box, with optional CRS transformation.

This function is the primary interface for reading raster data by geographic extent. It's particularly useful for: - Extracting specific geographic regions from large rasters - Reading data in a different CRS than the source (e.g., WGS84 bounds from UTM raster) - Creating training chips for machine learning with consistent geographic extents - Subsetting satellite imagery to areas of interest - Co-registration workflows requiring precise spatial alignment

The function handles the complete workflow: converts bounds to pixel window, optionally adds padding (useful for edge-aware processing like CNNs or interpolation), and returns the data with correct georeferencing. When bounds are in a different CRS, it automatically transforms them to match the raster's coordinate system.

Algorithm: 1. Transform bounds from crs_bounds to data_in.crs (if needed) 2. Calculate pixel window corresponding to geographic bounds 3. Add padding to window if requested (for algorithms needing context) 4. Round window to integer pixel coordinates (ceil for outer bounds) 5. Read data using read_from_window with boundless support

Parameters:

Name Type Description Default
data_in GeoData

Input georeferenced data with spatial reference (crs, transform). Must implement the GeoData protocol with "x" and "y" dimensions.

required
bounds Tuple[float, float, float, float]

Geographic bounding box as (left, bottom, right, top) or (xmin, ymin, xmax, ymax) in the CRS specified by crs_bounds. For WGS84, this would be (lon_min, lat_min, lon_max, lat_max). For UTM, (easting_min, northing_min, easting_max, northing_max).

required
crs_bounds Optional[str]

Coordinate reference system of the bounds. If None, assumes bounds are in the same CRS as data_in. Common formats: "EPSG:4326" (WGS84), "EPSG:32630" (UTM Zone 30N), CRS object, or WKT string. Defaults to None.

None
pad_add Tuple[int, int]

Additional padding in pixels to add around the bounding box as (pad_y, pad_x). Useful for: - CNN inference needing receptive field context - Interpolation algorithms requiring neighboring pixels - Co-registration workflows with geometric transformations - Edge-aware image processing Format: (rows_padding, cols_padding). Defaults to (0, 0).

(0, 0)
return_only_data bool

If True, returns numpy array without georeferencing. If False, returns GeoData object with spatial metadata (transform, crs). Defaults to False.

False
trigger_load bool

If True, forces loading data into memory (for lazy readers like xarray or dask-backed arrays). Defaults to False.

False
boundless bool

If True, output always matches window shape, padding with fill_value_default for out-of-bounds areas. If False, clips to actual data extent. Defaults to True.

True

Returns:

Type Description
Union[GeoData, ndarray]

Union[GeoData, np.ndarray]: - If return_only_data=False: GeoData object with transform adjusted to the bounds and shape matching the geographic extent (plus padding if specified) - If return_only_data=True: numpy array with shape (bands, height, width) or (height, width) depending on input dimensions

Examples:

>>> import rasterio
>>> from georeader import RasterioReader
>>>
>>> # Example 1: Read a 1km x 1km area from UTM raster
>>> with rasterio.open('sentinel2_utm.tif') as src:
...     reader = RasterioReader(src)
...     # Bounds in UTM Zone 30N (meters)
...     bounds_utm = (500000, 4649000, 501000, 4650000)  # 1km x 1km square
...     data = read_from_bounds(reader, bounds_utm)
...     print(f"Shape: {data.shape}")  # e.g., (13, 100, 100) at 10m resolution
...     print(f"Bounds: {data.bounds}")  # Should match requested bounds
>>> # Example 2: Read with CRS transformation (WGS84 β†’ UTM)
>>> with rasterio.open('landsat_utm.tif') as src:  # UTM Zone 33N raster
...     reader = RasterioReader(src)
...     # Specify bounds in WGS84 (degrees)
...     bounds_wgs84 = (13.37, 52.51, 13.38, 52.52)  # Small area in Berlin
...     data = read_from_bounds(reader, bounds_wgs84, crs_bounds='EPSG:4326')
...     print(f"CRS: {data.crs}")  # Still UTM (no reprojection, just subsetting)
>>> # Example 3: Read with padding for CNN inference
>>> # Padding ensures the CNN has context at edges
>>> bounds = (-3.71, 40.41, -3.69, 40.42)  # Madrid area in WGS84
>>> data_padded = read_from_bounds(reader, bounds,
...                                 crs_bounds='EPSG:4326',
...                                 pad_add=(16, 16))  # 16-pixel padding
...     # Output will be larger than actual bounds to include context
>>> # Example 4: Extract training chips at consistent locations
>>> # For machine learning, we often need chips at specific coordinates
>>> training_areas = [
...     (-122.5, 37.7, -122.4, 37.8),   # San Francisco
...     (-118.3, 34.0, -118.2, 34.1),   # Los Angeles
...     (-73.9, 40.7, -73.8, 40.8),     # New York
... ]
>>> chips = []
>>> for bounds_wgs in training_areas:
...     chip = read_from_bounds(reader, bounds_wgs,
...                            crs_bounds='EPSG:4326',
...                            return_only_data=True)
...     chips.append(chip)
>>> # All chips now have consistent geographic extent for training
>>> # Example 5: Clip to actual extent (no padding)
>>> # Useful when you don't want data outside raster boundaries
>>> bounds_large = (-10, 30, 10, 50)  # Large area, may exceed raster
>>> data_clipped = read_from_bounds(reader, bounds_large,
...                                  crs_bounds='EPSG:4326',
...                                  boundless=False)
>>> # Output only contains pixels within both bounds AND raster extent
Note
  • Window coordinates are rounded outward (ceil) to ensure complete coverage
  • The output transform is adjusted to match the actual pixel boundaries
  • Padding is added in pixel space after CRS transformation
  • For interpolation/resampling needing edge context, use pad_add=(3, 3) minimum
  • Boundless reading uses fill_value_default from data_in for out-of-bounds pixels
Source code in georeader/read.py
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
def read_from_bounds(
    data_in: GeoData,
    bounds: Tuple[float, float, float, float],
    crs_bounds: Optional[str] = None,
    pad_add: Tuple[int, int] = (0, 0),
    return_only_data: bool = False,
    trigger_load: bool = False,
    boundless: bool = True,
) -> Union[GeoData, np.ndarray]:
    """
    Extract raster data within a geographic bounding box, with optional CRS transformation.

    This function is the primary interface for reading raster data by geographic extent. It's
    particularly useful for:
    - Extracting specific geographic regions from large rasters
    - Reading data in a different CRS than the source (e.g., WGS84 bounds from UTM raster)
    - Creating training chips for machine learning with consistent geographic extents
    - Subsetting satellite imagery to areas of interest
    - Co-registration workflows requiring precise spatial alignment

    The function handles the complete workflow: converts bounds to pixel window, optionally
    adds padding (useful for edge-aware processing like CNNs or interpolation), and returns
    the data with correct georeferencing. When bounds are in a different CRS, it automatically
    transforms them to match the raster's coordinate system.

    Algorithm:
    1. Transform bounds from crs_bounds to data_in.crs (if needed)
    2. Calculate pixel window corresponding to geographic bounds
    3. Add padding to window if requested (for algorithms needing context)
    4. Round window to integer pixel coordinates (ceil for outer bounds)
    5. Read data using read_from_window with boundless support

    Args:
        data_in (GeoData): Input georeferenced data with spatial reference (crs, transform).
            Must implement the GeoData protocol with "x" and "y" dimensions.
        bounds (Tuple[float, float, float, float]): Geographic bounding box as
            (left, bottom, right, top) or (xmin, ymin, xmax, ymax) in the CRS specified
            by crs_bounds. For WGS84, this would be (lon_min, lat_min, lon_max, lat_max).
            For UTM, (easting_min, northing_min, easting_max, northing_max).
        crs_bounds (Optional[str], optional): Coordinate reference system of the bounds.
            If None, assumes bounds are in the same CRS as data_in. Common formats:
            "EPSG:4326" (WGS84), "EPSG:32630" (UTM Zone 30N), CRS object, or WKT string.
            Defaults to None.
        pad_add (Tuple[int, int], optional): Additional padding in pixels to add around
            the bounding box as (pad_y, pad_x). Useful for:
            - CNN inference needing receptive field context
            - Interpolation algorithms requiring neighboring pixels
            - Co-registration workflows with geometric transformations
            - Edge-aware image processing
            Format: (rows_padding, cols_padding). Defaults to (0, 0).
        return_only_data (bool, optional): If True, returns numpy array without georeferencing.
            If False, returns GeoData object with spatial metadata (transform, crs).
            Defaults to False.
        trigger_load (bool, optional): If True, forces loading data into memory (for lazy readers
            like xarray or dask-backed arrays). Defaults to False.
        boundless (bool, optional): If True, output always matches window shape, padding with
            fill_value_default for out-of-bounds areas. If False, clips to actual data extent.
            Defaults to True.

    Returns:
        Union[GeoData, np.ndarray]:
            - If return_only_data=False: GeoData object with transform adjusted to the bounds
              and shape matching the geographic extent (plus padding if specified)
            - If return_only_data=True: numpy array with shape (bands, height, width) or
              (height, width) depending on input dimensions

    Examples:
        >>> import rasterio
        >>> from georeader import RasterioReader
        >>>
        >>> # Example 1: Read a 1km x 1km area from UTM raster
        >>> with rasterio.open('sentinel2_utm.tif') as src:
        ...     reader = RasterioReader(src)
        ...     # Bounds in UTM Zone 30N (meters)
        ...     bounds_utm = (500000, 4649000, 501000, 4650000)  # 1km x 1km square
        ...     data = read_from_bounds(reader, bounds_utm)
        ...     print(f"Shape: {data.shape}")  # e.g., (13, 100, 100) at 10m resolution
        ...     print(f"Bounds: {data.bounds}")  # Should match requested bounds

        >>> # Example 2: Read with CRS transformation (WGS84 β†’ UTM)
        >>> with rasterio.open('landsat_utm.tif') as src:  # UTM Zone 33N raster
        ...     reader = RasterioReader(src)
        ...     # Specify bounds in WGS84 (degrees)
        ...     bounds_wgs84 = (13.37, 52.51, 13.38, 52.52)  # Small area in Berlin
        ...     data = read_from_bounds(reader, bounds_wgs84, crs_bounds='EPSG:4326')
        ...     print(f"CRS: {data.crs}")  # Still UTM (no reprojection, just subsetting)

        >>> # Example 3: Read with padding for CNN inference
        >>> # Padding ensures the CNN has context at edges
        >>> bounds = (-3.71, 40.41, -3.69, 40.42)  # Madrid area in WGS84
        >>> data_padded = read_from_bounds(reader, bounds,
        ...                                 crs_bounds='EPSG:4326',
        ...                                 pad_add=(16, 16))  # 16-pixel padding
        ...     # Output will be larger than actual bounds to include context

        >>> # Example 4: Extract training chips at consistent locations
        >>> # For machine learning, we often need chips at specific coordinates
        >>> training_areas = [
        ...     (-122.5, 37.7, -122.4, 37.8),   # San Francisco
        ...     (-118.3, 34.0, -118.2, 34.1),   # Los Angeles
        ...     (-73.9, 40.7, -73.8, 40.8),     # New York
        ... ]
        >>> chips = []
        >>> for bounds_wgs in training_areas:
        ...     chip = read_from_bounds(reader, bounds_wgs,
        ...                            crs_bounds='EPSG:4326',
        ...                            return_only_data=True)
        ...     chips.append(chip)
        >>> # All chips now have consistent geographic extent for training

        >>> # Example 5: Clip to actual extent (no padding)
        >>> # Useful when you don't want data outside raster boundaries
        >>> bounds_large = (-10, 30, 10, 50)  # Large area, may exceed raster
        >>> data_clipped = read_from_bounds(reader, bounds_large,
        ...                                  crs_bounds='EPSG:4326',
        ...                                  boundless=False)
        >>> # Output only contains pixels within both bounds AND raster extent

    Note:
        - Window coordinates are rounded outward (ceil) to ensure complete coverage
        - The output transform is adjusted to match the actual pixel boundaries
        - Padding is added in pixel space after CRS transformation
        - For interpolation/resampling needing edge context, use pad_add=(3, 3) minimum
        - Boundless reading uses fill_value_default from data_in for out-of-bounds pixels
    """
    window_in = window_from_bounds(data_in, bounds, crs_bounds)
    if any(p > 0 for p in pad_add):
        window_in = pad_window(window_in, pad_add)  # Add padding for bicubic int or for co-registration
    window_in = round_outer_window(window_in)

    return read_from_window(
        data_in, window_in, return_only_data=return_only_data, trigger_load=trigger_load, boundless=boundless
    )

read_from_polygon(data_in, polygon, crs_polygon=None, pad_add=(0, 0), return_only_data=False, trigger_load=False, boundless=True, window_surrounding=False)

Extract raster data within a polygon boundary, supporting complex shapes and masking.

This function reads the minimum bounding rectangle containing a polygon, making it ideal for: - Extracting irregular-shaped regions (e.g., administrative boundaries, watersheds) - Processing data within specific land parcels or management zones - Creating masks for pixel-wise operations within complex geometries - Reducing memory footprint by reading only the extent containing features of interest - Multi-polygon support for disconnected regions

The function calculates the pixel window that encompasses all polygon vertices, reads that rectangular region, and preserves georeferencing for downstream processing. For actual polygon masking (setting pixels outside polygon to nodata), combine this with rasterio.features.geometry_mask.

Common workflow: 1. Read data within polygon bounds β†’ get rectangular chip 2. Create geometry mask β†’ binary array (True inside polygon) 3. Apply mask β†’ set pixels outside polygon to nodata or 0

Algorithm: 1. Transform polygon vertices from crs_polygon to data_in.crs (if needed) 2. Find minimum bounding rectangle in pixel coordinates 3. Optionally add 1-pixel buffer (window_surrounding=True) for complete coverage 4. Add user-specified padding (pad_add) for processing context 5. Round window and read data using read_from_window

Parameters:

Name Type Description Default
data_in GeoData

Input georeferenced data with spatial reference (crs, transform). Must implement the GeoData protocol with "x" and "y" dimensions.

required
polygon Union[Polygon, MultiPolygon]

Shapely geometry defining the area of interest. Can be a simple Polygon for single region or MultiPolygon for disconnected areas. Polygon vertices define the boundary; function reads the bounding rectangle.

required
crs_polygon Optional[str]

Coordinate reference system of the polygon. If None, assumes polygon is in the same CRS as data_in. Common formats: "EPSG:4326" (WGS84), "EPSG:32630" (UTM), CRS object, or WKT string. Defaults to None.

None
pad_add Tuple[int, int]

Additional padding in pixels as (pad_y, pad_x). Useful for: - Ensuring complete polygon coverage at edges - CNN inference needing receptive field context - Interpolation requiring neighboring pixels - Edge-aware processing algorithms Format: (rows_padding, cols_padding). Defaults to (0, 0).

(0, 0)
return_only_data bool

If True, returns numpy array without georeferencing. If False, returns GeoData object with spatial metadata. Defaults to False.

False
trigger_load bool

If True, forces loading data into memory (for lazy readers). Defaults to False.

False
boundless bool

If True, output matches window shape, padding with fill_value_default for out-of-bounds areas. If False, clips to actual data extent. Defaults to True.

True
window_surrounding bool

If True, adds 1-pixel buffer around polygon to ensure complete surrounding coverage (window edges won't align with vertices). Useful when polygon vertices align exactly with pixel boundaries and you need complete coverage. Defaults to False.

False

Returns:

Type Description
Union[GeoData, ndarray]

Union[GeoData, np.ndarray]: - If return_only_data=False: GeoData object with transform adjusted to the minimum bounding rectangle and shape matching polygon extent (plus padding) - If return_only_data=True: numpy array with shape (bands, height, width) or (height, width) depending on input dimensions

Examples:

>>> from shapely.geometry import Polygon, box
>>> import rasterio
>>> import rasterio.features
>>> from georeader import RasterioReader, read
>>>
>>> # Example 1: Read rectangular region using polygon
>>> polygon = box(-3.71, 40.41, -3.69, 40.42)  # Madrid area (WGS84)
>>> with rasterio.open('sentinel2.tif') as src:
...     reader = RasterioReader(src)
...     data = read.read_from_polygon(reader, polygon, crs_polygon='EPSG:4326')
...     print(f"Shape: {data.shape}")  # (13, H, W) - minimum rect containing polygon
...     print(f"Bounds: {data.bounds}")
>>> # Example 2: Read irregular polygon with masking workflow
>>> # Step 1: Define irregular polygon (e.g., agricultural field)
>>> field_boundary = Polygon([
...     (-3.7050, 40.4150), (-3.7030, 40.4150),
...     (-3.7030, 40.4170), (-3.7040, 40.4180),
...     (-3.7050, 40.4170), (-3.7050, 40.4150)
... ])
>>> # Step 2: Read bounding rectangle
>>> data = read.read_from_polygon(reader, field_boundary, crs_polygon='EPSG:4326')
>>> # Step 3: Create mask (True inside polygon, False outside)
>>> from rasterio.features import geometry_mask
>>> mask = geometry_mask(
...     [field_boundary],
...     transform=data.transform,
...     invert=True,  # True inside polygon
...     out_shape=data.shape[-2:]
... )
>>> # Step 4: Apply mask
>>> data.values[:, ~mask] = data.fill_value_default  # Mask outside polygon
>>> # Now data only contains pixels within field_boundary
>>> # Example 3: Multi-polygon (disconnected regions)
>>> from shapely.geometry import MultiPolygon
>>> # Read multiple farms in one operation
>>> farm1 = box(-3.71, 40.41, -3.70, 40.42)
>>> farm2 = box(-3.68, 40.41, -3.67, 40.42)
>>> farms = MultiPolygon([farm1, farm2])
>>> data = read.read_from_polygon(reader, farms, crs_polygon='EPSG:4326')
>>> # Returns bounding rectangle containing all polygons
>>> # Example 4: Read with padding for CNN inference
>>> # Polygon defines ROI, padding provides context
>>> roi = Polygon([(-3.70, 40.41), (-3.69, 40.41), (-3.69, 40.42), (-3.70, 40.42)])
>>> data_padded = read.read_from_polygon(
...     reader, roi,
...     crs_polygon='EPSG:4326',
...     pad_add=(32, 32),  # 32-pixel padding for CNN receptive field
...     window_surrounding=True  # Ensure complete coverage
... )
>>> # Example 5: Memory-efficient masking for large areas
>>> # Read only the extent containing the polygon, not entire raster
>>> watershed = Polygon([...])  # Complex watershed boundary
>>> # This reads only the bounding box, not the full raster
>>> data = read.read_from_polygon(reader, watershed, crs_polygon='EPSG:4326')
>>> print(f"Read shape: {data.shape}")  # Much smaller than full raster
>>> # Apply masking as in Example 2
>>> mask = geometry_mask([watershed], transform=data.transform,
...                      invert=True, out_shape=data.shape[-2:])
>>> # Example 6: Time series analysis within boundary
>>> # Multi-temporal stack reading same spatial extent
>>> boundary = box(-3.71, 40.41, -3.69, 40.42)
>>> time_series_data = []
>>> for date, raster_path in date_raster_pairs:
...     with rasterio.open(raster_path) as src:
...         reader = RasterioReader(src)
...         data = read.read_from_polygon(reader, boundary,
...                                       crs_polygon='EPSG:4326')
...         time_series_data.append(data)
>>> # All chips have consistent extent for temporal analysis
Note
  • Function reads the minimum bounding RECTANGLE, not the exact polygon shape
  • For actual polygon masking, use rasterio.features.geometry_mask after reading
  • Window coordinates are rounded outward to ensure complete polygon coverage
  • MultiPolygon returns single rectangular chip containing all disconnected parts
  • window_surrounding=True adds 1-pixel buffer for edge cases
  • Padding is applied in pixel space after CRS transformation
Source code in georeader/read.py
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
def read_from_polygon(
    data_in: GeoData,
    polygon: Union[Polygon, MultiPolygon],
    crs_polygon: Optional[str] = None,
    pad_add: Tuple[int, int] = (0, 0),
    return_only_data: bool = False,
    trigger_load: bool = False,
    boundless: bool = True,
    window_surrounding: bool = False,
) -> Union[GeoData, np.ndarray]:
    """
    Extract raster data within a polygon boundary, supporting complex shapes and masking.

    This function reads the minimum bounding rectangle containing a polygon, making it ideal for:
    - Extracting irregular-shaped regions (e.g., administrative boundaries, watersheds)
    - Processing data within specific land parcels or management zones
    - Creating masks for pixel-wise operations within complex geometries
    - Reducing memory footprint by reading only the extent containing features of interest
    - Multi-polygon support for disconnected regions

    The function calculates the pixel window that encompasses all polygon vertices, reads that
    rectangular region, and preserves georeferencing for downstream processing. For actual
    polygon masking (setting pixels outside polygon to nodata), combine this with
    `rasterio.features.geometry_mask`.

    Common workflow:
    1. Read data within polygon bounds β†’ get rectangular chip
    2. Create geometry mask β†’ binary array (True inside polygon)
    3. Apply mask β†’ set pixels outside polygon to nodata or 0

    Algorithm:
    1. Transform polygon vertices from crs_polygon to data_in.crs (if needed)
    2. Find minimum bounding rectangle in pixel coordinates
    3. Optionally add 1-pixel buffer (window_surrounding=True) for complete coverage
    4. Add user-specified padding (pad_add) for processing context
    5. Round window and read data using read_from_window

    Args:
        data_in (GeoData): Input georeferenced data with spatial reference (crs, transform).
            Must implement the GeoData protocol with "x" and "y" dimensions.
        polygon (Union[Polygon, MultiPolygon]): Shapely geometry defining the area of interest.
            Can be a simple Polygon for single region or MultiPolygon for disconnected areas.
            Polygon vertices define the boundary; function reads the bounding rectangle.
        crs_polygon (Optional[str], optional): Coordinate reference system of the polygon.
            If None, assumes polygon is in the same CRS as data_in. Common formats:
            "EPSG:4326" (WGS84), "EPSG:32630" (UTM), CRS object, or WKT string.
            Defaults to None.
        pad_add (Tuple[int, int], optional): Additional padding in pixels as (pad_y, pad_x).
            Useful for:
            - Ensuring complete polygon coverage at edges
            - CNN inference needing receptive field context
            - Interpolation requiring neighboring pixels
            - Edge-aware processing algorithms
            Format: (rows_padding, cols_padding). Defaults to (0, 0).
        return_only_data (bool, optional): If True, returns numpy array without georeferencing.
            If False, returns GeoData object with spatial metadata. Defaults to False.
        trigger_load (bool, optional): If True, forces loading data into memory (for lazy readers).
            Defaults to False.
        boundless (bool, optional): If True, output matches window shape, padding with
            fill_value_default for out-of-bounds areas. If False, clips to actual data extent.
            Defaults to True.
        window_surrounding (bool, optional): If True, adds 1-pixel buffer around polygon
            to ensure complete surrounding coverage (window edges won't align with vertices).
            Useful when polygon vertices align exactly with pixel boundaries and you need
            complete coverage. Defaults to False.

    Returns:
        Union[GeoData, np.ndarray]:
            - If return_only_data=False: GeoData object with transform adjusted to the
              minimum bounding rectangle and shape matching polygon extent (plus padding)
            - If return_only_data=True: numpy array with shape (bands, height, width) or
              (height, width) depending on input dimensions

    Examples:
        >>> from shapely.geometry import Polygon, box
        >>> import rasterio
        >>> import rasterio.features
        >>> from georeader import RasterioReader, read
        >>>
        >>> # Example 1: Read rectangular region using polygon
        >>> polygon = box(-3.71, 40.41, -3.69, 40.42)  # Madrid area (WGS84)
        >>> with rasterio.open('sentinel2.tif') as src:
        ...     reader = RasterioReader(src)
        ...     data = read.read_from_polygon(reader, polygon, crs_polygon='EPSG:4326')
        ...     print(f"Shape: {data.shape}")  # (13, H, W) - minimum rect containing polygon
        ...     print(f"Bounds: {data.bounds}")

        >>> # Example 2: Read irregular polygon with masking workflow
        >>> # Step 1: Define irregular polygon (e.g., agricultural field)
        >>> field_boundary = Polygon([
        ...     (-3.7050, 40.4150), (-3.7030, 40.4150),
        ...     (-3.7030, 40.4170), (-3.7040, 40.4180),
        ...     (-3.7050, 40.4170), (-3.7050, 40.4150)
        ... ])
        >>> # Step 2: Read bounding rectangle
        >>> data = read.read_from_polygon(reader, field_boundary, crs_polygon='EPSG:4326')
        >>> # Step 3: Create mask (True inside polygon, False outside)
        >>> from rasterio.features import geometry_mask
        >>> mask = geometry_mask(
        ...     [field_boundary],
        ...     transform=data.transform,
        ...     invert=True,  # True inside polygon
        ...     out_shape=data.shape[-2:]
        ... )
        >>> # Step 4: Apply mask
        >>> data.values[:, ~mask] = data.fill_value_default  # Mask outside polygon
        >>> # Now data only contains pixels within field_boundary

        >>> # Example 3: Multi-polygon (disconnected regions)
        >>> from shapely.geometry import MultiPolygon
        >>> # Read multiple farms in one operation
        >>> farm1 = box(-3.71, 40.41, -3.70, 40.42)
        >>> farm2 = box(-3.68, 40.41, -3.67, 40.42)
        >>> farms = MultiPolygon([farm1, farm2])
        >>> data = read.read_from_polygon(reader, farms, crs_polygon='EPSG:4326')
        >>> # Returns bounding rectangle containing all polygons

        >>> # Example 4: Read with padding for CNN inference
        >>> # Polygon defines ROI, padding provides context
        >>> roi = Polygon([(-3.70, 40.41), (-3.69, 40.41), (-3.69, 40.42), (-3.70, 40.42)])
        >>> data_padded = read.read_from_polygon(
        ...     reader, roi,
        ...     crs_polygon='EPSG:4326',
        ...     pad_add=(32, 32),  # 32-pixel padding for CNN receptive field
        ...     window_surrounding=True  # Ensure complete coverage
        ... )

        >>> # Example 5: Memory-efficient masking for large areas
        >>> # Read only the extent containing the polygon, not entire raster
        >>> watershed = Polygon([...])  # Complex watershed boundary
        >>> # This reads only the bounding box, not the full raster
        >>> data = read.read_from_polygon(reader, watershed, crs_polygon='EPSG:4326')
        >>> print(f"Read shape: {data.shape}")  # Much smaller than full raster
        >>> # Apply masking as in Example 2
        >>> mask = geometry_mask([watershed], transform=data.transform,
        ...                      invert=True, out_shape=data.shape[-2:])

        >>> # Example 6: Time series analysis within boundary
        >>> # Multi-temporal stack reading same spatial extent
        >>> boundary = box(-3.71, 40.41, -3.69, 40.42)
        >>> time_series_data = []
        >>> for date, raster_path in date_raster_pairs:
        ...     with rasterio.open(raster_path) as src:
        ...         reader = RasterioReader(src)
        ...         data = read.read_from_polygon(reader, boundary,
        ...                                       crs_polygon='EPSG:4326')
        ...         time_series_data.append(data)
        >>> # All chips have consistent extent for temporal analysis

    Note:
        - Function reads the minimum bounding RECTANGLE, not the exact polygon shape
        - For actual polygon masking, use rasterio.features.geometry_mask after reading
        - Window coordinates are rounded outward to ensure complete polygon coverage
        - MultiPolygon returns single rectangular chip containing all disconnected parts
        - window_surrounding=True adds 1-pixel buffer for edge cases
        - Padding is applied in pixel space after CRS transformation
    """
    window_in = window_from_polygon(data_in, polygon, crs_polygon, window_surrounding=window_surrounding)
    if any(p > 0 for p in pad_add):
        window_in = pad_window(window_in, pad_add)  # Add padding for bicubic int or for co-registration
    window_in = round_outer_window(window_in)

    return read_from_window(
        data_in, window_in, return_only_data=return_only_data, trigger_load=trigger_load, boundless=boundless
    )

read_from_window(data_in, window, return_only_data=False, trigger_load=False, boundless=True)

Read raster data from a specified window, with optional padding for out-of-bounds areas.

This function extracts data from a raster using pixel window coordinates. When the window extends beyond raster boundaries, it can pad with fill values (boundless=True) or return None/clipped data (boundless=False).

Parameters:

Name Type Description Default
data_in GeoData

Input raster data with spatial reference (crs, transform). Must implement the GeoData protocol with "x" and "y" dimensions.

required
window Window

Window defining the area to read in pixel coordinates. Format: Window(col_off, row_off, width, height).

required
return_only_data bool

If True, returns numpy array without georeferencing. If False, returns GeoData object with spatial metadata. Defaults to False.

False
trigger_load bool

If True, forces loading data into memory (for lazy readers). Defaults to False.

False
boundless bool

If True, output always matches window shape, padding with fill_value_default for out-of-bounds areas. If False, only reads intersecting area or returns None. Defaults to True.

True

Returns:

Type Description
Union[GeoData, ndarray, None]

Union[GeoData, np.ndarray, None]: - If return_only_data=True: numpy array with shape matching the data dimensions - If return_only_data=False: GeoData object with updated transform for the window - If boundless=False and no intersection: None

Examples:

>>> import rasterio
>>> from georeader import GeoTensor
>>> # Read a 256x256 window starting at pixel (100, 200)
>>> window = rasterio.windows.Window(col_off=100, row_off=200, width=256, height=256)
>>> with rasterio.open('image.tif') as src:
...     data = GeoTensor.load_from_window(src, window)
...     result = read_from_window(data, window)
...     print(result.shape)  # Shape: (bands, 256, 256)
>>> # Read without padding (only intersecting area)
>>> window_large = rasterio.windows.Window(0, 0, 10000, 10000)  # Beyond bounds
>>> result = read_from_window(data, window_large, boundless=False)
>>> # result will be clipped to actual data extent
Note

The output transform is adjusted to correspond to the window's geographic location. For windows partially outside bounds, boundless=True pads with fill_value_default.

Source code in georeader/read.py
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
def read_from_window(
    data_in: GeoData,
    window: rasterio.windows.Window,
    return_only_data: bool = False,
    trigger_load: bool = False,
    boundless: bool = True,
) -> Union[GeoData, np.ndarray, None]:
    """
    Read raster data from a specified window, with optional padding for out-of-bounds areas.

    This function extracts data from a raster using pixel window coordinates. When the window
    extends beyond raster boundaries, it can pad with fill values (boundless=True) or return
    None/clipped data (boundless=False).

    Args:
        data_in (GeoData): Input raster data with spatial reference (crs, transform).
            Must implement the GeoData protocol with "x" and "y" dimensions.
        window (rasterio.windows.Window): Window defining the area to read in pixel coordinates.
            Format: Window(col_off, row_off, width, height).
        return_only_data (bool, optional): If True, returns numpy array without georeferencing.
            If False, returns GeoData object with spatial metadata. Defaults to False.
        trigger_load (bool, optional): If True, forces loading data into memory (for lazy readers).
            Defaults to False.
        boundless (bool, optional): If True, output always matches window shape, padding with
            fill_value_default for out-of-bounds areas. If False, only reads intersecting area
            or returns None. Defaults to True.

    Returns:
        Union[GeoData, np.ndarray, None]:
            - If return_only_data=True: numpy array with shape matching the data dimensions
            - If return_only_data=False: GeoData object with updated transform for the window
            - If boundless=False and no intersection: None

    Examples:
        >>> import rasterio
        >>> from georeader import GeoTensor
        >>> # Read a 256x256 window starting at pixel (100, 200)
        >>> window = rasterio.windows.Window(col_off=100, row_off=200, width=256, height=256)
        >>> with rasterio.open('image.tif') as src:
        ...     data = GeoTensor.load_from_window(src, window)
        ...     result = read_from_window(data, window)
        ...     print(result.shape)  # Shape: (bands, 256, 256)

        >>> # Read without padding (only intersecting area)
        >>> window_large = rasterio.windows.Window(0, 0, 10000, 10000)  # Beyond bounds
        >>> result = read_from_window(data, window_large, boundless=False)
        >>> # result will be clipped to actual data extent

    Note:
        The output transform is adjusted to correspond to the window's geographic location.
        For windows partially outside bounds, boundless=True pads with fill_value_default.
    """
    # Create dict mapping dimension names to sizes: {'band': C, 'y': H, 'x': W}
    named_shape = OrderedDict(zip(data_in.dims, data_in.shape))

    # Define window covering entire data extent in pixel coordinates
    window_data = rasterio.windows.Window(col_off=0, row_off=0, width=named_shape["x"], height=named_shape["y"])

    # Get current transform for calculating new transform for the window
    transform = data_in.transform

    # Handle case where window doesn't intersect data at all
    if not rasterio.windows.intersect([window_data, window]):
        if not boundless:
            return None  # No data to return

        # Create array filled with fill_value_default matching window shape
        expected_shapes = {"x": window.width, "y": window.height}
        # Build shape tuple: (bands, window.height, window.width) or (window.height, window.width)
        shape = tuple([named_shape[s] if s not in ["x", "y"] else expected_shapes[s] for s in data_in.dims])
        data = np.zeros(shape, dtype=data_in.dtype)
        fill_value_default = getattr(data_in, "fill_value_default", 0)
        if fill_value_default != 0:
            data += fill_value_default
        if return_only_data:
            return data

        # Calculate transform for this window position
        return GeoTensor(
            data,
            crs=data_in.crs,
            transform=rasterio.windows.transform(window, transform=transform),
            fill_value_default=fill_value_default,
        )

    # Read data from window using the reader's method (handles padding automatically)
    data_sel = data_in.read_from_window(window=window, boundless=boundless)

    if return_only_data:
        return data_sel.values

    # Load into memory if requested (useful for lazy readers)
    if trigger_load:
        data_sel = data_sel.load()

    return data_sel

read_from_tile(data, x, y, z, dst_crs=WEB_MERCATOR_CRS, out_shape=(SIZE_DEFAULT, SIZE_DEFAULT), resolution_dst_crs=None, assert_if_not_intersects=False)

Read raster data corresponding to a Web Mercator (XYZ) tile for web mapping applications.

This function extracts and optionally reprojects raster data to match XYZ tile coordinates used by web mapping services (OpenStreetMap, Google Maps, Mapbox, etc.). It's the primary interface for: - Creating tile servers from arbitrary raster data - Building custom web map overlays from satellite imagery - Generating tiles for Leaflet, OpenLayers, or Mapbox GL JS - Creating tile caches for faster web mapping performance - Converting between different tile schemas and CRS

XYZ tiles follow the Slippy Map / TMS convention where: - The world is divided into 2^z Γ— 2^z tiles at zoom level z - Tile (0, 0) is at the top-left (northwest corner) - x increases eastward (0 to 2^z - 1) - y increases southward (0 to 2^z - 1) - Each tile represents the same geographic area at different resolutions

The function handles the complete tile workflow: 1. Calculate tile bounds in Web Mercator (EPSG:3857) 2. Check if tile intersects the raster footprint 3. Extract data with optional reprojection to destination CRS 4. Resize/resample to standard tile dimensions (typically 256Γ—256)

Algorithm: 1. Convert (x, y, z) to geographic bounds using mercantile 2. Check intersection with data footprint (skip if no overlap) 3. If reader has read_from_tile method, delegate to it (optimized path) 4. Otherwise, read polygon extent and reproject/resize as needed 5. Return tile with correct georeferencing in dst_crs

Parameters:

Name Type Description Default
data GeoData

Input georeferenced raster data with spatial reference (crs, transform). Can be in any CRS; the function handles transformation to tile coordinates.

required
x int

Tile column index (0 to 2^z - 1). Increases eastward from the prime meridian. At z=0, x=0 covers the entire world. At z=1, x=0 is western hemisphere.

required
y int

Tile row index (0 to 2^z - 1). Increases southward from the north pole. At z=0, y=0 covers the entire world. At z=1, y=0 is northern hemisphere.

required
z int

Zoom level (typically 0-22). Determines tile resolution: - z=0: 1 tile for entire world (~40,075 km at equator) - z=1: 2Γ—2 = 4 tiles - z=10: 1024Γ—1024 = 1,048,576 tiles - z=15: ~2.4 meters/pixel at equator - z=20: ~7.5 cm/pixel at equator

required
dst_crs Optional[Any]

Output coordinate reference system. Defaults to WEB_MERCATOR_CRS (EPSG:3857) which is standard for web maps. Can be set to None to use data's native CRS (less common for web tiles).

WEB_MERCATOR_CRS
out_shape Optional[Tuple[int, int]]

Output tile dimensions as (height, width). Defaults to (SIZE_DEFAULT, SIZE_DEFAULT) which is typically (256, 256). Standard tile sizes: 256Γ—256 (most common), 512Γ—512 (retina), 128Γ—128 (rare). If None, output size matches the native resolution in the tile extent.

(SIZE_DEFAULT, SIZE_DEFAULT)
resolution_dst_crs Optional[Union[float, Tuple[float, float]]]

Target resolution in dst_crs units (meters for EPSG:3857, degrees for WGS84). Defaults to None. If both out_shape and resolution_dst_crs are None, uses native data resolution. If out_shape is provided, this parameter is ignored.

None
assert_if_not_intersects bool

If True, raises AssertionError when tile doesn't intersect data footprint. If False, returns None for non-intersecting tiles (useful for tile servers that expect None for empty tiles). Defaults to False.

False

Returns:

Type Description
Optional[GeoTensor]

Optional[GeoTensor]: - If tile intersects data: GeoTensor with shape (bands, height, width) or (height, width), georeferenced to dst_crs at the tile's location - If tile doesn't intersect: None (or raises AssertionError if assert_if_not_intersects=True)

Examples:

>>> from georeader import RasterioReader, read
>>> import rasterio
>>>
>>> # Example 1: Generate standard 256Γ—256 web tile
>>> with rasterio.open('sentinel2_spain.tif') as src:
...     reader = RasterioReader(src)
...     # Tile covering Madrid at zoom 12
...     tile = read.read_from_tile(reader, x=2046, y=1537, z=12)
...     print(f"Tile shape: {tile.shape}")  # (13, 256, 256) - 13 Sentinel-2 bands
...     print(f"Tile CRS: {tile.crs}")  # EPSG:3857 (Web Mercator)
...     print(f"Tile bounds: {tile.bounds}")  # Bounds in Web Mercator meters
>>> # Example 2: High-resolution retina tile (512Γ—512)
>>> tile_retina = read.read_from_tile(reader, x=2046, y=1537, z=12,
...                                    out_shape=(512, 512))
>>> # Twice the resolution for high-DPI displays
>>> # Example 3: Tile server implementation
>>> def get_tile(z, x, y, raster_path):
...     '''Simple tile server endpoint'''
...     with rasterio.open(raster_path) as src:
...         reader = RasterioReader(src)
...         tile = read.read_from_tile(reader, x=x, y=y, z=z)
...         if tile is None:
...             return None  # Empty tile (outside data extent)
...         return tile.values  # Return as numpy array for rendering
>>>
>>> # Usage: tile = get_tile(12, 2046, 1537, 'sentinel2.tif')
>>> # Example 4: Generate tile at native CRS (less common)
>>> # Useful when serving tiles in non-Web Mercator projections
>>> tile_utm = read.read_from_tile(reader, x=2046, y=1537, z=12,
...                                 dst_crs=None)  # Uses data's native CRS
>>> print(f"Native CRS: {tile_utm.crs}")
>>> # Example 5: Tile generation across zoom levels
>>> # Generate tiles for a pyramid (zoom levels 8-14)
>>> import mercantile
>>> bounds_wgs84 = (-3.75, 40.35, -3.65, 40.50)  # Madrid area
>>> for z in range(8, 15):
...     # Get tiles covering the area at this zoom
...     tiles = list(mercantile.tiles(*bounds_wgs84, z))
...     print(f"Zoom {z}: {len(tiles)} tiles")
...     for tile_coords in tiles:
...         tile_data = read.read_from_tile(reader,
...                                         x=tile_coords.x,
...                                         y=tile_coords.y,
...                                         z=tile_coords.z)
...         if tile_data is not None:
...             # Save tile to disk: tiles/{z}/{x}/{y}.png
...             # tile_data.save(f'tiles/{z}/{tile_coords.x}/{tile_coords.y}.tif')
...             pass
>>> # Example 6: Check tile coverage before processing
>>> tile_check = read.read_from_tile(reader, x=0, y=0, z=5,
...                                   assert_if_not_intersects=True)
>>> # Raises AssertionError if tile doesn't intersect data
>>> # Useful for validating tile requests
References
  • OSM Slippy Map: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
  • XYZ Tiles: https://en.wikipedia.org/wiki/Tiled_web_map
  • Mercantile library: https://github.com/mapbox/mercantile
  • Web Mercator: https://epsg.io/3857
Note
  • Tiles are in EPSG:3857 by default (required for most web mapping libraries)
  • The function uses mercantile to convert tile coordinates to geographic bounds
  • For non-intersecting tiles, returns None (standard behavior for tile servers)
  • Output size defaults to 256Γ—256 (standard for web maps since Google Maps)
  • Optimized readers may implement read_from_tile for better performance
  • Tile coordinates follow TMS/XYZ convention (y increases southward)
Source code in georeader/read.py
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
def read_from_tile(
    data: GeoData,
    x: int,
    y: int,
    z: int,
    dst_crs: Optional[Any] = WEB_MERCATOR_CRS,
    out_shape: Optional[Tuple[int, int]] = (SIZE_DEFAULT, SIZE_DEFAULT),
    resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
    assert_if_not_intersects: bool = False,
) -> Optional[GeoTensor]:
    """
    Read raster data corresponding to a Web Mercator (XYZ) tile for web mapping applications.

    This function extracts and optionally reprojects raster data to match XYZ tile coordinates
    used by web mapping services (OpenStreetMap, Google Maps, Mapbox, etc.). It's the primary
    interface for:
    - Creating tile servers from arbitrary raster data
    - Building custom web map overlays from satellite imagery
    - Generating tiles for Leaflet, OpenLayers, or Mapbox GL JS
    - Creating tile caches for faster web mapping performance
    - Converting between different tile schemas and CRS

    XYZ tiles follow the Slippy Map / TMS convention where:
    - The world is divided into 2^z Γ— 2^z tiles at zoom level z
    - Tile (0, 0) is at the top-left (northwest corner)
    - x increases eastward (0 to 2^z - 1)
    - y increases southward (0 to 2^z - 1)
    - Each tile represents the same geographic area at different resolutions

    The function handles the complete tile workflow:
    1. Calculate tile bounds in Web Mercator (EPSG:3857)
    2. Check if tile intersects the raster footprint
    3. Extract data with optional reprojection to destination CRS
    4. Resize/resample to standard tile dimensions (typically 256Γ—256)

    Algorithm:
    1. Convert (x, y, z) to geographic bounds using mercantile
    2. Check intersection with data footprint (skip if no overlap)
    3. If reader has read_from_tile method, delegate to it (optimized path)
    4. Otherwise, read polygon extent and reproject/resize as needed
    5. Return tile with correct georeferencing in dst_crs

    Args:
        data (GeoData): Input georeferenced raster data with spatial reference (crs, transform).
            Can be in any CRS; the function handles transformation to tile coordinates.
        x (int): Tile column index (0 to 2^z - 1). Increases eastward from the prime meridian.
            At z=0, x=0 covers the entire world. At z=1, x=0 is western hemisphere.
        y (int): Tile row index (0 to 2^z - 1). Increases southward from the north pole.
            At z=0, y=0 covers the entire world. At z=1, y=0 is northern hemisphere.
        z (int): Zoom level (typically 0-22). Determines tile resolution:
            - z=0: 1 tile for entire world (~40,075 km at equator)
            - z=1: 2Γ—2 = 4 tiles
            - z=10: 1024Γ—1024 = 1,048,576 tiles
            - z=15: ~2.4 meters/pixel at equator
            - z=20: ~7.5 cm/pixel at equator
        dst_crs (Optional[Any], optional): Output coordinate reference system.
            Defaults to WEB_MERCATOR_CRS (EPSG:3857) which is standard for web maps.
            Can be set to None to use data's native CRS (less common for web tiles).
        out_shape (Optional[Tuple[int, int]], optional): Output tile dimensions as (height, width).
            Defaults to (SIZE_DEFAULT, SIZE_DEFAULT) which is typically (256, 256).
            Standard tile sizes: 256Γ—256 (most common), 512Γ—512 (retina), 128Γ—128 (rare).
            If None, output size matches the native resolution in the tile extent.
        resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
            Target resolution in dst_crs units (meters for EPSG:3857, degrees for WGS84).
            Defaults to None. If both out_shape and resolution_dst_crs are None, uses
            native data resolution. If out_shape is provided, this parameter is ignored.
        assert_if_not_intersects (bool, optional): If True, raises AssertionError when
            tile doesn't intersect data footprint. If False, returns None for non-intersecting
            tiles (useful for tile servers that expect None for empty tiles). Defaults to False.

    Returns:
        Optional[GeoTensor]:
            - If tile intersects data: GeoTensor with shape (bands, height, width) or
              (height, width), georeferenced to dst_crs at the tile's location
            - If tile doesn't intersect: None (or raises AssertionError if assert_if_not_intersects=True)

    Examples:
        >>> from georeader import RasterioReader, read
        >>> import rasterio
        >>>
        >>> # Example 1: Generate standard 256Γ—256 web tile
        >>> with rasterio.open('sentinel2_spain.tif') as src:
        ...     reader = RasterioReader(src)
        ...     # Tile covering Madrid at zoom 12
        ...     tile = read.read_from_tile(reader, x=2046, y=1537, z=12)
        ...     print(f"Tile shape: {tile.shape}")  # (13, 256, 256) - 13 Sentinel-2 bands
        ...     print(f"Tile CRS: {tile.crs}")  # EPSG:3857 (Web Mercator)
        ...     print(f"Tile bounds: {tile.bounds}")  # Bounds in Web Mercator meters

        >>> # Example 2: High-resolution retina tile (512Γ—512)
        >>> tile_retina = read.read_from_tile(reader, x=2046, y=1537, z=12,
        ...                                    out_shape=(512, 512))
        >>> # Twice the resolution for high-DPI displays

        >>> # Example 3: Tile server implementation
        >>> def get_tile(z, x, y, raster_path):
        ...     '''Simple tile server endpoint'''
        ...     with rasterio.open(raster_path) as src:
        ...         reader = RasterioReader(src)
        ...         tile = read.read_from_tile(reader, x=x, y=y, z=z)
        ...         if tile is None:
        ...             return None  # Empty tile (outside data extent)
        ...         return tile.values  # Return as numpy array for rendering
        >>>
        >>> # Usage: tile = get_tile(12, 2046, 1537, 'sentinel2.tif')

        >>> # Example 4: Generate tile at native CRS (less common)
        >>> # Useful when serving tiles in non-Web Mercator projections
        >>> tile_utm = read.read_from_tile(reader, x=2046, y=1537, z=12,
        ...                                 dst_crs=None)  # Uses data's native CRS
        >>> print(f"Native CRS: {tile_utm.crs}")

        >>> # Example 5: Tile generation across zoom levels
        >>> # Generate tiles for a pyramid (zoom levels 8-14)
        >>> import mercantile
        >>> bounds_wgs84 = (-3.75, 40.35, -3.65, 40.50)  # Madrid area
        >>> for z in range(8, 15):
        ...     # Get tiles covering the area at this zoom
        ...     tiles = list(mercantile.tiles(*bounds_wgs84, z))
        ...     print(f"Zoom {z}: {len(tiles)} tiles")
        ...     for tile_coords in tiles:
        ...         tile_data = read.read_from_tile(reader,
        ...                                         x=tile_coords.x,
        ...                                         y=tile_coords.y,
        ...                                         z=tile_coords.z)
        ...         if tile_data is not None:
        ...             # Save tile to disk: tiles/{z}/{x}/{y}.png
        ...             # tile_data.save(f'tiles/{z}/{tile_coords.x}/{tile_coords.y}.tif')
        ...             pass

        >>> # Example 6: Check tile coverage before processing
        >>> tile_check = read.read_from_tile(reader, x=0, y=0, z=5,
        ...                                   assert_if_not_intersects=True)
        >>> # Raises AssertionError if tile doesn't intersect data
        >>> # Useful for validating tile requests

    References:
        - OSM Slippy Map: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
        - XYZ Tiles: https://en.wikipedia.org/wiki/Tiled_web_map
        - Mercantile library: https://github.com/mapbox/mercantile
        - Web Mercator: https://epsg.io/3857

    Note:
        - Tiles are in EPSG:3857 by default (required for most web mapping libraries)
        - The function uses mercantile to convert tile coordinates to geographic bounds
        - For non-intersecting tiles, returns None (standard behavior for tile servers)
        - Output size defaults to 256Γ—256 (standard for web maps since Google Maps)
        - Optimized readers may implement read_from_tile for better performance
        - Tile coordinates follow TMS/XYZ convention (y increases southward)
    """
    bounds_wgs = mercantile.xy_bounds(int(x), int(y), int(z))
    polygon_crs_webmercator = box(bounds_wgs.left, bounds_wgs.bottom, bounds_wgs.right, bounds_wgs.top)

    intersects = polygon_crs_webmercator.intersects(data.footprint(crs=WEB_MERCATOR_CRS))

    if not intersects:
        assert not assert_if_not_intersects, "Tile does not intersect data"
    else:
        return

    if out_shape is not None and hasattr(data, "read_from_tile"):
        return data.read_from_tile(x, y, z, dst_crs=dst_crs, out_shape=out_shape)

    if dst_crs is None:
        dst_crs = data.crs

    if window_utils.compare_crs(data.crs, dst_crs) and (out_shape is None) and (resolution_dst_crs is None):
        # read from polygon handles the case where the data does not intersect the polygon
        return read_from_polygon(data, polygon_crs_webmercator, WEB_MERCATOR_CRS, window_surrounding=True).load()

    if out_shape is not None:
        polygon_crs_dst = window_utils.polygon_to_crs(polygon_crs_webmercator, WEB_MERCATOR_CRS, dst_crs)
        bounds_dst = polygon_crs_dst.bounds
        dst_transform = rasterio.transform.from_bounds(*bounds_dst, width=out_shape[1], height=out_shape[0])
        window_data = rasterio.windows.Window(0, 0, width=out_shape[1], height=out_shape[0])
    else:
        if resolution_dst_crs is not None:
            if isinstance(resolution_dst_crs, numbers.Number):
                resolution_dst_crs = (abs(resolution_dst_crs), abs(resolution_dst_crs))

        polygon_crs_data = window_utils.polygon_to_crs(polygon_crs_webmercator, WEB_MERCATOR_CRS, data.crs)
        bounds_crs_data = polygon_crs_data.bounds

        in_height, in_width = data.shape[-2:]
        dst_transform, width, height = rasterio.warp.calculate_default_transform(
            data.crs, dst_crs, in_width, in_height, *bounds_crs_data, resolution=resolution_dst_crs
        )
        window_data = rasterio.windows.Window(0, 0, width=width, height=height)
        dst_transform, window_data = calculate_transform_window(data, dst_crs, resolution_dst_crs)

    return read_reproject(data, dst_crs=dst_crs, dst_transform=dst_transform, window_out=window_data)

read_to_crs(data_in, dst_crs, resampling=rasterio.warp.Resampling.cubic_spline, resolution_dst_crs=None, return_only_data=False)

Change the crs of data_in to dst_crs. This function is a wrapper of the read_reproject function to reproject data_in to dst_crs.

Parameters:

Name Type Description Default
data_in GeoData

GeoData to reproyect

required
dst_crs Any

dst crs. Examples: "EPSG:4326", "EPSG:3857"

required
resampling Resampling

Defaults to rasterio.warp.Resampling.cubic_spline

cubic_spline
resolution_dst_crs Optional[Union[float, Tuple[float, float]]]

spatial resolution of the output GeoTensor in dst_crs CRS. Defaults to None. If not provided it will compute the resolution to match the resolution of the input.

None
return_only_data bool

Defaults to False. If True it returns a np.ndarray otherwise a GeoTensor object (georreferenced array).

False

Returns:

Type Description
Union[GeoTensor, ndarray]

Union[GeoTensor, np.ndarray]: data in dst_crs

Source code in georeader/read.py
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
def read_to_crs(
    data_in: GeoData,
    dst_crs: Any,
    resampling: rasterio.warp.Resampling = rasterio.warp.Resampling.cubic_spline,
    resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
    return_only_data: bool = False,
) -> Union[GeoTensor, np.ndarray]:
    """
    Change the crs of data_in to dst_crs. This function is a wrapper of the `read_reproject` function
    to reproject data_in to dst_crs.

    Args:
        data_in (GeoData): GeoData to reproyect
        dst_crs (Any): dst crs. Examples: "EPSG:4326", "EPSG:3857"
        resampling (rasterio.warp.Resampling, optional):
            Defaults to `rasterio.warp.Resampling.cubic_spline`
        resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
            spatial resolution of the output `GeoTensor` in `dst_crs` CRS. Defaults to None.
            If not provided it will compute the resolution to match the resolution of the input.
        return_only_data (bool, optional): Defaults to `False`.
            If `True` it returns a np.ndarray otherwise a `GeoTensor` object (georreferenced array).

    Returns:
        Union[GeoTensor, np.ndarray]: data in dst_crs
    """
    if window_utils.compare_crs(data_in.crs, dst_crs):
        return data_in

    window_data, dst_transform = calculate_transform_window(data_in, dst_crs, resolution_dst_crs)

    return read_reproject(
        data_in,
        dst_crs=dst_crs,
        dst_transform=dst_transform,
        window_out=window_data,
        resampling=resampling,
        return_only_data=return_only_data,
    )

read_reproject_like(data_in, data_like, resolution_dst=None, resampling=rasterio.warp.Resampling.cubic_spline, dtype_dst=None, return_only_data=False, dst_nodata=None)

Reads from data_in and reprojects to have the same extent and resolution than data_like.

Parameters:

Name Type Description Default
data_in GeoData

GeoData to read and reproject. Expected coords "x" and "y".

required
data_like GeoData

GeoData to get the bounds and resolution to reproject data_in.

required
resolution_dst Optional[Union[float, Tuple[float, float]]]

if not None it will overwrite the resolution of data_like.

None
resampling Resampling

specifies how data is reprojected from rasterio.warp.Resampling.

cubic_spline
dtype_dst Any

if None it will be inferred

None
return_only_data bool

defaults to False. If True it returns a np.ndarray otherwise returns an GeoTensor object (georreferenced array).

False
dst_nodata Optional[int]

dst_nodata value

None

Returns:

Type Description
Union[GeoTensor, ndarray]

GeoTensor read from data_in with same transform, crs, shape and bounds than data_like.

Source code in georeader/read.py
 969
 970
 971
 972
 973
 974
 975
 976
 977
 978
 979
 980
 981
 982
 983
 984
 985
 986
 987
 988
 989
 990
 991
 992
 993
 994
 995
 996
 997
 998
 999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
def read_reproject_like(
    data_in: GeoData,
    data_like: GeoData,
    resolution_dst: Optional[Union[float, Tuple[float, float]]] = None,
    resampling: rasterio.warp.Resampling = rasterio.warp.Resampling.cubic_spline,
    dtype_dst: Any = None,
    return_only_data: bool = False,
    dst_nodata: Optional[int] = None,
) -> Union[GeoTensor, np.ndarray]:
    """
    Reads from `data_in` and reprojects to have the same extent and resolution than `data_like`.

    Args:
        data_in: GeoData to read and reproject. Expected coords "x" and "y".
        data_like: GeoData to get the bounds and resolution to reproject `data_in`.
        resolution_dst: if not None it will overwrite the resolution of `data_like`.
        resampling: specifies how data is reprojected from `rasterio.warp.Resampling`.
        dtype_dst: if None it will be inferred
        return_only_data: defaults to `False`. If `True` it returns a np.ndarray otherwise
            returns an GeoTensor object (georreferenced array).
        dst_nodata: dst_nodata value

    Returns:
        GeoTensor read from `data_in` with same transform, crs, shape and bounds than `data_like`.
    """

    shape_out = data_like.shape[-2:]
    if resolution_dst is not None:
        if isinstance(resolution_dst, float):
            resolution_dst = (resolution_dst, resolution_dst)

        resolution_data_like = data_like.res

        shape_out = (
            int(round(shape_out[0] / resolution_dst[0] * resolution_data_like[0])),
            int(round(shape_out[1] / resolution_dst[1] * resolution_data_like[1])),
        )

    return read_reproject(
        data_in,
        dst_crs=data_like.crs,
        dst_transform=data_like.transform,
        resolution_dst_crs=resolution_dst,
        window_out=rasterio.windows.Window(0, 0, width=shape_out[-1], height=shape_out[-2]),
        resampling=resampling,
        dtype_dst=dtype_dst,
        return_only_data=return_only_data,
        dst_nodata=dst_nodata,
    )

resize(data_in, resolution_dst, window_out=None, anti_aliasing=True, anti_aliasing_sigma=None, resampling=rasterio.warp.Resampling.cubic_spline, return_only_data=False)

Resample raster data to a different spatial resolution with optional anti-aliasing.

This function changes the pixel size (spatial resolution) of raster data while preserving geographic extent and CRS. It's essential for: - Downsampling high-resolution imagery to reduce file size or processing time - Upsampling low-resolution data for visualization or analysis - Matching resolution across multi-source datasets - Creating image pyramids for multi-scale processing - Preparing data at specific resolutions for machine learning models

The function intelligently handles both upsampling (resolution_dst > resolution_src) and downsampling (resolution_dst < resolution_src). For downsampling, it applies Gaussian anti-aliasing by default to prevent aliasing artifacts (moirΓ© patterns, jagged edges). This is critical for maintaining visual quality and preventing information loss when reducing resolution.

Anti-aliasing workflow (for downsampling): 1. Determine downsampling factor: scale = resolution_dst / resolution_src 2. Calculate Gaussian sigma: Οƒ = (scale - 1) / 2 for scale > 1 3. Apply Gaussian filter to smooth high-frequency components 4. Resample to target resolution using specified resampling algorithm

The function preserves georeferencing, adjusting the transform to reflect the new pixel size while maintaining the same geographic extent (upper-left corner stays fixed).

Algorithm: 1. Compute scale factors: scale = (res_dst_y/res_src_y, res_dst_x/res_src_x) 2. Calculate output shape: shape_out = shape_in / scale (rounded up) 3. If downsampling (scale > 1) and anti_aliasing=True: - Apply Gaussian filter with sigma = (scale - 1) / 2 4. Call read_reproject with same CRS but updated resolution 5. Adjust transform: new pixel size = resolution_dst

Parameters:

Name Type Description Default
data_in GeoData

Input georeferenced data to resample. Expected to have "x" and "y" spatial dimensions. Can be 2D (H, W), 3D (C, H, W), or 4D (T, C, H, W).

required
resolution_dst Union[float, Tuple[float, float]]

Target spatial resolution in data_in's CRS units. If float, assumes same resolution in x and y directions. If tuple, (res_y, res_x). Units: - Meters for projected CRS (e.g., UTM: 10 = 10m/pixel) - Degrees for geographic CRS (e.g., WGS84: 0.0001 = ~11m at equator)

required
window_out Optional[Window]

Explicit output window dimensions. If None, automatically computed from input shape and scale factor (ceiling operation to ensure complete coverage). Format: Window(col_off, row_off, width, height). Defaults to None.

None
anti_aliasing bool

Whether to apply Gaussian filter before downsampling to reduce aliasing artifacts. Highly recommended for downsampling (scale > 1) to: - Prevent moirΓ© patterns and jagged edges - Reduce high-frequency noise - Improve visual quality of downsampled images - Preserve spatial structure at coarser resolutions Has no effect when upsampling (scale ≀ 1). Defaults to True.

True
anti_aliasing_sigma Optional[Union[float, ndarray]]

Standard deviation for Gaussian filtering. If None, automatically computed as (scale - 1) / 2 where scale is the downsampling factor. Can be: - float: Same sigma for all bands - np.ndarray: Per-band sigma values with shape matching non-spatial dims Larger sigma = more smoothing (blurrier but less aliasing). Defaults to None.

None
resampling Resampling

Resampling algorithm for interpolation. Common options: - cubic_spline: Smooth, good for continuous data (DEFAULT) - bilinear: Faster, slight quality loss - nearest: Categorical data (land cover, labels) - lanczos: High quality, slower - average: Good for downsampling continuous data Defaults to rasterio.warp.Resampling.cubic_spline.

cubic_spline
return_only_data bool

If True, returns numpy array without georeferencing. If False, returns GeoTensor with updated transform. Defaults to False.

False

Returns:

Type Description
Union[GeoTensor, ndarray]

Union[GeoTensor, np.ndarray]: - If return_only_data=False: GeoTensor with shape determined by resolution ratio, transform adjusted to reflect new pixel size, same CRS and bounds as input - If return_only_data=True: numpy array with resampled data

Examples:

>>> from georeader import GeoTensor, read
>>> import rasterio
>>> import numpy as np
>>>
>>> # Example 1: Downsample Sentinel-2 from 10m to 30m (Landsat resolution)
>>> # Load Sentinel-2 data at 10m resolution
>>> s2_data = GeoTensor.load_file('sentinel2_10m.tif')
>>> print(f"Original: {s2_data.shape}, res: {s2_data.res}")  # (13, 1000, 1000), res: (10, 10)
>>>
>>> # Downsample to 30m (3x reduction)
>>> s2_30m = read.resize(s2_data, resolution_dst=30.0)
>>> print(f"Downsampled: {s2_30m.shape}, res: {s2_30m.res}")  # (13, 334, 334), res: (30, 30)
>>> # Shape reduction: 1000 / 3 β‰ˆ 334 pixels
>>> # Anti-aliasing automatically applied to prevent artifacts
>>> # Example 2: Upsample low-resolution data (2x magnification)
>>> # Coarse data at 60m resolution
>>> coarse_data = GeoTensor.load_file('data_60m.tif')
>>> print(f"Original: {coarse_data.shape}")  # (4, 100, 100)
>>>
>>> # Upsample to 30m resolution
>>> upsampled = read.resize(coarse_data, resolution_dst=30.0)
>>> print(f"Upsampled: {upsampled.shape}")  # (4, 200, 200)
>>> # Shape increase: 100 * 2 = 200 pixels
>>> # Uses cubic_spline interpolation for smooth result
>>> # Example 3: Downsample with custom anti-aliasing
>>> # Strong smoothing before downsampling (reduce noise)
>>> smoothed = read.resize(s2_data, resolution_dst=50.0,
...                       anti_aliasing=True,
...                       anti_aliasing_sigma=3.0)  # Custom sigma
>>> # More aggressive smoothing than default
>>> # Example 4: Disable anti-aliasing (faster but lower quality)
>>> # For quick previews or when speed is critical
>>> fast_downsample = read.resize(s2_data, resolution_dst=30.0,
...                              anti_aliasing=False)
>>> # Faster but may show aliasing artifacts
>>> # Example 5: Different resolutions in x and y
>>> # Non-square pixels (uncommon but supported)
>>> anisotropic = read.resize(s2_data,
...                          resolution_dst=(20.0, 30.0))  # (res_y, res_x)
>>> print(f"Resolution: {anisotropic.res}")  # (20, 30)
>>> # Different sampling rates in each dimension
>>> # Example 6: Resampling for categorical data (land cover)
>>> labels = GeoTensor.load_file('land_cover_10m.tif')
>>> # Use nearest neighbor to preserve class values
>>> labels_30m = read.resize(labels, resolution_dst=30.0,
...                         resampling=rasterio.warp.Resampling.nearest,
...                         anti_aliasing=False)  # No smoothing for discrete data
>>> # Class labels preserved (no interpolation)
>>> # Example 7: Create image pyramid (multi-resolution)
>>> # Generate multiple resolution levels for fast visualization
>>> pyramid = {}
>>> base_res = 10.0
>>> for level in range(5):  # 5 pyramid levels
...     resolution = base_res * (2 ** level)  # 10m, 20m, 40m, 80m, 160m
...     pyramid[level] = read.resize(s2_data, resolution_dst=resolution)
...     print(f"Level {level}: {pyramid[level].shape}, res: {resolution}m")
>>> # Example 8: Match resolution to reference dataset
>>> reference = GeoTensor.load_file('reference_30m.tif')
>>> # Resample data to match reference resolution
>>> matched = read.resize(s2_data, resolution_dst=reference.res[0])
>>> assert matched.res == reference.res
>>> # Now both datasets have same resolution for analysis
Note
  • Function preserves CRS (no projection change, only resolution change)
  • Geographic bounds remain constant (upper-left corner fixed)
  • Transform is updated: pixel size = resolution_dst
  • Output shape computed as: shape_out = ceil(shape_in * res_in / res_dst)
  • Anti-aliasing only applied when downsampling (scale > 1)
  • For upsampling, resampling algorithm determines interpolation quality
  • Uses scipy.ndimage.gaussian_filter for anti-aliasing (requires scipy)
  • Efficient: operates in-place when possible to minimize memory usage
Source code in georeader/read.py
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
def resize(
    data_in: GeoData,
    resolution_dst: Union[float, Tuple[float, float]],
    window_out: Optional[rasterio.windows.Window] = None,
    anti_aliasing: bool = True,
    anti_aliasing_sigma: Optional[Union[float, np.ndarray]] = None,
    resampling: rasterio.warp.Resampling = rasterio.warp.Resampling.cubic_spline,
    return_only_data: bool = False,
) -> Union[GeoTensor, np.ndarray]:
    """
    Resample raster data to a different spatial resolution with optional anti-aliasing.

    This function changes the pixel size (spatial resolution) of raster data while preserving
    geographic extent and CRS. It's essential for:
    - Downsampling high-resolution imagery to reduce file size or processing time
    - Upsampling low-resolution data for visualization or analysis
    - Matching resolution across multi-source datasets
    - Creating image pyramids for multi-scale processing
    - Preparing data at specific resolutions for machine learning models

    The function intelligently handles both upsampling (resolution_dst > resolution_src) and
    downsampling (resolution_dst < resolution_src). For downsampling, it applies Gaussian
    anti-aliasing by default to prevent aliasing artifacts (moirΓ© patterns, jagged edges).
    This is critical for maintaining visual quality and preventing information loss when
    reducing resolution.

    Anti-aliasing workflow (for downsampling):
    1. Determine downsampling factor: scale = resolution_dst / resolution_src
    2. Calculate Gaussian sigma: Οƒ = (scale - 1) / 2 for scale > 1
    3. Apply Gaussian filter to smooth high-frequency components
    4. Resample to target resolution using specified resampling algorithm

    The function preserves georeferencing, adjusting the transform to reflect the new
    pixel size while maintaining the same geographic extent (upper-left corner stays fixed).

    Algorithm:
    1. Compute scale factors: scale = (res_dst_y/res_src_y, res_dst_x/res_src_x)
    2. Calculate output shape: shape_out = shape_in / scale (rounded up)
    3. If downsampling (scale > 1) and anti_aliasing=True:
       - Apply Gaussian filter with sigma = (scale - 1) / 2
    4. Call read_reproject with same CRS but updated resolution
    5. Adjust transform: new pixel size = resolution_dst

    Args:
        data_in (GeoData): Input georeferenced data to resample. Expected to have "x" and "y"
            spatial dimensions. Can be 2D (H, W), 3D (C, H, W), or 4D (T, C, H, W).
        resolution_dst (Union[float, Tuple[float, float]]): Target spatial resolution in
            data_in's CRS units. If float, assumes same resolution in x and y directions.
            If tuple, (res_y, res_x). Units:
            - Meters for projected CRS (e.g., UTM: 10 = 10m/pixel)
            - Degrees for geographic CRS (e.g., WGS84: 0.0001 = ~11m at equator)
        window_out (Optional[rasterio.windows.Window], optional): Explicit output window
            dimensions. If None, automatically computed from input shape and scale factor
            (ceiling operation to ensure complete coverage). Format: Window(col_off, row_off,
            width, height). Defaults to None.
        anti_aliasing (bool, optional): Whether to apply Gaussian filter before downsampling
            to reduce aliasing artifacts. Highly recommended for downsampling (scale > 1) to:
            - Prevent moirΓ© patterns and jagged edges
            - Reduce high-frequency noise
            - Improve visual quality of downsampled images
            - Preserve spatial structure at coarser resolutions
            Has no effect when upsampling (scale ≀ 1). Defaults to True.
        anti_aliasing_sigma (Optional[Union[float, np.ndarray]], optional): Standard deviation
            for Gaussian filtering. If None, automatically computed as (scale - 1) / 2 where
            scale is the downsampling factor. Can be:
            - float: Same sigma for all bands
            - np.ndarray: Per-band sigma values with shape matching non-spatial dims
            Larger sigma = more smoothing (blurrier but less aliasing). Defaults to None.
        resampling (rasterio.warp.Resampling, optional): Resampling algorithm for interpolation.
            Common options:
            - cubic_spline: Smooth, good for continuous data (DEFAULT)
            - bilinear: Faster, slight quality loss
            - nearest: Categorical data (land cover, labels)
            - lanczos: High quality, slower
            - average: Good for downsampling continuous data
            Defaults to rasterio.warp.Resampling.cubic_spline.
        return_only_data (bool, optional): If True, returns numpy array without georeferencing.
            If False, returns GeoTensor with updated transform. Defaults to False.

    Returns:
        Union[GeoTensor, np.ndarray]:
            - If return_only_data=False: GeoTensor with shape determined by resolution ratio,
              transform adjusted to reflect new pixel size, same CRS and bounds as input
            - If return_only_data=True: numpy array with resampled data

    Examples:
        >>> from georeader import GeoTensor, read
        >>> import rasterio
        >>> import numpy as np
        >>>
        >>> # Example 1: Downsample Sentinel-2 from 10m to 30m (Landsat resolution)
        >>> # Load Sentinel-2 data at 10m resolution
        >>> s2_data = GeoTensor.load_file('sentinel2_10m.tif')
        >>> print(f"Original: {s2_data.shape}, res: {s2_data.res}")  # (13, 1000, 1000), res: (10, 10)
        >>>
        >>> # Downsample to 30m (3x reduction)
        >>> s2_30m = read.resize(s2_data, resolution_dst=30.0)
        >>> print(f"Downsampled: {s2_30m.shape}, res: {s2_30m.res}")  # (13, 334, 334), res: (30, 30)
        >>> # Shape reduction: 1000 / 3 β‰ˆ 334 pixels
        >>> # Anti-aliasing automatically applied to prevent artifacts

        >>> # Example 2: Upsample low-resolution data (2x magnification)
        >>> # Coarse data at 60m resolution
        >>> coarse_data = GeoTensor.load_file('data_60m.tif')
        >>> print(f"Original: {coarse_data.shape}")  # (4, 100, 100)
        >>>
        >>> # Upsample to 30m resolution
        >>> upsampled = read.resize(coarse_data, resolution_dst=30.0)
        >>> print(f"Upsampled: {upsampled.shape}")  # (4, 200, 200)
        >>> # Shape increase: 100 * 2 = 200 pixels
        >>> # Uses cubic_spline interpolation for smooth result

        >>> # Example 3: Downsample with custom anti-aliasing
        >>> # Strong smoothing before downsampling (reduce noise)
        >>> smoothed = read.resize(s2_data, resolution_dst=50.0,
        ...                       anti_aliasing=True,
        ...                       anti_aliasing_sigma=3.0)  # Custom sigma
        >>> # More aggressive smoothing than default

        >>> # Example 4: Disable anti-aliasing (faster but lower quality)
        >>> # For quick previews or when speed is critical
        >>> fast_downsample = read.resize(s2_data, resolution_dst=30.0,
        ...                              anti_aliasing=False)
        >>> # Faster but may show aliasing artifacts

        >>> # Example 5: Different resolutions in x and y
        >>> # Non-square pixels (uncommon but supported)
        >>> anisotropic = read.resize(s2_data,
        ...                          resolution_dst=(20.0, 30.0))  # (res_y, res_x)
        >>> print(f"Resolution: {anisotropic.res}")  # (20, 30)
        >>> # Different sampling rates in each dimension

        >>> # Example 6: Resampling for categorical data (land cover)
        >>> labels = GeoTensor.load_file('land_cover_10m.tif')
        >>> # Use nearest neighbor to preserve class values
        >>> labels_30m = read.resize(labels, resolution_dst=30.0,
        ...                         resampling=rasterio.warp.Resampling.nearest,
        ...                         anti_aliasing=False)  # No smoothing for discrete data
        >>> # Class labels preserved (no interpolation)

        >>> # Example 7: Create image pyramid (multi-resolution)
        >>> # Generate multiple resolution levels for fast visualization
        >>> pyramid = {}
        >>> base_res = 10.0
        >>> for level in range(5):  # 5 pyramid levels
        ...     resolution = base_res * (2 ** level)  # 10m, 20m, 40m, 80m, 160m
        ...     pyramid[level] = read.resize(s2_data, resolution_dst=resolution)
        ...     print(f"Level {level}: {pyramid[level].shape}, res: {resolution}m")

        >>> # Example 8: Match resolution to reference dataset
        >>> reference = GeoTensor.load_file('reference_30m.tif')
        >>> # Resample data to match reference resolution
        >>> matched = read.resize(s2_data, resolution_dst=reference.res[0])
        >>> assert matched.res == reference.res
        >>> # Now both datasets have same resolution for analysis

    Note:
        - Function preserves CRS (no projection change, only resolution change)
        - Geographic bounds remain constant (upper-left corner fixed)
        - Transform is updated: pixel size = resolution_dst
        - Output shape computed as: shape_out = ceil(shape_in * res_in / res_dst)
        - Anti-aliasing only applied when downsampling (scale > 1)
        - For upsampling, resampling algorithm determines interpolation quality
        - Uses scipy.ndimage.gaussian_filter for anti-aliasing (requires scipy)
        - Efficient: operates in-place when possible to minimize memory usage
    """
    resolution_or = data_in.res
    if isinstance(resolution_dst, numbers.Number):
        resolution_dst = (abs(resolution_dst), abs(resolution_dst))
    scale = np.array([resolution_dst[0] / resolution_or[0], resolution_dst[1] / resolution_or[1]])

    if window_out is None:
        spatial_shape = data_in.shape[-2:]

        # scale < 1 => make image smaller (resolution_or < resolution_dst)
        # scale > 1 => make image larger (resolution_or > resolution_dst)
        output_shape_exact = spatial_shape[0] / scale[0], spatial_shape[1] / scale[1]
        output_shape_rounded = round(output_shape_exact[0], ndigits=3), round(output_shape_exact[1], ndigits=3)
        output_shape = ceil(output_shape_rounded[0]), ceil(output_shape_rounded[1])
        window_out = rasterio.windows.Window(col_off=0, row_off=0, width=output_shape[1], height=output_shape[0])

    if anti_aliasing:
        data_in = apply_anti_aliasing(data_in, anti_aliasing_sigma=anti_aliasing_sigma, resolution_dst=resolution_dst)

    return read_reproject(
        data_in,
        dst_crs=data_in.crs,
        resolution_dst_crs=resolution_dst,
        dst_transform=data_in.transform,
        window_out=window_out,
        resampling=resampling,
        return_only_data=return_only_data,
    )

read_reproject(data_in, dst_crs=None, bounds=None, resolution_dst_crs=None, dst_transform=None, window_out=None, resampling=rasterio.warp.Resampling.cubic_spline, dtype_dst=None, return_only_data=False, dst_nodata=None)

Reproject raster data to a different CRS, resolution, and/or extent.

This is the core reprojection function in georeader, providing fine-grained control over the output coordinate system, spatial resolution, geographic extent, and resampling method. It handles complex transformations including: - CRS changes (e.g., WGS84 β†’ UTM, UTM β†’ Web Mercator) - Resolution changes (resampling/downsampling) - Geographic subsetting (reading only a portion in destination CRS) - Data type conversions

The function uses rasterio's warp.reproject under the hood, which leverages GDAL's high-performance reprojection engine. It automatically handles: - Non-intersecting regions (returns nodata-filled array) - Multi-band and multi-temporal data (iterates over all bands/times) - Boolean arrays (converts to float32 for interpolation, then back) - Edge cases near poles or antimeridian

Algorithm: 1. Determine output transform from bounds/resolution or use provided transform 2. Check if source data intersects destination extent 3. Read input data with small buffer (3 pixels) for edge handling 4. Iterate over each band/time slice and call rasterio.warp.reproject 5. Package result as GeoTensor with destination CRS and transform

Parameters:

Name Type Description Default
data_in GeoData

Input georeferenced data to reproject. Must have "x" and "y" spatial dimensions. Can be 2D (H, W), 3D (C, H, W), or 4D (T, C, H, W).

required
dst_crs Optional[str]

Destination coordinate reference system. If None, uses the same CRS as data_in (useful for resolution change only). Format: "EPSG:4326", "EPSG:32630", CRS object, or WKT string. Defaults to None.

None
bounds Optional[Tuple[float, float, float, float]]

Output extent as (xmin, ymin, xmax, ymax) in dst_crs coordinates. If None, must provide window_out. Useful for reading a specific geographic region. Defaults to None.

None
resolution_dst_crs Optional[Union[float, Tuple[float, float]]]

Target resolution in dst_crs units. If float, same resolution in x and y. If tuple, (res_x, res_y). If None, uses resolution from dst_transform. Units: meters for projected CRS, degrees for geographic CRS. Defaults to None.

None
dst_transform Optional[Affine]

Output affine transform. If None, computed automatically from bounds and resolution. Useful for aligning to an existing grid. Defaults to None.

None
window_out Optional[Window]

Output size as Window(col_off=0, row_off=0, width=W, height=H). If None, computed from bounds. Defines output array dimensions. Defaults to None.

None
resampling Resampling

Resampling algorithm. Options: nearest, bilinear, cubic, cubic_spline, lanczos, average, mode, etc. Default: cubic_spline (smooth, accurate for continuous data).

cubic_spline
dtype_dst Any

Output data type. If None, uses data_in.dtype. Examples: np.float32, np.uint8, np.int16. Defaults to None.

None
return_only_data bool

If True, returns numpy array without georeference. If False, returns GeoTensor with spatial metadata. Defaults to False.

False
dst_nodata Optional[int]

Fill value for out-of-bounds regions. If None, uses data_in.fill_value_default. Defaults to None.

None

Returns:

Type Description
Union[GeoTensor, ndarray]

Union[GeoTensor, np.ndarray]: Reprojected data. - If return_only_data=False: GeoTensor with shape matching window_out, georeferenced to dst_crs with dst_transform - If return_only_data=True: numpy array with same shape

Examples:

>>> # Example 1: Simple CRS change (WGS84 β†’ UTM Zone 30N)
>>> from georeader import GeoTensor, read
>>> import rasterio
>>>
>>> # Create sample data in WGS84
>>> transform_wgs84 = rasterio.Affine(0.001, 0, -3.71, 0, -0.001, 40.42)
>>> data_wgs84 = GeoTensor(np.random.rand(100, 100), transform_wgs84, "EPSG:4326")
>>>
>>> # Reproject to UTM (no bounds = full extent)
>>> data_utm = read.read_reproject(data_wgs84, dst_crs="EPSG:32630")
>>> print(f"Input shape: {data_wgs84.shape}, Output shape: {data_utm.shape}")
>>> print(f"Output CRS: {data_utm.crs}, resolution: {data_utm.res}")
>>> # Example 2: Reproject with specific resolution (10m pixels)
>>> data_utm_10m = read.read_reproject(
...     data_wgs84,
...     dst_crs="EPSG:32630",
...     resolution_dst_crs=10.0  # 10 meters
... )
>>> print(f"Resolution: {data_utm_10m.res}")  # (10.0, 10.0)
>>> # Example 3: Reproject and subset by bounds
>>> bounds_madrid = (437000, 4474000, 439000, 4476000)  # UTM coords (2km Γ— 2km)
>>> data_subset = read.read_reproject(
...     data_wgs84,
...     dst_crs="EPSG:32630",
...     bounds=bounds_madrid,
...     resolution_dst_crs=10.0
... )
>>> print(f"Subset shape: {data_subset.shape}")  # ~(200, 200) at 10m resolution
>>> # Example 4: Align to existing grid (match another raster)
>>> reference = GeoTensor.load_file("reference_grid.tif")
>>> aligned = read.read_reproject(
...     data_wgs84,
...     dst_crs=reference.crs,
...     dst_transform=reference.transform,
...     window_out=rasterio.windows.Window(0, 0, reference.width, reference.height)
... )
>>> # Output exactly matches reference grid
>>> # Example 5: Custom resampling for categorical data
>>> labels = GeoTensor(np.random.randint(0, 10, (100, 100)), transform_wgs84, "EPSG:4326")
>>> labels_reprojected = read.read_reproject(
...     labels,
...     dst_crs="EPSG:32630",
...     resampling=rasterio.warp.Resampling.nearest  # Preserve class labels
... )
Note
  • Performance: Reads input data with 3-pixel buffer to avoid edge artifacts
  • Optimization: Detects no-op cases (same CRS + resolution + alignment) and uses faster read_from_window instead
  • Boolean handling: Converts bool β†’ float32 β†’ interpolate β†’ threshold > 0.5 β†’ bool
  • Multi-dimensional: Processes each (time, band) slice independently
  • Memory: Output array allocated upfront and filled via rasterio.warp.reproject
  • Non-intersecting: Returns nodata-filled array if source doesn't overlap destination
Source code in georeader/read.py
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
def read_reproject(
    data_in: GeoData,
    dst_crs: Optional[str] = None,
    bounds: Optional[Tuple[float, float, float, float]] = None,
    resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
    dst_transform: Optional[rasterio.Affine] = None,
    window_out: Optional[rasterio.windows.Window] = None,
    resampling: rasterio.warp.Resampling = rasterio.warp.Resampling.cubic_spline,
    dtype_dst: Any = None,
    return_only_data: bool = False,
    dst_nodata: Optional[int] = None,
) -> Union[GeoTensor, np.ndarray]:
    """
    Reproject raster data to a different CRS, resolution, and/or extent.

    This is the core reprojection function in georeader, providing fine-grained control
    over the output coordinate system, spatial resolution, geographic extent, and
    resampling method. It handles complex transformations including:
    - CRS changes (e.g., WGS84 β†’ UTM, UTM β†’ Web Mercator)
    - Resolution changes (resampling/downsampling)
    - Geographic subsetting (reading only a portion in destination CRS)
    - Data type conversions

    The function uses rasterio's warp.reproject under the hood, which leverages GDAL's
    high-performance reprojection engine. It automatically handles:
    - Non-intersecting regions (returns nodata-filled array)
    - Multi-band and multi-temporal data (iterates over all bands/times)
    - Boolean arrays (converts to float32 for interpolation, then back)
    - Edge cases near poles or antimeridian

    Algorithm:
    1. Determine output transform from bounds/resolution or use provided transform
    2. Check if source data intersects destination extent
    3. Read input data with small buffer (3 pixels) for edge handling
    4. Iterate over each band/time slice and call rasterio.warp.reproject
    5. Package result as GeoTensor with destination CRS and transform

    Args:
        data_in (GeoData): Input georeferenced data to reproject. Must have "x" and "y"
            spatial dimensions. Can be 2D (H, W), 3D (C, H, W), or 4D (T, C, H, W).
        dst_crs (Optional[str], optional): Destination coordinate reference system.
            If None, uses the same CRS as data_in (useful for resolution change only).
            Format: "EPSG:4326", "EPSG:32630", CRS object, or WKT string. Defaults to None.
        bounds (Optional[Tuple[float, float, float, float]], optional): Output extent as
            (xmin, ymin, xmax, ymax) in dst_crs coordinates. If None, must provide window_out.
            Useful for reading a specific geographic region. Defaults to None.
        resolution_dst_crs (Optional[Union[float, Tuple[float, float]]], optional):
            Target resolution in dst_crs units. If float, same resolution in x and y.
            If tuple, (res_x, res_y). If None, uses resolution from dst_transform.
            Units: meters for projected CRS, degrees for geographic CRS. Defaults to None.
        dst_transform (Optional[rasterio.Affine], optional): Output affine transform.
            If None, computed automatically from bounds and resolution. Useful for
            aligning to an existing grid. Defaults to None.
        window_out (Optional[rasterio.windows.Window], optional): Output size as
            Window(col_off=0, row_off=0, width=W, height=H). If None, computed from bounds.
            Defines output array dimensions. Defaults to None.
        resampling (rasterio.warp.Resampling, optional): Resampling algorithm.
            Options: nearest, bilinear, cubic, cubic_spline, lanczos, average, mode, etc.
            Default: cubic_spline (smooth, accurate for continuous data).
        dtype_dst (Any, optional): Output data type. If None, uses data_in.dtype.
            Examples: np.float32, np.uint8, np.int16. Defaults to None.
        return_only_data (bool, optional): If True, returns numpy array without georeference.
            If False, returns GeoTensor with spatial metadata. Defaults to False.
        dst_nodata (Optional[int], optional): Fill value for out-of-bounds regions.
            If None, uses data_in.fill_value_default. Defaults to None.

    Returns:
        Union[GeoTensor, np.ndarray]: Reprojected data.
            - If return_only_data=False: GeoTensor with shape matching window_out,
              georeferenced to dst_crs with dst_transform
            - If return_only_data=True: numpy array with same shape

    Examples:
        >>> # Example 1: Simple CRS change (WGS84 β†’ UTM Zone 30N)
        >>> from georeader import GeoTensor, read
        >>> import rasterio
        >>>
        >>> # Create sample data in WGS84
        >>> transform_wgs84 = rasterio.Affine(0.001, 0, -3.71, 0, -0.001, 40.42)
        >>> data_wgs84 = GeoTensor(np.random.rand(100, 100), transform_wgs84, "EPSG:4326")
        >>>
        >>> # Reproject to UTM (no bounds = full extent)
        >>> data_utm = read.read_reproject(data_wgs84, dst_crs="EPSG:32630")
        >>> print(f"Input shape: {data_wgs84.shape}, Output shape: {data_utm.shape}")
        >>> print(f"Output CRS: {data_utm.crs}, resolution: {data_utm.res}")

        >>> # Example 2: Reproject with specific resolution (10m pixels)
        >>> data_utm_10m = read.read_reproject(
        ...     data_wgs84,
        ...     dst_crs="EPSG:32630",
        ...     resolution_dst_crs=10.0  # 10 meters
        ... )
        >>> print(f"Resolution: {data_utm_10m.res}")  # (10.0, 10.0)

        >>> # Example 3: Reproject and subset by bounds
        >>> bounds_madrid = (437000, 4474000, 439000, 4476000)  # UTM coords (2km Γ— 2km)
        >>> data_subset = read.read_reproject(
        ...     data_wgs84,
        ...     dst_crs="EPSG:32630",
        ...     bounds=bounds_madrid,
        ...     resolution_dst_crs=10.0
        ... )
        >>> print(f"Subset shape: {data_subset.shape}")  # ~(200, 200) at 10m resolution

        >>> # Example 4: Align to existing grid (match another raster)
        >>> reference = GeoTensor.load_file("reference_grid.tif")
        >>> aligned = read.read_reproject(
        ...     data_wgs84,
        ...     dst_crs=reference.crs,
        ...     dst_transform=reference.transform,
        ...     window_out=rasterio.windows.Window(0, 0, reference.width, reference.height)
        ... )
        >>> # Output exactly matches reference grid

        >>> # Example 5: Custom resampling for categorical data
        >>> labels = GeoTensor(np.random.randint(0, 10, (100, 100)), transform_wgs84, "EPSG:4326")
        >>> labels_reprojected = read.read_reproject(
        ...     labels,
        ...     dst_crs="EPSG:32630",
        ...     resampling=rasterio.warp.Resampling.nearest  # Preserve class labels
        ... )

    Note:
        - Performance: Reads input data with 3-pixel buffer to avoid edge artifacts
        - Optimization: Detects no-op cases (same CRS + resolution + alignment) and
          uses faster read_from_window instead
        - Boolean handling: Converts bool β†’ float32 β†’ interpolate β†’ threshold > 0.5 β†’ bool
        - Multi-dimensional: Processes each (time, band) slice independently
        - Memory: Output array allocated upfront and filled via rasterio.warp.reproject
        - Non-intersecting: Returns nodata-filled array if source doesn't overlap destination
    """

    named_shape = OrderedDict(zip(data_in.dims, data_in.shape))

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 1: DETERMINE OUTPUT TRANSFORM
    # ─────────────────────────────────────────────────────────────────────────
    # The output transform defines the mapping from pixel coordinates to
    # geographic coordinates in the destination CRS. It can be:
    # - Provided directly (dst_transform)
    # - Computed from bounds + resolution
    # - Computed from bounds with inferred resolution
    # 
    # Shape tracking:
    #   Input:  named_shape = {'band': C, 'y': H_in, 'x': W_in}
    #   Output: will have {'band': C, 'y': H_out, 'x': W_out}
    # ─────────────────────────────────────────────────────────────────────────
    dst_transform = window_utils.figure_out_transform(
        transform=dst_transform, bounds=bounds, resolution_dst=resolution_dst_crs
    )

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 2: COMPUTE OUTPUT DIMENSIONS
    # ─────────────────────────────────────────────────────────────────────────
    # The output window defines pixel dimensions (W_out, H_out). Either:
    # - Provided directly (window_out)
    # - Computed from bounds in dst_crs coordinate units
    #
    # Example: bounds=(0, 0, 1000, 1000) with 10m resolution β†’ (100, 100) pixels
    # ─────────────────────────────────────────────────────────────────────────
    if window_out is None:
        assert bounds is not None, (
            "Both window_out and bounds are None. This is needed to figure out the size of the output array"
        )
        window_out = rasterio.windows.from_bounds(*bounds, transform=dst_transform).round_lengths(
            op="ceil", pixel_precision=PIXEL_PRECISION
        )

    crs_data_in = data_in.crs
    if dst_crs is None:
        dst_crs = crs_data_in

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 3: CHECK FOR NO-OP OPTIMIZATION
    # ─────────────────────────────────────────────────────────────────────────
    # If source and destination have:
    #   - Same CRS
    #   - Same pixel size (transform.a and transform.e)
    #   - Grid-aligned origins (integer pixel offset)
    # Then we can skip reprojection entirely and use a simple window read.
    # This is ~10-100x faster for aligned data.
    # ─────────────────────────────────────────────────────────────────────────
    if window_utils.compare_crs(dst_crs, crs_data_in):
        transform_data = data_in.transform
        if (
            (dst_transform.a == transform_data.a)
            and (dst_transform.b == transform_data.b)
            and (dst_transform.d == transform_data.d)
            and (dst_transform.e == transform_data.e)
        ):
            # Find pixel offset between src and dst grids
            # Uses inverse transform: geo coords β†’ pixel coords
            x_dst, y_dst = dst_transform.c, dst_transform.f
            col_off, row_off = ~transform_data * (x_dst, y_dst)
            window_in_data = rasterio.windows.Window(col_off, row_off, window_out.width, window_out.height)

            # Only use fast path if offsets are exact integers (grids align)
            if _is_exact_round(window_in_data.row_off) and _is_exact_round(window_in_data.col_off):
                window_in_data = window_in_data.round_offsets(op="floor", pixel_precision=PIXEL_PRECISION)
                return read_from_window(data_in, window_in_data, return_only_data=return_only_data, trigger_load=True)

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 4: HANDLE DATA TYPES
    # ─────────────────────────────────────────────────────────────────────────
    # Boolean arrays need special handling:
    #   bool β†’ float32 β†’ interpolate β†’ threshold(0.5) β†’ bool
    # This prevents interpolation artifacts in mask data while still
    # allowing smooth boundaries (anti-aliasing effect).
    # ─────────────────────────────────────────────────────────────────────────
    isbool_dtypein = data_in.dtype == "bool"
    isbool_dtypedst = False

    cast = True
    if dtype_dst is None:
        cast = False
        dtype_dst = data_in.dtype
        if isbool_dtypein:
            isbool_dtypedst = True
    elif np.dtype(dtype_dst) == "bool":
        isbool_dtypedst = True

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 5: ALLOCATE OUTPUT ARRAY
    # ─────────────────────────────────────────────────────────────────────────
    # Pre-allocate with nodata fill. Shape is built by replacing x,y dims
    # with the new window dimensions while preserving other dims (band, time).
    #
    # Example shape transformation:
    #   Input:  (4, 1000, 1000)  β†’ 4 bands, 1000Γ—1000 pixels
    #   Output: (4, 500, 600)    β†’ 4 bands, 500Γ—600 pixels (new extent)
    # ─────────────────────────────────────────────────────────────────────────
    dict_shape_window_out = {"x": window_out.width, "y": window_out.height}
    shape_out = tuple([named_shape[s] if s not in ["x", "y"] else dict_shape_window_out[s] for s in named_shape])
    dst_nodata = dst_nodata or data_in.fill_value_default
    if isbool_dtypedst:
        dst_nodata = bool(dst_nodata)

    destination = np.full(shape_out, fill_value=dst_nodata, dtype=dtype_dst)

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 6: CHECK INTERSECTION
    # ─────────────────────────────────────────────────────────────────────────
    # If the requested output region doesn't overlap the input data at all,
    # return early with the nodata-filled array. Saves computation.
    # ─────────────────────────────────────────────────────────────────────────
    polygon_dst_crs = window_utils.window_polygon(window_out, dst_transform)

    if not data_in.footprint(crs=dst_crs).intersects(polygon_dst_crs):
        return GeoTensor(destination, transform=dst_transform, crs=dst_crs, fill_value_default=dst_nodata)

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 7: LOAD SOURCE DATA
    # ─────────────────────────────────────────────────────────────────────────
    # For lazy data (xarray/dask), read only the region that will contribute
    # to the output, plus a 3-pixel buffer for interpolation edge handling.
    # The buffer prevents edge artifacts from bilinear/cubic resampling.
    # ─────────────────────────────────────────────────────────────────────────
    if not isinstance(data_in, GeoTensor):
        geotensor_in = read_from_polygon(
            data_in, polygon_dst_crs, crs_polygon=dst_crs, pad_add=(3, 3), return_only_data=False, trigger_load=True
        )
    else:
        geotensor_in = data_in

    np_array_in = np.asanyarray(geotensor_in.values)

    # Type casting for interpolation compatibility
    if cast:
        if isbool_dtypedst:
            np_array_in = np_array_in.astype(np.float32)
        else:
            np_array_in = np_array_in.astype(dtype_dst)
    elif isbool_dtypein:
        np_array_in = np_array_in.astype(np.float32)

    # ─────────────────────────────────────────────────────────────────────────
    # STEP 8: ITERATE OVER NON-SPATIAL DIMENSIONS
    # ─────────────────────────────────────────────────────────────────────────
    # rasterio.warp.reproject operates on 2D arrays. For multi-band or
    # multi-temporal data, we iterate over all (time, band) combinations.
    #
    # Example: shape (4, 3, H, W) with dims ('time', 'band', 'y', 'x')
    #   β†’ iterates over 4Γ—3=12 slices: (0,0), (0,1), (0,2), (1,0), ...
    # ─────────────────────────────────────────────────────────────────────────
    index_iter = [[(ns, i) for i in range(s)] for ns, s in named_shape.items() if ns not in ["x", "y"]]

    for current_select_tuple in itertools.product(*index_iter):
        # Extract the index tuple: (('time', 0), ('band', 1)) β†’ (0, 1)
        i_sel_tuple = tuple(t[1] for t in current_select_tuple)

        np_array_iter = np_array_in[i_sel_tuple]
        if isbool_dtypedst:
            dst_iter_write = destination[i_sel_tuple].astype(np.float32)
            dst_nodata_iter = float(dst_nodata)
        else:
            dst_iter_write = destination[i_sel_tuple]
            dst_nodata_iter = dst_nodata

        # ─────────────────────────────────────────────────────────────────────
        # CORE REPROJECTION: rasterio.warp.reproject
        # ─────────────────────────────────────────────────────────────────────
        # This is where the actual coordinate transformation happens:
        # 1. For each output pixel, compute its geographic coordinates
        # 2. Transform those coords from dst_crs to src_crs
        # 3. Sample the input raster at those locations using resampling
        # 4. Write the result to the output array
        # ─────────────────────────────────────────────────────────────────────
        rasterio.warp.reproject(
            np_array_iter,
            dst_iter_write,
            src_transform=geotensor_in.transform,
            src_crs=crs_data_in,
            dst_transform=dst_transform,
            dst_crs=dst_crs,
            src_nodata=geotensor_in.fill_value_default,
            dst_nodata=dst_nodata_iter,
            resampling=resampling,
        )

        # Convert interpolated floats back to boolean via thresholding
        if isbool_dtypedst:
            destination[i_sel_tuple] = dst_iter_write > 0.5

    if return_only_data:
        return destination

    return GeoTensor(destination, transform=dst_transform, crs=dst_crs, fill_value_default=dst_nodata)

read_rpcs(input_npy, rpcs, fill_value_default=0, dst_crs=None, resolution_dst_crs=None, resampling=rasterio.warp.Resampling.cubic_spline, return_only_data=False)

This function georreferences an array using the RPCs. The RPCs are used to compute the transform from the input array to the destination crs.

This function assumes that the RPCs are in EPSG:4326.

Parameters:

Name Type Description Default
input_npy NDArray

Array to georeference. It must have 2, 3 or 4 dimensions.

required
rpcs RPC

RPCs to compute the transform.

required
fill_value_default int

how to encode the nodata value. Defaults to 0.

0
dst_crs Optional[Any]

Destination crs. Defaults to None. If None, the dst_crs is the same as in the RPC polynomial (EPSG:4326).

None
resampling Resampling

Resampling method. Defaults to rasterio.warp.Resampling.cubic_spline.

cubic_spline
return_only_data bool

If True it returns only the data. Defaults to False.

False

Returns:

Name Type Description
GeoTensor GeoTensor

GeoTensor with the georeferenced array based on the RPCs.

Source code in georeader/read.py
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
def read_rpcs(
    input_npy: NDArray,
    rpcs: rasterio.rpc.RPC,
    fill_value_default: int = 0,
    dst_crs: Optional[Any] = None,
    resolution_dst_crs: Optional[Union[float, Tuple[float, float]]] = None,
    resampling: rasterio.warp.Resampling = rasterio.warp.Resampling.cubic_spline,
    return_only_data: bool = False,
) -> GeoTensor:
    """
    This function georreferences an array using the RPCs.
        The RPCs are used to compute the transform from the input array to the destination crs.

        This function assumes that the RPCs are in EPSG:4326.

    Args:
        input_npy (NDArray): Array to georeference. It must have 2, 3 or 4 dimensions.
        rpcs (rasterio.rpc.RPC): RPCs to compute the transform.
        fill_value_default (int, optional): how to encode the nodata value. Defaults to 0.
        dst_crs (Optional[Any], optional): Destination crs. Defaults to None.
            If None, the dst_crs is the same as in the RPC polynomial (EPSG:4326).
        resampling (rasterio.warp.Resampling, optional): Resampling method.
            Defaults to rasterio.warp.Resampling.cubic_spline.
        return_only_data (bool, optional): If True it returns only the data. Defaults to False.

    Returns:
        GeoTensor: GeoTensor with the georeferenced array based on the RPCs.
    """

    isbool_dtypedst = input_npy.dtype == "bool"
    if isbool_dtypedst:
        fill_value_default = bool(fill_value_default)

    assert input_npy.ndim >= 2 and input_npy.ndim <= 4, "Input array must have 2, 3 or 4 dimensions"

    named_shape = OrderedDict(reversed(list(zip(["y", "x", "band", "time"], reversed(input_npy.shape)))))

    index_iter = [[(ns, i) for i in range(s)] for ns, s in named_shape.items() if ns not in ["x", "y"]]
    # e.g. if named_shape = {'time': 4, 'band': 2, 'x':10, 'y': 10} index_iter ->
    # [[('time', 0), ('time', 1), ('time', 2), ('time', 3)],
    #  [('band', 0), ('band', 1)]]

    if dst_crs is None:
        dst_crs = rasterio.crs.CRS.from_epsg(4326)

    src_crs = rasterio.crs.CRS.from_epsg(4326)

    if resolution_dst_crs is not None:
        if isinstance(resolution_dst_crs, float):
            resolution_dst_crs = (resolution_dst_crs, resolution_dst_crs)

    dst_transform, dst_width, dst_height = rasterio.warp.calculate_default_transform(
        src_crs=None,
        dst_crs=dst_crs,
        width=input_npy.shape[-1],
        height=input_npy.shape[-2],
        resolution=resolution_dst_crs,
        rpcs=rpcs,
        dst_width=None,
        dst_height=None,
    )

    destination = np.full(
        input_npy.shape[:-2] + (dst_height, dst_width), fill_value=fill_value_default, dtype=input_npy.dtype
    )

    for current_select_tuple in itertools.product(*index_iter):
        # current_select_tuple = (('time', 0), ('band', 0))
        i_sel_tuple = tuple(t[1] for t in current_select_tuple)

        np_array_iter = input_npy[i_sel_tuple]
        if isbool_dtypedst:
            dst_iter_write = destination[i_sel_tuple].astype(np.float32)
            fill_value_default_iter = float(fill_value_default)
        else:
            dst_iter_write = destination[i_sel_tuple]
            fill_value_default_iter = fill_value_default

        rasterio.warp.reproject(
            np_array_iter,
            dst_iter_write,
            src_transform=None,
            rpcs=rpcs,
            src_crs=src_crs,
            dst_transform=dst_transform,
            dst_crs=dst_crs,
            src_nodata=fill_value_default_iter,
            dst_nodata=fill_value_default_iter,
            resampling=resampling,
        )

        if isbool_dtypedst:
            destination[i_sel_tuple] = dst_iter_write > 0.5

    if return_only_data:
        return destination

    return GeoTensor(destination, transform=dst_transform, crs=dst_crs, fill_value_default=fill_value_default)

Mosaic Module: Combine multiple rasters into seamless composite images.

This module provides functions to merge multiple overlapping rasters into a single output, handling reprojection, resampling, and nodata filling. Essential for creating cloud-free composites and gap-free mosaics.

Spatial Mosaic Overview

Combining multiple rasters with varying coverage::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SPATIAL MOSAIC CONCEPT                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Input Rasters (with gaps)              Output Mosaic                   β”‚
β”‚  ─────────────────────────              ─────────────                   β”‚
β”‚                                                                          β”‚
β”‚   Raster 1         Raster 2                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚β–“β–“β–“β–‘β–‘β–‘β–‘β–‘β–‘β”‚      β”‚β–‘β–‘β–‘β–‘β–‘β–“β–“β–“β–“β”‚           β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–‘β–‘β–‘β–‘β–‘β”‚  +   β”‚β–‘β–‘β–‘β–‘β–“β–“β–“β–“β–“β”‚    ═══►   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–‘β–‘β–‘β–‘β”‚      β”‚β–‘β–‘β–‘β–“β–“β–“β–“β–“β–“β”‚           β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚      β”‚β–‘β–‘β–“β–“β–“β–“β–“β–“β–“β”‚           β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                                          β”‚
β”‚   β–‘ = nodata/gaps                        Gaps filled from               β”‚
β”‚   β–“ = valid data                         overlapping rasters            β”‚
β”‚                                                                          β”‚
β”‚  Processing Order:                                                       β”‚
β”‚  β€’ First raster fills as much as possible                               β”‚
β”‚  β€’ Each subsequent raster fills remaining gaps                          β”‚
β”‚  β€’ Continues until no nodata remains (or list exhausted)                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Temporal Mosaic / Reduction

Combining rasters from multiple time steps::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TEMPORAL REDUCTION CONCEPT                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Time Series Input                      Reduction Output                β”‚
β”‚  ─────────────────                      ────────────────                β”‚
β”‚                                                                          β”‚
β”‚   t=1    t=2    t=3                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”  β”Œβ”€β”€β”€β”  β”Œβ”€β”€β”€β”                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚  β”‚ 5 β”‚  β”‚ 7 β”‚  β”‚ 6 β”‚                   β”‚               β”‚               β”‚
β”‚  β”‚   β”‚  β”‚   β”‚  β”‚   β”‚   ─────────────►  β”‚  median = 6   β”‚               β”‚
β”‚  β”‚   β”‚  β”‚   β”‚  β”‚   β”‚   np.nanmedian    β”‚  mean = 6.0   β”‚               β”‚
β”‚  β””β”€β”€β”€β”˜  β””β”€β”€β”€β”˜  β””β”€β”€β”€β”˜   np.nanmean      β”‚  max = 7      β”‚               β”‚
β”‚                                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚
β”‚                                                                          β”‚
β”‚  Common Reduction Functions:                                            β”‚
β”‚  β€’ np.nanmedian: Robust to outliers (clouds, shadows)                   β”‚
β”‚  β€’ np.nanmean: Average value                                            β”‚
β”‚  β€’ np.nanmax: Maximum composite (e.g., max NDVI)                        β”‚
β”‚  β€’ np.nanmin: Minimum composite                                         β”‚
β”‚  β€’ np.nanstd: Temporal variability                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Mosaic with Masks

Using external validity masks to control which pixels are used::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    MASKED MOSAIC WORKFLOW                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Input: (data, mask) tuples                                             β”‚
β”‚  ─────────────────────────                                              β”‚
β”‚                                                                          β”‚
β”‚   Raster 1     Cloud Mask      β”‚    Raster 2     Cloud Mask            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β”‚      β”‚   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β”‚      β”‚   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β”‚   +  β”‚   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                   ↑            β”‚                                        β”‚
β”‚               β–ˆ = invalid      β”‚   Uses Raster 2 where Raster 1        β”‚
β”‚               (cloud/shadow)   β”‚   is masked as invalid                 β”‚
β”‚                                                                          β”‚
β”‚  Usage:                                                                  β”‚
β”‚    data_list = [(raster1, mask1), (raster2, mask2), ...]               β”‚
β”‚    mosaic = spatial_mosaic(data_list, ...)                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Module Functions Overview

Spatial Mosaicking
  • :func:spatial_mosaic: Merge rasters to fill nodata gaps
  • :func:spatial_mosaic_chunked: Memory-efficient chunked processing
Temporal Reduction
  • :func:rasters_reduction: Apply reduction function across rasters
  • :func:pad_add_rasters: Align and stack rasters for reduction

Quick Start

Create a cloud-free mosaic from multiple images::

from georeader import mosaic, read

# List of overlapping raster readers
rasters = [reader1, reader2, reader3]

# Create mosaic (fills gaps with subsequent images)
result = mosaic.spatial_mosaic(
    rasters,
    bounds=(-122.5, 37.0, -122.0, 37.5),
    dst_crs="EPSG:4326",
    dst_nodata=0
)

Compute median composite from time series::

from georeader import mosaic
import numpy as np

# Stack aligned rasters and compute median
result = mosaic.rasters_reduction(
    raster_list,
    reducer=np.nanmedian,
    dst_crs="EPSG:32610"
)

See Also

georeader.read : Reading and reprojection functions georeader.slices : Array slicing for chunked processing georeader.geotensor : Output format

References

  • Rasterio merge: https://rasterio.readthedocs.io/en/latest/api/rasterio.merge.html
  • Cloud masking strategies: See georeader.readers.cloudsen12

spatial_mosaic(data_list, polygon=None, crs_polygon=None, dst_transform=None, bounds=None, dst_crs=None, dtype_dst=None, window_size=None, resampling=rasterio.warp.Resampling.cubic_spline, masking_function=None, dst_nodata=None)

Create a spatial mosaic by filling gaps with data from overlapping rasters.

Combines multiple rasters into a single output by iteratively filling nodata regions with valid data from subsequent rasters. The first raster is used as the base, and remaining rasters fill in only where the base has nodata values.

This function is similar to rasterio.merge.merge but with support for:

  • Custom validity masks per raster
  • Masking functions (e.g., cloud masks)
  • Windowed processing for memory efficiency

Parameters:

Name Type Description Default
data_list Union[List[GeoData], List[Tuple[GeoData, GeoData]]]

Input rasters to mosaic. Can be:

  • List of GeoData objects: Each raster's nodata value determines validity
  • List of (data, mask) tuples: mask indicates invalid pixels to fill

Rasters are processed in order; first valid pixel wins.

required
polygon Optional[Polygon]

Output extent as a shapely Polygon. If provided, mosaic is clipped to this polygon. CRS specified by crs_polygon.

None
crs_polygon Optional[str]

CRS of the polygon. If not provided, uses the CRS of the first raster.

None
dst_transform Optional[Affine]

Output transform. If not provided, computed from bounds or polygon.

None
bounds Optional[Tuple[float, float, float, float]]

Output extent as (minx, miny, maxx, maxy). Alternative to polygon.

None
dst_crs Optional[str]

Output CRS. If not provided, uses CRS of first raster.

None
dtype_dst Optional[str]

Output data type. If not provided, uses dtype of first raster.

None
window_size Optional[Tuple[int, int]]

Process in tiles of this size (height, width) for memory efficiency. Default None (process all at once).

None
resampling Resampling

Resampling method for reprojection. Default cubic_spline for continuous data.

cubic_spline
masking_function Optional[Callable[[GeoData], GeoData]]

Function that takes a GeoData and returns a boolean mask of INVALID pixels. Applied to each raster before mosaicking.

None
dst_nodata Optional[int]

Output nodata value. If not provided, uses fill_value_default of first raster.

None

Returns:

Name Type Description
GeoTensor GeoTensor

Mosaic covering the specified extent. Nodata regions are filled by iterating through data_list until all pixels are valid or list is exhausted.

Examples:

Basic mosaic of overlapping Sentinel-2 scenes:

>>> from georeader import mosaic
>>> from georeader.rasterio_reader import RasterioReader
>>>
>>> # Load overlapping scenes
>>> scene1 = RasterioReader("scene1.tif")
>>> scene2 = RasterioReader("scene2.tif")
>>> scene3 = RasterioReader("scene3.tif")
>>>
>>> # Create seamless mosaic
>>> result = mosaic.spatial_mosaic(
...     [scene1, scene2, scene3],
...     bounds=(-122.5, 37.0, -121.5, 38.0),
...     dst_crs="EPSG:4326"
... )

Mosaic with cloud masks (tuple format):

>>> # Each tuple is (data, cloud_mask) where cloud_mask=True means cloudy
>>> result = mosaic.spatial_mosaic(
...     [(scene1, cloud1), (scene2, cloud2), (scene3, cloud3)],
...     bounds=(-122.5, 37.0, -121.5, 38.0),
...     dst_crs="EPSG:4326"
... )
>>> # Cloud-covered pixels in scene1 are filled from scene2, etc.

Memory-efficient tiled processing:

>>> result = mosaic.spatial_mosaic(
...     large_scene_list,
...     bounds=extent,
...     window_size=(1024, 1024)  # Process in 1024x1024 tiles
... )
See Also

georeader.read.read_reproject: Underlying reprojection function. rasterio.merge.merge: Similar functionality in rasterio.

Note
  • Processing order matters: earlier rasters have priority
  • Use window_size for large outputs to avoid memory issues
  • Set appropriate resampling for your data type (nearest for categorical)
Source code in georeader/mosaic.py
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
def spatial_mosaic(data_list:Union[List[GeoData], List[Tuple[GeoData,GeoData]]],
                   polygon:Optional[Polygon]=None,
                   crs_polygon:Optional[str]=None,
                   dst_transform:Optional[rasterio.transform.Affine]=None,
                   bounds:Optional[Tuple[float, float, float, float]]=None,
                   dst_crs:Optional[str]=None,
                   dtype_dst:Optional[str]=None,
                   window_size: Optional[Tuple[int, int]]= None,
                   resampling:rasterio.warp.Resampling=rasterio.warp.Resampling.cubic_spline,
                   masking_function:Optional[Callable[[GeoData], GeoData]]=None,
                   dst_nodata:Optional[int]=None) -> GeoTensor:
    """
    Create a spatial mosaic by filling gaps with data from overlapping rasters.

    Combines multiple rasters into a single output by iteratively filling nodata
    regions with valid data from subsequent rasters. The first raster is used as
    the base, and remaining rasters fill in only where the base has nodata values.

    This function is similar to `rasterio.merge.merge` but with support for:

    - Custom validity masks per raster
    - Masking functions (e.g., cloud masks)
    - Windowed processing for memory efficiency

    Args:
        data_list (Union[List[GeoData], List[Tuple[GeoData, GeoData]]]): Input
            rasters to mosaic. Can be:

            - List of GeoData objects: Each raster's nodata value determines validity
            - List of (data, mask) tuples: mask indicates invalid pixels to fill

            Rasters are processed in order; first valid pixel wins.
        polygon (Optional[Polygon]): Output extent as a shapely Polygon. If provided,
            mosaic is clipped to this polygon. CRS specified by crs_polygon.
        crs_polygon (Optional[str]): CRS of the polygon. If not provided, uses
            the CRS of the first raster.
        dst_transform (Optional[Affine]): Output transform. If not provided, computed
            from bounds or polygon.
        bounds (Optional[Tuple[float, float, float, float]]): Output extent as
            (minx, miny, maxx, maxy). Alternative to polygon.
        dst_crs (Optional[str]): Output CRS. If not provided, uses CRS of first raster.
        dtype_dst (Optional[str]): Output data type. If not provided, uses dtype
            of first raster.
        window_size (Optional[Tuple[int, int]]): Process in tiles of this size
            (height, width) for memory efficiency. Default None (process all at once).
        resampling (Resampling): Resampling method for reprojection.
            Default cubic_spline for continuous data.
        masking_function (Optional[Callable[[GeoData], GeoData]]): Function that
            takes a GeoData and returns a boolean mask of INVALID pixels.
            Applied to each raster before mosaicking.
        dst_nodata (Optional[int]): Output nodata value. If not provided, uses
            fill_value_default of first raster.

    Returns:
        GeoTensor: Mosaic covering the specified extent. Nodata regions are filled
            by iterating through data_list until all pixels are valid or list is
            exhausted.

    Examples:
        Basic mosaic of overlapping Sentinel-2 scenes:

        >>> from georeader import mosaic
        >>> from georeader.rasterio_reader import RasterioReader
        >>>
        >>> # Load overlapping scenes
        >>> scene1 = RasterioReader("scene1.tif")
        >>> scene2 = RasterioReader("scene2.tif")
        >>> scene3 = RasterioReader("scene3.tif")
        >>>
        >>> # Create seamless mosaic
        >>> result = mosaic.spatial_mosaic(
        ...     [scene1, scene2, scene3],
        ...     bounds=(-122.5, 37.0, -121.5, 38.0),
        ...     dst_crs="EPSG:4326"
        ... )

        Mosaic with cloud masks (tuple format):

        >>> # Each tuple is (data, cloud_mask) where cloud_mask=True means cloudy
        >>> result = mosaic.spatial_mosaic(
        ...     [(scene1, cloud1), (scene2, cloud2), (scene3, cloud3)],
        ...     bounds=(-122.5, 37.0, -121.5, 38.0),
        ...     dst_crs="EPSG:4326"
        ... )
        >>> # Cloud-covered pixels in scene1 are filled from scene2, etc.

        Memory-efficient tiled processing:

        >>> result = mosaic.spatial_mosaic(
        ...     large_scene_list,
        ...     bounds=extent,
        ...     window_size=(1024, 1024)  # Process in 1024x1024 tiles
        ... )

    See Also:
        georeader.read.read_reproject: Underlying reprojection function.
        rasterio.merge.merge: Similar functionality in rasterio.

    Note:
        - Processing order matters: earlier rasters have priority
        - Use window_size for large outputs to avoid memory issues
        - Set appropriate resampling for your data type (nearest for categorical)
    """

    assert len(data_list) > 0, f"Expected at least one product found 0 {data_list}"

    if isinstance(data_list[0], tuple):
        first_data_object =  data_list[0][0]
        first_mask_object = data_list[0][1]
    else:
        first_data_object = data_list[0]
        first_mask_object = None

    if dst_transform is None:
        dst_transform = first_data_object.transform

    if dst_crs is None:
        dst_crs = first_data_object.crs

    if polygon is None:
        if bounds is not None:
            polygon = box(*bounds)
        else:
            # Polygon is the Union of the polygons of all the data
            for data in data_list:
                if isinstance(data, tuple):
                    data = data[0]
                polygon_iter = data.footprint(crs=dst_crs)

                if polygon is None:
                    polygon = polygon_iter
                else:
                    polygon = polygon.union(polygon_iter)
    else:
        if crs_polygon is None:
            crs_polygon = dst_crs
        elif not georeader.compare_crs(crs_polygon, dst_crs):
            polygon = window_utils.polygon_to_crs(polygon, crs_polygon, dst_crs)

    GeoDataFake = namedtuple("GeoDataFake", ["transform", "crs"])
    window_polygon = read.window_from_polygon(GeoDataFake(transform=dst_transform, crs=dst_crs),
                                              polygon, crs_polygon=dst_crs)

    window_polygon = window_utils.round_outer_window(window_polygon)

    # Shift transform to window
    dst_transform = rasterio.windows.transform(window_polygon, transform=dst_transform)
    dst_nodata = dst_nodata or first_data_object.fill_value_default

    # Get object to save the results
    data_return = read_reproject(first_data_object,
                                 dst_crs=dst_crs, dst_transform=dst_transform,
                                 resampling=resampling,
                                 dtype_dst=dtype_dst,
                                 window_out=rasterio.windows.Window(row_off=0, col_off=0, width=window_polygon.width,
                                                                    height=window_polygon.height),
                                 dst_nodata=dst_nodata)

    # invalid_values of spatial locations only  -> any
    invalid_values = data_return.values == dst_nodata
    if len(data_return.shape) > 2:
        axis_any = tuple(i for i in range(len(data_return.shape)-2))
        invalid_values = np.any(invalid_values, axis=axis_any) # (H, W)
    else:
        axis_any = None

    if first_mask_object is not None:
        if (masking_function is None) and len(first_mask_object.shape) > 2:
            assert (len(first_mask_object.shape) == 3) and (first_mask_object.shape[0] == 1), f"Expected two dims, found {first_mask_object.shape}"

        invalid_geotensor = read_reproject(first_mask_object,
                                           dst_crs=dst_crs, dst_transform=dst_transform,
                                           resampling=rasterio.warp.Resampling.nearest,
                                           window_out=rasterio.windows.Window(row_off=0, col_off=0,
                                                                              width=window_polygon.width,
                                                                              height=window_polygon.height))
        if masking_function is not None:
            invalid_geotensor = masking_function(invalid_geotensor)

        invalid_geotensor.values = invalid_geotensor.values.astype(bool)
        invalid_geotensor.values =  invalid_geotensor.values.squeeze()
        assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"

        invalid_values|= invalid_geotensor.values
    elif masking_function is not None:
        # Apply masking funtion to the readed data
        invalid_geotensor = masking_function(data_return)

        invalid_geotensor.values = invalid_geotensor.values.astype(bool)
        invalid_geotensor.values = invalid_geotensor.values.squeeze()
        assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
        invalid_values |= invalid_geotensor.values

    # data_return.values[..., invalid_values] = data_return.fill_value_default

    if not np.any(invalid_values):
        return data_return

    if len(data_list) == 1:
        return data_return

    if window_size is not None:
        windows = slices.create_windows(data_return.shape[-2:], window_size)
    else:
        windows = [rasterio.windows.Window(row_off=0, col_off=0, width=data_return.shape[-1],
                                           height=data_return.shape[-2])]

    # Cache of the polygons geodata
    polygons_geodata = [None for _ in range(len(data_list)-1)]

    for window in windows:
        slice_spatial = window.toslices()
        invalid_values_window = invalid_values[slice_spatial]
        if not np.any(invalid_values_window):
            continue

        # Add dims to slice_obj
        slice_obj = tuple(slice(None) for _ in range(len(data_return.shape)-2)) + slice_spatial
        dst_transform_iter = rasterio.windows.transform(window, transform=dst_transform)
        window_reproject_iter = rasterio.windows.Window(row_off=0, col_off=0, width=window.width, height=window.height)
        polygon_iter = window_utils.window_polygon(window, dst_transform)

        for _i, data in enumerate(data_list[1:]):
            if isinstance(data, tuple):
                geodata = data[0]
                geomask = data[1]
            else:
                geodata = data
                geomask = None

            if polygons_geodata[_i] is None:
                polygons_geodata[_i] = geodata.footprint(crs=dst_crs)

            polygon_geodata = polygons_geodata[_i]

            if not polygon_geodata.intersects(polygon_iter):
                continue

            if geomask is not None:
                if (masking_function is None) and len(geomask.shape) > 2:
                    assert (len(geomask.shape) == 3) and (
                                geomask.shape[0] == 1), f"Expected two dims, found {geomask.shape}"

                invalid_geotensor = read_reproject(geomask,
                                                   dst_crs=dst_crs, dst_transform=dst_transform_iter,
                                                   resampling=rasterio.warp.Resampling.nearest,
                                                   window_out=window_reproject_iter)
                if masking_function is not None:
                    invalid_geotensor = masking_function(invalid_geotensor)

                invalid_geotensor.values = invalid_geotensor.values.astype(bool)
                invalid_geotensor.values = invalid_geotensor.values.squeeze()
                assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
                if np.all(invalid_geotensor.values):
                    continue
                invalid_values_iter = invalid_geotensor.values

            data_read = read_reproject(geodata, dst_crs=dst_crs, window_out=window_reproject_iter,
                                       dst_transform=dst_transform_iter, resampling=resampling,
                                       dtype_dst=dtype_dst,
                                       dst_nodata=dst_nodata)

            if (geomask is None) and (masking_function is not None):
                invalid_geotensor = masking_function(data_read)

                invalid_geotensor.values = invalid_geotensor.values.astype(bool)
                invalid_geotensor.values = invalid_geotensor.values.squeeze()
                assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
                if np.all(invalid_geotensor.values):
                    continue
                invalid_values_iter = invalid_geotensor.values

            # data_read could have more dims -> any
            masked_values_read = data_read.values == dst_nodata
            if axis_any is not None:
                masked_values_read = np.any(masked_values_read, axis=axis_any)  # (H, W)

            if (geomask is not None) or (masking_function is not None):
                invalid_values_iter |= masked_values_read
            else:
                invalid_values_iter = masked_values_read

            # Copy values invalids in window and valids in iter
            mask_values_copy_out = invalid_values_window & ~invalid_values_iter
            data_return.values[slice_obj][..., mask_values_copy_out] = data_read.values[...,mask_values_copy_out]

            invalid_values_window &= invalid_values_iter

            if not np.any(invalid_values_window):
                break


    return data_return

Window Methods

Read Module: Window-based raster reading with reprojection and resampling.

This module provides functions to read raster data from various sources using window-based access patterns. It handles coordinate transformations, reprojection, and resampling - the core I/O operations for geospatial raster processing.

Reading Workflow Overview

The module supports multiple ways to specify the area of interest::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    READING WORKFLOW: AREA SPECIFICATION                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Input Specification          Function                     Output       β”‚
β”‚  ────────────────────         ─────────────────────        ──────────   β”‚
β”‚                                                                          β”‚
β”‚  Polygon (geometry)     ───►  read_from_polygon()    ───►  GeoTensor   β”‚
β”‚                                                                          β”‚
β”‚  Bounds (minx,miny,     ───►  read_from_bounds()     ───►  GeoTensor   β”‚
β”‚          maxx,maxy)                                                      β”‚
β”‚                                                                          β”‚
β”‚  Center + Shape         ───►  read_from_center_coords() ─► GeoTensor   β”‚
β”‚  (x, y) + (H, W)                                                         β”‚
β”‚                                                                          β”‚
β”‚  Window (row_off,       ───►  read_from_window()     ───►  GeoTensor   β”‚
β”‚          col_off, H, W)                                                  β”‚
β”‚                                                                          β”‚
β”‚  Web Tile (x, y, z)     ───►  read_from_tile()       ───►  GeoTensor   β”‚
β”‚                                                                          β”‚
β”‚  Match another raster   ───►  read_reproject_like()  ───►  GeoTensor   β”‚
β”‚                                                                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Window vs Bounds Coordinates

Understanding the difference between pixel windows and geographic bounds::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              WINDOW (PIXELS) vs BOUNDS (GEOGRAPHIC COORDINATES)          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  WINDOW (pixel space)                BOUNDS (CRS units)                 β”‚
β”‚  ─────────────────────               ──────────────────                 β”‚
β”‚                                                                          β”‚
β”‚  (col_off, row_off)                  (minx, maxy)  ← upper-left         β”‚
β”‚       ↓                                   ↓                              β”‚
β”‚    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚    β”‚ width pixels β”‚                  β”‚              β”‚ geographic        β”‚
β”‚    β”‚              β”‚   ◄═══════►      β”‚              β”‚ extent in         β”‚
β”‚    β”‚ height pixelsβ”‚    transform     β”‚              β”‚ CRS units         β”‚
β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β”‚                                           ↑                              β”‚
β”‚                                      (maxx, miny)  ← lower-right        β”‚
β”‚                                                                          β”‚
β”‚  Window: rasterio.windows.Window(col_off, row_off, width, height)       β”‚
β”‚  Bounds: (minx, miny, maxx, maxy) - order matches shapely/rasterio      β”‚
β”‚                                                                          β”‚
β”‚  Conversion:                                                             β”‚
β”‚    bounds = window_utils.window_bounds(window, transform)               β”‚
β”‚    window = window_from_bounds(data, bounds, crs_bounds)                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Reprojection & Resampling

When reading data into a different CRS or resolution::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     REPROJECTION WORKFLOW                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Source CRS (e.g., EPSG:4326)         Target CRS (e.g., EPSG:32633)    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”‚
β”‚  β”‚  β•±β•²    β•±β•²    β•±β•²    β”‚              β”‚ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β”‚           β”‚
β”‚  β”‚ β•±  β•²  β•±  β•²  β•±  β•²   β”‚    ═════►    β”‚ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β”‚           β”‚
β”‚  β”‚β•±    β•²β•±    β•²β•±    β•²  β”‚   Reproject  β”‚ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β–‘ β”‚           β”‚
β”‚  β”‚ Irregular grid     β”‚   + Resample β”‚ Regular UTM grid   β”‚           β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β”‚
β”‚                                                                          β”‚
β”‚  Resampling Methods (rasterio.warp.Resampling):                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ Method         β”‚ Best for                                       β”‚    β”‚
β”‚  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€    β”‚
β”‚  β”‚ nearest        β”‚ Categorical data, masks, classification        β”‚    β”‚
β”‚  β”‚ bilinear       β”‚ Continuous data, fast                          β”‚    β”‚
β”‚  β”‚ cubic          β”‚ Continuous data, smooth                        β”‚    β”‚
β”‚  β”‚ cubic_spline   β”‚ Continuous data, very smooth (DEFAULT)         β”‚    β”‚
β”‚  β”‚ lanczos        β”‚ Downsampling, sharp edges                      β”‚    β”‚
β”‚  β”‚ average        β”‚ Downsampling, area-weighted mean               β”‚    β”‚
β”‚  β”‚ mode           β”‚ Downsampling categorical data                  β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                                                                          β”‚
β”‚  Anti-aliasing: Automatic Gaussian blur before downsampling to          β”‚
β”‚                 prevent aliasing artifacts. Controlled by:              β”‚
β”‚                 - anti_aliasing=True (default in resize)                β”‚
β”‚                 - anti_aliasing_sigma (auto-calculated or manual)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Boundless Reading

Reading outside raster bounds returns fill values::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    BOUNDLESS READING (boundless=True)                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Requested Window              Result with boundless=True               β”‚
β”‚  ─────────────────              ─────────────────────────               β”‚
β”‚                                                                          β”‚
β”‚       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                         β”‚
β”‚       β”‚ fill β”‚ data β”‚           β”‚  0  β”‚ data β”‚   fill_value_default    β”‚
β”‚       β”‚ ─────┼───── β”‚           β”‚ ────┼───── β”‚   fills out-of-bounds   β”‚
β”‚       β”‚ fill β”‚ data β”‚           β”‚  0  β”‚ data β”‚   pixels                β”‚
β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                         β”‚
β”‚            ↑                                                             β”‚
β”‚     Request extends                                                      β”‚
β”‚     beyond raster bounds                                                 β”‚
β”‚                                                                          β”‚
β”‚  boundless=False: Raises error or clips to valid region                 β”‚
β”‚  boundless=True:  Pads with fill_value_default (default behavior)       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Module Functions Overview

Window Creation
  • :func:window_from_polygon: Polygon geometry β†’ pixel window
  • :func:window_from_bounds: Geographic bounds β†’ pixel window
  • :func:window_from_center_coords: Center point + shape β†’ pixel window
  • :func:window_from_tile: Web mercator tile (x,y,z) β†’ pixel window
Reading Functions
  • :func:read_from_window: Read using pixel window
  • :func:read_from_polygon: Read area within polygon
  • :func:read_from_bounds: Read area within bounds
  • :func:read_from_center_coords: Read centered on point
  • :func:read_from_tile: Read web mercator tile
Reprojection
  • :func:read_reproject: Read with CRS transformation
  • :func:read_reproject_like: Match another raster's grid
  • :func:read_to_crs: Simple CRS conversion
  • :func:resize: Change resolution with anti-aliasing

Quick Start

Read a region by polygon::

from georeader import read
from shapely.geometry import box

# Define area of interest in WGS84
aoi = box(-122.5, 37.5, -122.0, 38.0)

# Read from raster (auto-transforms polygon to raster CRS)
gt = read.read_from_polygon(reader, aoi, crs_polygon="EPSG:4326")

Read and reproject to match another raster::

# Make data_in match data_like's grid exactly
gt_aligned = read.read_reproject_like(data_in, data_like)

Read a web map tile::

# Read tile at zoom 15, coordinates (x=5242, y=12661)
gt_tile = read.read_from_tile(reader, x=5242, y=12661, z=15)

See Also

georeader.geotensor : GeoTensor class returned by read functions georeader.window_utils : Lower-level window manipulation utilities georeader.rasterio_reader : RasterioReader for lazy file access

References

  • Rasterio windowed reading: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html
  • Rasterio reprojection: https://rasterio.readthedocs.io/en/latest/topics/reproject.html
  • Web Mercator tiles: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames

read_from_center_coords(data_in, center_coords, shape, crs_center_coords=None, return_only_data=False, trigger_load=False, boundless=True)

Extract a rectangular chip from raster data centered on geographic coordinates.

This function combines window calculation and data reading in one step. It's particularly useful for creating training chips for machine learning, extracting regions around points of interest, or generating thumbnails centered on specific locations.

Parameters:

Name Type Description Default
data_in GeoData

Input raster data with spatial reference (crs, transform). Must implement the GeoData protocol.

required
center_coords Tuple[float, float]

Center point as (x, y) in geographic coordinates. For WGS84, this would be (longitude, latitude). For projected CRS, (easting, northing).

required
shape Tuple[int, int]

Desired output size as (height, width) in pixels. The chip will have exactly this shape if boundless=True.

required
crs_center_coords Optional[Any]

Coordinate reference system of center_coords. If None, assumes coords are in the same CRS as data_in. Can be EPSG code string, CRS object, or WKT. Defaults to None.

None
return_only_data bool

If True, returns numpy array without georeferencing. If False, returns GeoData object with spatial metadata. Defaults to False.

False
trigger_load bool

If True, forces loading data into memory (for lazy readers). Defaults to False.

False
boundless bool

If True, output always matches shape, padding with fill_value_default for out-of-bounds areas. If False, clips to actual data extent. Defaults to True.

True

Returns:

Type Description
Union[GeoData, ndarray]

Union[GeoData, np.ndarray]: - If return_only_data=True: numpy array with shape (bands, height, width) or (height, width) - If return_only_data=False: GeoData object with transform adjusted to chip location

Examples:

>>> import rasterio
>>> from georeader import RasterioReader
>>>
>>> # Extract 512x512 chip centered on a location
>>> with rasterio.open('sentinel2.tif') as src:
...     reader = RasterioReader(src)
...     center = (-3.7038, 40.4168)  # Madrid (lon, lat)
...     chip = read_from_center_coords(reader, center, (512, 512),
...                                     crs_center_coords='EPSG:4326')
...     print(chip.shape)  # (bands, 512, 512)
...     print(chip.bounds)  # Geographic bounds of the chip
>>> # Get just the numpy array without georeference
>>> data_array = read_from_center_coords(reader, center, (256, 256),
...                                       crs_center_coords='EPSG:4326',
...                                       return_only_data=True)
>>> # Extract chip with different aspect ratio
>>> chip_rect = read_from_center_coords(reader, center, (256, 512))  # height=256, width=512
Note
  • The center coordinate refers to the geographic center, which maps to the pixel at (height/2, width/2) in the output chip.
  • For chips near image boundaries, boundless=True pads with fill_value_default.
  • The output transform is adjusted so the chip maintains correct georeferencing.
Source code in georeader/read.py
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
def read_from_center_coords(
    data_in: GeoData,
    center_coords: Tuple[float, float],
    shape: Tuple[int, int],
    crs_center_coords: Optional[Any] = None,
    return_only_data: bool = False,
    trigger_load: bool = False,
    boundless: bool = True,
) -> Union[GeoData, np.ndarray]:
    """
    Extract a rectangular chip from raster data centered on geographic coordinates.

    This function combines window calculation and data reading in one step. It's particularly
    useful for creating training chips for machine learning, extracting regions around points
    of interest, or generating thumbnails centered on specific locations.

    Args:
        data_in (GeoData): Input raster data with spatial reference (crs, transform).
            Must implement the GeoData protocol.
        center_coords (Tuple[float, float]): Center point as (x, y) in geographic coordinates.
            For WGS84, this would be (longitude, latitude). For projected CRS, (easting, northing).
        shape (Tuple[int, int]): Desired output size as (height, width) in pixels.
            The chip will have exactly this shape if boundless=True.
        crs_center_coords (Optional[Any], optional): Coordinate reference system of center_coords.
            If None, assumes coords are in the same CRS as `data_in`. Can be EPSG code string,
            CRS object, or WKT. Defaults to None.
        return_only_data (bool, optional): If True, returns numpy array without georeferencing.
            If False, returns GeoData object with spatial metadata. Defaults to False.
        trigger_load (bool, optional): If True, forces loading data into memory (for lazy readers).
            Defaults to False.
        boundless (bool, optional): If True, output always matches shape, padding with
            fill_value_default for out-of-bounds areas. If False, clips to actual data extent.
            Defaults to True.

    Returns:
        Union[GeoData, np.ndarray]:
            - If return_only_data=True: numpy array with shape (bands, height, width) or (height, width)
            - If return_only_data=False: GeoData object with transform adjusted to chip location

    Examples:
        >>> import rasterio
        >>> from georeader import RasterioReader
        >>>
        >>> # Extract 512x512 chip centered on a location
        >>> with rasterio.open('sentinel2.tif') as src:
        ...     reader = RasterioReader(src)
        ...     center = (-3.7038, 40.4168)  # Madrid (lon, lat)
        ...     chip = read_from_center_coords(reader, center, (512, 512),
        ...                                     crs_center_coords='EPSG:4326')
        ...     print(chip.shape)  # (bands, 512, 512)
        ...     print(chip.bounds)  # Geographic bounds of the chip

        >>> # Get just the numpy array without georeference
        >>> data_array = read_from_center_coords(reader, center, (256, 256),
        ...                                       crs_center_coords='EPSG:4326',
        ...                                       return_only_data=True)

        >>> # Extract chip with different aspect ratio
        >>> chip_rect = read_from_center_coords(reader, center, (256, 512))  # height=256, width=512

    Note:
        - The center coordinate refers to the geographic center, which maps to the pixel at
          (height/2, width/2) in the output chip.
        - For chips near image boundaries, boundless=True pads with fill_value_default.
        - The output transform is adjusted so the chip maintains correct georeferencing.
    """
    # Calculate the window that encompasses the desired chip area
    window = window_from_center_coords(data_in, center_coords, shape, crs_center_coords)

    # Read data from the calculated window
    return read_from_window(
        data_in, window=window, return_only_data=return_only_data, trigger_load=trigger_load, boundless=boundless
    )

window_from_bounds(data_in, bounds, crs_bounds=None)

Calculate the raster window corresponding to geographic bounds.

This function converts a bounding box from geographic coordinates to pixel coordinates, handling CRS transformation if needed. The bounds format follows the standard GIS convention.

Parameters:

Name Type Description Default
data_in Union[GeoDataBase, DatasetReader]

Raster data source with crs and transform attributes defining the spatial reference.

required
bounds Tuple[float, float, float, float]

Bounding box as (left, bottom, right, top) or (min_x, min_y, max_x, max_y) in the CRS specified by crs_bounds.

required
crs_bounds Optional[str]

Coordinate reference system of the bounds. If None, assumes bounds are in the same CRS as data_in. Defaults to None.

None

Returns:

Type Description
Window

rasterio.windows.Window: Window object with pixel coordinates (row_off, col_off, height, width) relative to data_in that corresponds to the geographic bounds.

Examples:

>>> import rasterio
>>> # Read a window from UTM bounds
>>> bounds_utm = (500000, 4649000, 501000, 4650000)  # 1km x 1km area
>>> with rasterio.open('utm_image.tif') as src:
...     window = window_from_bounds(src, bounds_utm)
...     data = src.read(window=window)
>>> # Read with CRS transformation
>>> bounds_wgs84 = (-3.71, 40.41, -3.69, 40.42)  # (lon_min, lat_min, lon_max, lat_max)
>>> with rasterio.open('utm_image.tif') as src:  # UTM image
...     window = window_from_bounds(src, bounds_wgs84, crs_bounds='EPSG:4326')
...     data = src.read(window=window)
Note

The returned window may extend beyond the raster boundaries. Use boundless reading or clip the window to raster extent as needed.

Source code in georeader/read.py
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
def window_from_bounds(
    data_in: Union[GeoDataBase, rasterio.DatasetReader],
    bounds: Tuple[float, float, float, float],
    crs_bounds: Optional[str] = None,
) -> rasterio.windows.Window:
    """
    Calculate the raster window corresponding to geographic bounds.

    This function converts a bounding box from geographic coordinates to pixel coordinates,
    handling CRS transformation if needed. The bounds format follows the standard GIS convention.

    Args:
        data_in (Union[GeoDataBase, rasterio.DatasetReader]): Raster data source with crs
            and transform attributes defining the spatial reference.
        bounds (Tuple[float, float, float, float]): Bounding box as (left, bottom, right, top)
            or (min_x, min_y, max_x, max_y) in the CRS specified by `crs_bounds`.
        crs_bounds (Optional[str], optional): Coordinate reference system of the bounds.
            If None, assumes bounds are in the same CRS as `data_in`. Defaults to None.

    Returns:
        rasterio.windows.Window: Window object with pixel coordinates (row_off, col_off, height, width)
            relative to `data_in` that corresponds to the geographic bounds.

    Examples:
        >>> import rasterio
        >>> # Read a window from UTM bounds
        >>> bounds_utm = (500000, 4649000, 501000, 4650000)  # 1km x 1km area
        >>> with rasterio.open('utm_image.tif') as src:
        ...     window = window_from_bounds(src, bounds_utm)
        ...     data = src.read(window=window)

        >>> # Read with CRS transformation
        >>> bounds_wgs84 = (-3.71, 40.41, -3.69, 40.42)  # (lon_min, lat_min, lon_max, lat_max)
        >>> with rasterio.open('utm_image.tif') as src:  # UTM image
        ...     window = window_from_bounds(src, bounds_wgs84, crs_bounds='EPSG:4326')
        ...     data = src.read(window=window)

    Note:
        The returned window may extend beyond the raster boundaries. Use boundless reading
        or clip the window to raster extent as needed.
    """
    # Transform bounds to raster's CRS if they're in a different CRS
    if (crs_bounds is not None) and not window_utils.compare_crs(crs_bounds, data_in.crs):
        # Reproject bounds: (left, bottom, right, top) β†’ same format in data_in.crs
        bounds_in = rasterio.warp.transform_bounds(crs_bounds, data_in.crs, *bounds)
    else:
        bounds_in = bounds

    # Convert geographic bounds to pixel window using raster's transform
    window_in = rasterio.windows.from_bounds(*bounds_in, transform=data_in.transform)

    return window_in

window_from_center_coords(data_in, center_coords, shape, crs_center_coords=None)

Calculate a raster window of specified size centered on geographic coordinates.

This function creates a window by converting the center point from geographic to pixel coordinates, then calculating the upper-left corner based on the desired shape. Handles both rectilinear and rotated/skewed transforms.

Parameters:

Name Type Description Default
data_in Union[GeoDataBase, DatasetReader]

Raster data source with crs and transform attributes defining the spatial reference.

required
center_coords Tuple[float, float]

Center point as (x, y) in geographic coordinates. For WGS84, this would be (longitude, latitude).

required
shape Tuple[int, int]

Desired window size as (height, width) in pixels. Shape format: (n_rows, n_cols).

required
crs_center_coords Optional[Any]

Coordinate reference system of center_coords. If None, assumes coords are in the same CRS as data_in. Defaults to None.

None

Returns:

Type Description
Window

rasterio.windows.Window: Window object centered on the specified coordinates with the requested shape: (row_off, col_off, height, width) in pixel coordinates.

Examples:

>>> import rasterio
>>> # Extract 256x256 window centered on a point
>>> center = (-3.7038, 40.4168)  # Madrid (lon, lat)
>>> window_shape = (256, 256)  # (height, width)
>>> with rasterio.open('image.tif') as src:
...     window = window_from_center_coords(src, center, window_shape,
...                                          crs_center_coords='EPSG:4326')
...     data = src.read(window=window)  # Shape: (bands, 256, 256)
>>> # For square chips, can use same value
>>> window = window_from_center_coords(src, center, (512, 512))
Note

The window may extend beyond raster boundaries if centered near edges. Use boundless reading to handle this case.

Source code in georeader/read.py
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
def window_from_center_coords(
    data_in: Union[GeoDataBase, rasterio.DatasetReader],
    center_coords: Tuple[float, float],
    shape: Tuple[int, int],
    crs_center_coords: Optional[Any] = None,
) -> rasterio.windows.Window:
    """
    Calculate a raster window of specified size centered on geographic coordinates.

    This function creates a window by converting the center point from geographic to pixel
    coordinates, then calculating the upper-left corner based on the desired shape. Handles
    both rectilinear and rotated/skewed transforms.

    Args:
        data_in (Union[GeoDataBase, rasterio.DatasetReader]): Raster data source with crs
            and transform attributes defining the spatial reference.
        center_coords (Tuple[float, float]): Center point as (x, y) in geographic coordinates.
            For WGS84, this would be (longitude, latitude).
        shape (Tuple[int, int]): Desired window size as (height, width) in pixels.
            Shape format: (n_rows, n_cols).
        crs_center_coords (Optional[Any], optional): Coordinate reference system of center_coords.
            If None, assumes coords are in the same CRS as `data_in`. Defaults to None.

    Returns:
        rasterio.windows.Window: Window object centered on the specified coordinates with
            the requested shape: (row_off, col_off, height, width) in pixel coordinates.

    Examples:
        >>> import rasterio
        >>> # Extract 256x256 window centered on a point
        >>> center = (-3.7038, 40.4168)  # Madrid (lon, lat)
        >>> window_shape = (256, 256)  # (height, width)
        >>> with rasterio.open('image.tif') as src:
        ...     window = window_from_center_coords(src, center, window_shape,
        ...                                          crs_center_coords='EPSG:4326')
        ...     data = src.read(window=window)  # Shape: (bands, 256, 256)

        >>> # For square chips, can use same value
        >>> window = window_from_center_coords(src, center, (512, 512))

    Note:
        The window may extend beyond raster boundaries if centered near edges.
        Use boundless reading to handle this case.
    """
    # Transform center coordinates to raster's CRS if needed
    if (crs_center_coords is not None) and not window_utils.compare_crs(crs_center_coords, data_in.crs):
        center_coords = _transform_from_crs(center_coords, crs_center_coords, data_in.crs)

    transform = data_in.transform

    # Convert geographic center to pixel coordinates
    # ~transform is the inverse: geo β†’ pixel
    pixel_center_coords = ~transform * tuple(center_coords)

    # Calculate upper-left corner in pixel coordinates
    # For a window of shape (H, W), center is at (W/2, H/2) from upper-left
    # pixel_upper_left = pixel_center - (W/2, H/2)
    pixel_upper_left = _round_all((pixel_center_coords[0] - shape[1] / 2, pixel_center_coords[1] - shape[0] / 2))

    # Create window with calculated upper-left corner and requested shape
    # Window format: (col_off, row_off, width, height)
    window = rasterio.windows.Window(
        row_off=pixel_upper_left[1], col_off=pixel_upper_left[0], width=shape[1], height=shape[0]
    )
    return window

window_from_polygon(data_in, polygon, crs_polygon=None, window_surrounding=False)

Calculate the raster window that contains a polygon in pixel coordinates.

This function converts polygon vertices from geographic coordinates to pixel coordinates, then creates a window that encompasses all vertices. Useful for extracting raster data within a specific geographic area.

Parameters:

Name Type Description Default
data_in Union[GeoDataBase, DatasetReader]

Raster data source with crs and transform attributes defining the spatial reference.

required
polygon Union[Polygon, MultiPolygon]

Shapely geometry defining the area of interest. Can be a simple Polygon or MultiPolygon for complex areas.

required
crs_polygon Optional[str]

Coordinate reference system of the polygon. If None, assumes polygon is in the same CRS as data_in. Defaults to None.

None
window_surrounding bool

If True, adds a 1-pixel buffer around the polygon to ensure complete coverage (i.e., window.row_off + window.height will not be a vertex). Defaults to False.

False

Returns:

Type Description
Window

rasterio.windows.Window: Window object with pixel coordinates (row_off, col_off, height, width) relative to data_in that encompasses the polygon.

Examples:

>>> from shapely.geometry import box
>>> import rasterio
>>> # Create a polygon in WGS84
>>> polygon = box(-3.71, 40.41, -3.69, 40.42)  # Madrid area
>>> with rasterio.open('image.tif') as src:
...     window = window_from_polygon(src, polygon, crs_polygon='EPSG:4326')
...     print(f"Window: {window.width}x{window.height} at ({window.col_off}, {window.row_off})")
Note

The window coordinates are in pixel space, not geographic coordinates. Use with read_from_window to extract the actual data.

Source code in georeader/read.py
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
def window_from_polygon(
    data_in: Union[GeoDataBase, rasterio.DatasetReader],
    polygon: Union[Polygon, MultiPolygon],
    crs_polygon: Optional[str] = None,
    window_surrounding: bool = False,
) -> rasterio.windows.Window:
    """
    Calculate the raster window that contains a polygon in pixel coordinates.

    This function converts polygon vertices from geographic coordinates to pixel coordinates,
    then creates a window that encompasses all vertices. Useful for extracting raster data
    within a specific geographic area.

    Args:
        data_in (Union[GeoDataBase, rasterio.DatasetReader]): Raster data source with crs
            and transform attributes defining the spatial reference.
        polygon (Union[Polygon, MultiPolygon]): Shapely geometry defining the area of interest.
            Can be a simple Polygon or MultiPolygon for complex areas.
        crs_polygon (Optional[str], optional): Coordinate reference system of the polygon.
            If None, assumes polygon is in the same CRS as `data_in`. Defaults to None.
        window_surrounding (bool, optional): If True, adds a 1-pixel buffer around the polygon
            to ensure complete coverage (i.e., window.row_off + window.height will not be a vertex).
            Defaults to False.

    Returns:
        rasterio.windows.Window: Window object with pixel coordinates (row_off, col_off, height, width)
            relative to `data_in` that encompasses the polygon.

    Examples:
        >>> from shapely.geometry import box
        >>> import rasterio
        >>> # Create a polygon in WGS84
        >>> polygon = box(-3.71, 40.41, -3.69, 40.42)  # Madrid area
        >>> with rasterio.open('image.tif') as src:
        ...     window = window_from_polygon(src, polygon, crs_polygon='EPSG:4326')
        ...     print(f"Window: {window.width}x{window.height} at ({window.col_off}, {window.row_off})")

    Note:
        The window coordinates are in pixel space, not geographic coordinates.
        Use with `read_from_window` to extract the actual data.
    """
    data_in_crs = data_in.crs
    data_in_transform = data_in.transform

    # Convert polygon vertices to pixel coordinates in the raster's CRS
    # This handles CRS transformation if polygon is in a different CRS
    coords_multipol = window_utils.exterior_pixel_coords(
        polygon=polygon, crs_polygon=crs_polygon, crs=data_in_crs, transform=data_in_transform
    )

    # Calculate bounding box in pixel coordinates
    # Find minimum row/col (upper-left corner)
    row_off = min(c[1] for coords in coords_multipol for c in coords)
    col_off = min(c[0] for coords in coords_multipol for c in coords)

    # Find maximum row/col (lower-right corner)
    row_max = max(c[1] for coords in coords_multipol for c in coords)
    col_max = max(c[0] for coords in coords_multipol for c in coords)

    # Add 1-pixel buffer if requested for complete surrounding coverage
    if window_surrounding:
        row_max += 1
        col_max += 1

    # Create window: (col_off, row_off, width, height)
    return rasterio.windows.Window(row_off=row_off, col_off=col_off, width=col_max - col_off, height=row_max - row_off)

window_from_tile(data_in, x, y, z)

Calculate the raster window corresponding to a Web Mercator (XYZ) tile.

This function converts XYZ tile coordinates (as used by web mapping services like OpenStreetMap, Google Maps) to a raster window. Tiles follow the TMS/Slippy Map convention where the world is divided into 2^z Γ— 2^z tiles at zoom level z.

At zoom z: - Tile (0, 0) is the top-left - x ranges from 0 to 2^z - 1 (west to east) - y ranges from 0 to 2^z - 1 (north to south)

Parameters:

Name Type Description Default
data_in Union[GeoDataBase, DatasetReader]

Raster data source with crs and transform attributes. Can be in any CRS; tile bounds will be transformed.

required
x int

Tile column index (0 to 2^z - 1). Increases eastward.

required
y int

Tile row index (0 to 2^z - 1). Increases southward.

required
z int

Zoom level (0-22 typically). At z=0, the entire world is one tile.

required

Returns:

Type Description
Window

rasterio.windows.Window: Window object in pixel coordinates that corresponds to the geographic extent of the XYZ tile.

Examples:

>>> import rasterio
>>> # Get window for a tile covering Madrid area at zoom 12
>>> with rasterio.open('spain.tif') as src:
...     window = window_from_tile(src, x=2046, y=1537, z=12)
...     tile_data = src.read(window=window)
>>> # Tile coordinates for lower zoom (more area coverage)
>>> window_z8 = window_from_tile(src, x=127, y=96, z=8)  # Larger area
>>> # Higher zoom = smaller area, more detail
>>> window_z15 = window_from_tile(src, x=16374, y=12297, z=15)
References
  • OSM Slippy map tilenames: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
  • XYZ tiles: https://en.wikipedia.org/wiki/Tiled_web_map
Note

Tiles are in Web Mercator projection (EPSG:3857). The function handles transformation to the raster's native CRS automatically.

Source code in georeader/read.py
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
def window_from_tile(
    data_in: Union[GeoDataBase, rasterio.DatasetReader], x: int, y: int, z: int
) -> rasterio.windows.Window:
    """
    Calculate the raster window corresponding to a Web Mercator (XYZ) tile.

    This function converts XYZ tile coordinates (as used by web mapping services like
    OpenStreetMap, Google Maps) to a raster window. Tiles follow the TMS/Slippy Map
    convention where the world is divided into 2^z Γ— 2^z tiles at zoom level z.

    At zoom z:
    - Tile (0, 0) is the top-left
    - x ranges from 0 to 2^z - 1 (west to east)
    - y ranges from 0 to 2^z - 1 (north to south)

    Args:
        data_in (Union[GeoDataBase, rasterio.DatasetReader]): Raster data source with crs
            and transform attributes. Can be in any CRS; tile bounds will be transformed.
        x (int): Tile column index (0 to 2^z - 1). Increases eastward.
        y (int): Tile row index (0 to 2^z - 1). Increases southward.
        z (int): Zoom level (0-22 typically). At z=0, the entire world is one tile.

    Returns:
        rasterio.windows.Window: Window object in pixel coordinates that corresponds
            to the geographic extent of the XYZ tile.

    Examples:
        >>> import rasterio
        >>> # Get window for a tile covering Madrid area at zoom 12
        >>> with rasterio.open('spain.tif') as src:
        ...     window = window_from_tile(src, x=2046, y=1537, z=12)
        ...     tile_data = src.read(window=window)

        >>> # Tile coordinates for lower zoom (more area coverage)
        >>> window_z8 = window_from_tile(src, x=127, y=96, z=8)  # Larger area

        >>> # Higher zoom = smaller area, more detail
        >>> window_z15 = window_from_tile(src, x=16374, y=12297, z=15)

    References:
        - OSM Slippy map tilenames: https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames
        - XYZ tiles: https://en.wikipedia.org/wiki/Tiled_web_map

    Note:
        Tiles are in Web Mercator projection (EPSG:3857). The function handles
        transformation to the raster's native CRS automatically.
    """
    # Get tile bounds in Web Mercator (EPSG:3857) coordinates
    bounds_wgs = mercantile.xy_bounds(int(x), int(y), int(z))

    # Create polygon from tile bounds
    polygon_crs_webmercator = box(bounds_wgs.left, bounds_wgs.bottom, bounds_wgs.right, bounds_wgs.top)

    # Convert to window with surrounding buffer for complete tile coverage
    return window_from_polygon(data_in, polygon_crs_webmercator, WEB_MERCATOR_CRS, window_surrounding=True)

Mosaic Module: Combine multiple rasters into seamless composite images.

This module provides functions to merge multiple overlapping rasters into a single output, handling reprojection, resampling, and nodata filling. Essential for creating cloud-free composites and gap-free mosaics.

Spatial Mosaic Overview

Combining multiple rasters with varying coverage::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SPATIAL MOSAIC CONCEPT                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Input Rasters (with gaps)              Output Mosaic                   β”‚
β”‚  ─────────────────────────              ─────────────                   β”‚
β”‚                                                                          β”‚
β”‚   Raster 1         Raster 2                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚β–“β–“β–“β–‘β–‘β–‘β–‘β–‘β–‘β”‚      β”‚β–‘β–‘β–‘β–‘β–‘β–“β–“β–“β–“β”‚           β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–‘β–‘β–‘β–‘β–‘β”‚  +   β”‚β–‘β–‘β–‘β–‘β–“β–“β–“β–“β–“β”‚    ═══►   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–‘β–‘β–‘β–‘β”‚      β”‚β–‘β–‘β–‘β–“β–“β–“β–“β–“β–“β”‚           β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚      β”‚β–‘β–‘β–“β–“β–“β–“β–“β–“β–“β”‚           β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                                                                          β”‚
β”‚   β–‘ = nodata/gaps                        Gaps filled from               β”‚
β”‚   β–“ = valid data                         overlapping rasters            β”‚
β”‚                                                                          β”‚
β”‚  Processing Order:                                                       β”‚
β”‚  β€’ First raster fills as much as possible                               β”‚
β”‚  β€’ Each subsequent raster fills remaining gaps                          β”‚
β”‚  β€’ Continues until no nodata remains (or list exhausted)                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Temporal Mosaic / Reduction

Combining rasters from multiple time steps::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TEMPORAL REDUCTION CONCEPT                            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Time Series Input                      Reduction Output                β”‚
β”‚  ─────────────────                      ────────────────                β”‚
β”‚                                                                          β”‚
β”‚   t=1    t=2    t=3                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”  β”Œβ”€β”€β”€β”  β”Œβ”€β”€β”€β”                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚  β”‚ 5 β”‚  β”‚ 7 β”‚  β”‚ 6 β”‚                   β”‚               β”‚               β”‚
β”‚  β”‚   β”‚  β”‚   β”‚  β”‚   β”‚   ─────────────►  β”‚  median = 6   β”‚               β”‚
β”‚  β”‚   β”‚  β”‚   β”‚  β”‚   β”‚   np.nanmedian    β”‚  mean = 6.0   β”‚               β”‚
β”‚  β””β”€β”€β”€β”˜  β””β”€β”€β”€β”˜  β””β”€β”€β”€β”˜   np.nanmean      β”‚  max = 7      β”‚               β”‚
β”‚                                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜               β”‚
β”‚                                                                          β”‚
β”‚  Common Reduction Functions:                                            β”‚
β”‚  β€’ np.nanmedian: Robust to outliers (clouds, shadows)                   β”‚
β”‚  β€’ np.nanmean: Average value                                            β”‚
β”‚  β€’ np.nanmax: Maximum composite (e.g., max NDVI)                        β”‚
β”‚  β€’ np.nanmin: Minimum composite                                         β”‚
β”‚  β€’ np.nanstd: Temporal variability                                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Mosaic with Masks

Using external validity masks to control which pixels are used::

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    MASKED MOSAIC WORKFLOW                                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Input: (data, mask) tuples                                             β”‚
β”‚  ─────────────────────────                                              β”‚
β”‚                                                                          β”‚
β”‚   Raster 1     Cloud Mask      β”‚    Raster 2     Cloud Mask            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β”‚      β”‚   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β”‚      β”‚   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚            β”‚
β”‚  β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‘β–‘β”‚   +  β”‚   β”‚β–“β–“β–“β–“β–“β–“β–“β–“β–“β”‚  β”‚β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β”‚            β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β”‚
β”‚                   ↑            β”‚                                        β”‚
β”‚               β–ˆ = invalid      β”‚   Uses Raster 2 where Raster 1        β”‚
β”‚               (cloud/shadow)   β”‚   is masked as invalid                 β”‚
β”‚                                                                          β”‚
β”‚  Usage:                                                                  β”‚
β”‚    data_list = [(raster1, mask1), (raster2, mask2), ...]               β”‚
β”‚    mosaic = spatial_mosaic(data_list, ...)                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Module Functions Overview

Spatial Mosaicking
  • :func:spatial_mosaic: Merge rasters to fill nodata gaps
  • :func:spatial_mosaic_chunked: Memory-efficient chunked processing
Temporal Reduction
  • :func:rasters_reduction: Apply reduction function across rasters
  • :func:pad_add_rasters: Align and stack rasters for reduction

Quick Start

Create a cloud-free mosaic from multiple images::

from georeader import mosaic, read

# List of overlapping raster readers
rasters = [reader1, reader2, reader3]

# Create mosaic (fills gaps with subsequent images)
result = mosaic.spatial_mosaic(
    rasters,
    bounds=(-122.5, 37.0, -122.0, 37.5),
    dst_crs="EPSG:4326",
    dst_nodata=0
)

Compute median composite from time series::

from georeader import mosaic
import numpy as np

# Stack aligned rasters and compute median
result = mosaic.rasters_reduction(
    raster_list,
    reducer=np.nanmedian,
    dst_crs="EPSG:32610"
)

See Also

georeader.read : Reading and reprojection functions georeader.slices : Array slicing for chunked processing georeader.geotensor : Output format

References

  • Rasterio merge: https://rasterio.readthedocs.io/en/latest/api/rasterio.merge.html
  • Cloud masking strategies: See georeader.readers.cloudsen12

spatial_mosaic(data_list, polygon=None, crs_polygon=None, dst_transform=None, bounds=None, dst_crs=None, dtype_dst=None, window_size=None, resampling=rasterio.warp.Resampling.cubic_spline, masking_function=None, dst_nodata=None)

Create a spatial mosaic by filling gaps with data from overlapping rasters.

Combines multiple rasters into a single output by iteratively filling nodata regions with valid data from subsequent rasters. The first raster is used as the base, and remaining rasters fill in only where the base has nodata values.

This function is similar to rasterio.merge.merge but with support for:

  • Custom validity masks per raster
  • Masking functions (e.g., cloud masks)
  • Windowed processing for memory efficiency

Parameters:

Name Type Description Default
data_list Union[List[GeoData], List[Tuple[GeoData, GeoData]]]

Input rasters to mosaic. Can be:

  • List of GeoData objects: Each raster's nodata value determines validity
  • List of (data, mask) tuples: mask indicates invalid pixels to fill

Rasters are processed in order; first valid pixel wins.

required
polygon Optional[Polygon]

Output extent as a shapely Polygon. If provided, mosaic is clipped to this polygon. CRS specified by crs_polygon.

None
crs_polygon Optional[str]

CRS of the polygon. If not provided, uses the CRS of the first raster.

None
dst_transform Optional[Affine]

Output transform. If not provided, computed from bounds or polygon.

None
bounds Optional[Tuple[float, float, float, float]]

Output extent as (minx, miny, maxx, maxy). Alternative to polygon.

None
dst_crs Optional[str]

Output CRS. If not provided, uses CRS of first raster.

None
dtype_dst Optional[str]

Output data type. If not provided, uses dtype of first raster.

None
window_size Optional[Tuple[int, int]]

Process in tiles of this size (height, width) for memory efficiency. Default None (process all at once).

None
resampling Resampling

Resampling method for reprojection. Default cubic_spline for continuous data.

cubic_spline
masking_function Optional[Callable[[GeoData], GeoData]]

Function that takes a GeoData and returns a boolean mask of INVALID pixels. Applied to each raster before mosaicking.

None
dst_nodata Optional[int]

Output nodata value. If not provided, uses fill_value_default of first raster.

None

Returns:

Name Type Description
GeoTensor GeoTensor

Mosaic covering the specified extent. Nodata regions are filled by iterating through data_list until all pixels are valid or list is exhausted.

Examples:

Basic mosaic of overlapping Sentinel-2 scenes:

>>> from georeader import mosaic
>>> from georeader.rasterio_reader import RasterioReader
>>>
>>> # Load overlapping scenes
>>> scene1 = RasterioReader("scene1.tif")
>>> scene2 = RasterioReader("scene2.tif")
>>> scene3 = RasterioReader("scene3.tif")
>>>
>>> # Create seamless mosaic
>>> result = mosaic.spatial_mosaic(
...     [scene1, scene2, scene3],
...     bounds=(-122.5, 37.0, -121.5, 38.0),
...     dst_crs="EPSG:4326"
... )

Mosaic with cloud masks (tuple format):

>>> # Each tuple is (data, cloud_mask) where cloud_mask=True means cloudy
>>> result = mosaic.spatial_mosaic(
...     [(scene1, cloud1), (scene2, cloud2), (scene3, cloud3)],
...     bounds=(-122.5, 37.0, -121.5, 38.0),
...     dst_crs="EPSG:4326"
... )
>>> # Cloud-covered pixels in scene1 are filled from scene2, etc.

Memory-efficient tiled processing:

>>> result = mosaic.spatial_mosaic(
...     large_scene_list,
...     bounds=extent,
...     window_size=(1024, 1024)  # Process in 1024x1024 tiles
... )
See Also

georeader.read.read_reproject: Underlying reprojection function. rasterio.merge.merge: Similar functionality in rasterio.

Note
  • Processing order matters: earlier rasters have priority
  • Use window_size for large outputs to avoid memory issues
  • Set appropriate resampling for your data type (nearest for categorical)
Source code in georeader/mosaic.py
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
def spatial_mosaic(data_list:Union[List[GeoData], List[Tuple[GeoData,GeoData]]],
                   polygon:Optional[Polygon]=None,
                   crs_polygon:Optional[str]=None,
                   dst_transform:Optional[rasterio.transform.Affine]=None,
                   bounds:Optional[Tuple[float, float, float, float]]=None,
                   dst_crs:Optional[str]=None,
                   dtype_dst:Optional[str]=None,
                   window_size: Optional[Tuple[int, int]]= None,
                   resampling:rasterio.warp.Resampling=rasterio.warp.Resampling.cubic_spline,
                   masking_function:Optional[Callable[[GeoData], GeoData]]=None,
                   dst_nodata:Optional[int]=None) -> GeoTensor:
    """
    Create a spatial mosaic by filling gaps with data from overlapping rasters.

    Combines multiple rasters into a single output by iteratively filling nodata
    regions with valid data from subsequent rasters. The first raster is used as
    the base, and remaining rasters fill in only where the base has nodata values.

    This function is similar to `rasterio.merge.merge` but with support for:

    - Custom validity masks per raster
    - Masking functions (e.g., cloud masks)
    - Windowed processing for memory efficiency

    Args:
        data_list (Union[List[GeoData], List[Tuple[GeoData, GeoData]]]): Input
            rasters to mosaic. Can be:

            - List of GeoData objects: Each raster's nodata value determines validity
            - List of (data, mask) tuples: mask indicates invalid pixels to fill

            Rasters are processed in order; first valid pixel wins.
        polygon (Optional[Polygon]): Output extent as a shapely Polygon. If provided,
            mosaic is clipped to this polygon. CRS specified by crs_polygon.
        crs_polygon (Optional[str]): CRS of the polygon. If not provided, uses
            the CRS of the first raster.
        dst_transform (Optional[Affine]): Output transform. If not provided, computed
            from bounds or polygon.
        bounds (Optional[Tuple[float, float, float, float]]): Output extent as
            (minx, miny, maxx, maxy). Alternative to polygon.
        dst_crs (Optional[str]): Output CRS. If not provided, uses CRS of first raster.
        dtype_dst (Optional[str]): Output data type. If not provided, uses dtype
            of first raster.
        window_size (Optional[Tuple[int, int]]): Process in tiles of this size
            (height, width) for memory efficiency. Default None (process all at once).
        resampling (Resampling): Resampling method for reprojection.
            Default cubic_spline for continuous data.
        masking_function (Optional[Callable[[GeoData], GeoData]]): Function that
            takes a GeoData and returns a boolean mask of INVALID pixels.
            Applied to each raster before mosaicking.
        dst_nodata (Optional[int]): Output nodata value. If not provided, uses
            fill_value_default of first raster.

    Returns:
        GeoTensor: Mosaic covering the specified extent. Nodata regions are filled
            by iterating through data_list until all pixels are valid or list is
            exhausted.

    Examples:
        Basic mosaic of overlapping Sentinel-2 scenes:

        >>> from georeader import mosaic
        >>> from georeader.rasterio_reader import RasterioReader
        >>>
        >>> # Load overlapping scenes
        >>> scene1 = RasterioReader("scene1.tif")
        >>> scene2 = RasterioReader("scene2.tif")
        >>> scene3 = RasterioReader("scene3.tif")
        >>>
        >>> # Create seamless mosaic
        >>> result = mosaic.spatial_mosaic(
        ...     [scene1, scene2, scene3],
        ...     bounds=(-122.5, 37.0, -121.5, 38.0),
        ...     dst_crs="EPSG:4326"
        ... )

        Mosaic with cloud masks (tuple format):

        >>> # Each tuple is (data, cloud_mask) where cloud_mask=True means cloudy
        >>> result = mosaic.spatial_mosaic(
        ...     [(scene1, cloud1), (scene2, cloud2), (scene3, cloud3)],
        ...     bounds=(-122.5, 37.0, -121.5, 38.0),
        ...     dst_crs="EPSG:4326"
        ... )
        >>> # Cloud-covered pixels in scene1 are filled from scene2, etc.

        Memory-efficient tiled processing:

        >>> result = mosaic.spatial_mosaic(
        ...     large_scene_list,
        ...     bounds=extent,
        ...     window_size=(1024, 1024)  # Process in 1024x1024 tiles
        ... )

    See Also:
        georeader.read.read_reproject: Underlying reprojection function.
        rasterio.merge.merge: Similar functionality in rasterio.

    Note:
        - Processing order matters: earlier rasters have priority
        - Use window_size for large outputs to avoid memory issues
        - Set appropriate resampling for your data type (nearest for categorical)
    """

    assert len(data_list) > 0, f"Expected at least one product found 0 {data_list}"

    if isinstance(data_list[0], tuple):
        first_data_object =  data_list[0][0]
        first_mask_object = data_list[0][1]
    else:
        first_data_object = data_list[0]
        first_mask_object = None

    if dst_transform is None:
        dst_transform = first_data_object.transform

    if dst_crs is None:
        dst_crs = first_data_object.crs

    if polygon is None:
        if bounds is not None:
            polygon = box(*bounds)
        else:
            # Polygon is the Union of the polygons of all the data
            for data in data_list:
                if isinstance(data, tuple):
                    data = data[0]
                polygon_iter = data.footprint(crs=dst_crs)

                if polygon is None:
                    polygon = polygon_iter
                else:
                    polygon = polygon.union(polygon_iter)
    else:
        if crs_polygon is None:
            crs_polygon = dst_crs
        elif not georeader.compare_crs(crs_polygon, dst_crs):
            polygon = window_utils.polygon_to_crs(polygon, crs_polygon, dst_crs)

    GeoDataFake = namedtuple("GeoDataFake", ["transform", "crs"])
    window_polygon = read.window_from_polygon(GeoDataFake(transform=dst_transform, crs=dst_crs),
                                              polygon, crs_polygon=dst_crs)

    window_polygon = window_utils.round_outer_window(window_polygon)

    # Shift transform to window
    dst_transform = rasterio.windows.transform(window_polygon, transform=dst_transform)
    dst_nodata = dst_nodata or first_data_object.fill_value_default

    # Get object to save the results
    data_return = read_reproject(first_data_object,
                                 dst_crs=dst_crs, dst_transform=dst_transform,
                                 resampling=resampling,
                                 dtype_dst=dtype_dst,
                                 window_out=rasterio.windows.Window(row_off=0, col_off=0, width=window_polygon.width,
                                                                    height=window_polygon.height),
                                 dst_nodata=dst_nodata)

    # invalid_values of spatial locations only  -> any
    invalid_values = data_return.values == dst_nodata
    if len(data_return.shape) > 2:
        axis_any = tuple(i for i in range(len(data_return.shape)-2))
        invalid_values = np.any(invalid_values, axis=axis_any) # (H, W)
    else:
        axis_any = None

    if first_mask_object is not None:
        if (masking_function is None) and len(first_mask_object.shape) > 2:
            assert (len(first_mask_object.shape) == 3) and (first_mask_object.shape[0] == 1), f"Expected two dims, found {first_mask_object.shape}"

        invalid_geotensor = read_reproject(first_mask_object,
                                           dst_crs=dst_crs, dst_transform=dst_transform,
                                           resampling=rasterio.warp.Resampling.nearest,
                                           window_out=rasterio.windows.Window(row_off=0, col_off=0,
                                                                              width=window_polygon.width,
                                                                              height=window_polygon.height))
        if masking_function is not None:
            invalid_geotensor = masking_function(invalid_geotensor)

        invalid_geotensor.values = invalid_geotensor.values.astype(bool)
        invalid_geotensor.values =  invalid_geotensor.values.squeeze()
        assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"

        invalid_values|= invalid_geotensor.values
    elif masking_function is not None:
        # Apply masking funtion to the readed data
        invalid_geotensor = masking_function(data_return)

        invalid_geotensor.values = invalid_geotensor.values.astype(bool)
        invalid_geotensor.values = invalid_geotensor.values.squeeze()
        assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
        invalid_values |= invalid_geotensor.values

    # data_return.values[..., invalid_values] = data_return.fill_value_default

    if not np.any(invalid_values):
        return data_return

    if len(data_list) == 1:
        return data_return

    if window_size is not None:
        windows = slices.create_windows(data_return.shape[-2:], window_size)
    else:
        windows = [rasterio.windows.Window(row_off=0, col_off=0, width=data_return.shape[-1],
                                           height=data_return.shape[-2])]

    # Cache of the polygons geodata
    polygons_geodata = [None for _ in range(len(data_list)-1)]

    for window in windows:
        slice_spatial = window.toslices()
        invalid_values_window = invalid_values[slice_spatial]
        if not np.any(invalid_values_window):
            continue

        # Add dims to slice_obj
        slice_obj = tuple(slice(None) for _ in range(len(data_return.shape)-2)) + slice_spatial
        dst_transform_iter = rasterio.windows.transform(window, transform=dst_transform)
        window_reproject_iter = rasterio.windows.Window(row_off=0, col_off=0, width=window.width, height=window.height)
        polygon_iter = window_utils.window_polygon(window, dst_transform)

        for _i, data in enumerate(data_list[1:]):
            if isinstance(data, tuple):
                geodata = data[0]
                geomask = data[1]
            else:
                geodata = data
                geomask = None

            if polygons_geodata[_i] is None:
                polygons_geodata[_i] = geodata.footprint(crs=dst_crs)

            polygon_geodata = polygons_geodata[_i]

            if not polygon_geodata.intersects(polygon_iter):
                continue

            if geomask is not None:
                if (masking_function is None) and len(geomask.shape) > 2:
                    assert (len(geomask.shape) == 3) and (
                                geomask.shape[0] == 1), f"Expected two dims, found {geomask.shape}"

                invalid_geotensor = read_reproject(geomask,
                                                   dst_crs=dst_crs, dst_transform=dst_transform_iter,
                                                   resampling=rasterio.warp.Resampling.nearest,
                                                   window_out=window_reproject_iter)
                if masking_function is not None:
                    invalid_geotensor = masking_function(invalid_geotensor)

                invalid_geotensor.values = invalid_geotensor.values.astype(bool)
                invalid_geotensor.values = invalid_geotensor.values.squeeze()
                assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
                if np.all(invalid_geotensor.values):
                    continue
                invalid_values_iter = invalid_geotensor.values

            data_read = read_reproject(geodata, dst_crs=dst_crs, window_out=window_reproject_iter,
                                       dst_transform=dst_transform_iter, resampling=resampling,
                                       dtype_dst=dtype_dst,
                                       dst_nodata=dst_nodata)

            if (geomask is None) and (masking_function is not None):
                invalid_geotensor = masking_function(data_read)

                invalid_geotensor.values = invalid_geotensor.values.astype(bool)
                invalid_geotensor.values = invalid_geotensor.values.squeeze()
                assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
                if np.all(invalid_geotensor.values):
                    continue
                invalid_values_iter = invalid_geotensor.values

            # data_read could have more dims -> any
            masked_values_read = data_read.values == dst_nodata
            if axis_any is not None:
                masked_values_read = np.any(masked_values_read, axis=axis_any)  # (H, W)

            if (geomask is not None) or (masking_function is not None):
                invalid_values_iter |= masked_values_read
            else:
                invalid_values_iter = masked_values_read

            # Copy values invalids in window and valids in iter
            mask_values_copy_out = invalid_values_window & ~invalid_values_iter
            data_return.values[slice_obj][..., mask_values_copy_out] = data_read.values[...,mask_values_copy_out]

            invalid_values_window &= invalid_values_iter

            if not np.any(invalid_values_window):
                break


    return data_return