Skip to content

Mosaic Module

The mosaic module provides functionality for creating spatial mosaics from multiple overlapping raster images. This is essential when working with satellite imagery that spans multiple tiles or when combining data from different acquisition times.

Overview

Mosaicking is the process of combining multiple raster images into a single, seamless composite. The module handles:

  • Spatial alignment: Reprojecting images to a common coordinate reference system
  • No-data handling: Filling gaps in one image with data from overlapping images
  • Masking: Custom masking functions to exclude invalid pixels (clouds, shadows, etc.)
  • Memory efficiency: Window-based processing for large datasets
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Image 1 β”‚   β”‚ Image 2 β”‚     Input: Multiple overlapping rasters
    β”‚  β–ˆβ–ˆβ–ˆβ–ˆ   β”‚   β”‚   β–ˆβ–ˆβ–ˆβ–ˆ  β”‚
    β”‚  β–ˆβ–ˆβ–ˆβ–ˆ   β”‚   β”‚   β–ˆβ–ˆβ–ˆβ–ˆ  β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         \           /
          \         /
           ↓       ↓
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚  Mosaic   β”‚          Output: Single seamless composite
         β”‚  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚
         β”‚  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Quick Start

from georeader import mosaic
from georeader.rasterio_reader import RasterioReader

# Load multiple overlapping images
images = [
    RasterioReader("image1.tif"),
    RasterioReader("image2.tif"),
    RasterioReader("image3.tif"),
]

# Create mosaic over the union of all footprints
result = mosaic.spatial_mosaic(images)

# Create mosaic over a specific polygon
from shapely.geometry import box
aoi = box(minx, miny, maxx, maxy)
result = mosaic.spatial_mosaic(images, polygon=aoi, crs_polygon="EPSG:4326")

Key Functions

spatial_mosaic

The main function for creating mosaics from a list of raster images.

Create a spatial mosaic by filling gaps with data from overlapping rasters.

Combines multiple rasters into a single output by iteratively filling nodata regions with valid data from subsequent rasters. The first raster is used as the base, and remaining rasters fill in only where the base has nodata values.

This function is similar to rasterio.merge.merge but with support for:

  • Custom validity masks per raster
  • Masking functions (e.g., cloud masks)
  • Windowed processing for memory efficiency

Parameters:

Name Type Description Default
data_list Union[List[GeoData], List[Tuple[GeoData, GeoData]]]

Input rasters to mosaic. Can be:

  • List of GeoData objects: Each raster's nodata value determines validity
  • List of (data, mask) tuples: mask indicates invalid pixels to fill

Rasters are processed in order; first valid pixel wins.

required
polygon Optional[Polygon]

Output extent as a shapely Polygon. If provided, mosaic is clipped to this polygon. CRS specified by crs_polygon.

None
crs_polygon Optional[str]

CRS of the polygon. If not provided, uses the CRS of the first raster.

None
dst_transform Optional[Affine]

Output transform. If not provided, computed from bounds or polygon.

None
bounds Optional[Tuple[float, float, float, float]]

Output extent as (minx, miny, maxx, maxy). Alternative to polygon.

None
dst_crs Optional[str]

Output CRS. If not provided, uses CRS of first raster.

None
dtype_dst Optional[str]

Output data type. If not provided, uses dtype of first raster.

None
window_size Optional[Tuple[int, int]]

Process in tiles of this size (height, width) for memory efficiency. Default None (process all at once).

None
resampling Resampling

Resampling method for reprojection. Default cubic_spline for continuous data.

cubic_spline
masking_function Optional[Callable[[GeoData], GeoData]]

Function that takes a GeoData and returns a boolean mask of INVALID pixels. Applied to each raster before mosaicking.

None
dst_nodata Optional[int]

Output nodata value. If not provided, uses fill_value_default of first raster.

None

Returns:

Name Type Description
GeoTensor GeoTensor

Mosaic covering the specified extent. Nodata regions are filled by iterating through data_list until all pixels are valid or list is exhausted.

Examples:

Basic mosaic of overlapping Sentinel-2 scenes:

>>> from georeader import mosaic
>>> from georeader.rasterio_reader import RasterioReader
>>>
>>> # Load overlapping scenes
>>> scene1 = RasterioReader("scene1.tif")
>>> scene2 = RasterioReader("scene2.tif")
>>> scene3 = RasterioReader("scene3.tif")
>>>
>>> # Create seamless mosaic
>>> result = mosaic.spatial_mosaic(
...     [scene1, scene2, scene3],
...     bounds=(-122.5, 37.0, -121.5, 38.0),
...     dst_crs="EPSG:4326"
... )

Mosaic with cloud masks (tuple format):

>>> # Each tuple is (data, cloud_mask) where cloud_mask=True means cloudy
>>> result = mosaic.spatial_mosaic(
...     [(scene1, cloud1), (scene2, cloud2), (scene3, cloud3)],
...     bounds=(-122.5, 37.0, -121.5, 38.0),
...     dst_crs="EPSG:4326"
... )
>>> # Cloud-covered pixels in scene1 are filled from scene2, etc.

Memory-efficient tiled processing:

>>> result = mosaic.spatial_mosaic(
...     large_scene_list,
...     bounds=extent,
...     window_size=(1024, 1024)  # Process in 1024x1024 tiles
... )
See Also

georeader.read.read_reproject: Underlying reprojection function. rasterio.merge.merge: Similar functionality in rasterio.

Note
  • Processing order matters: earlier rasters have priority
  • Use window_size for large outputs to avoid memory issues
  • Set appropriate resampling for your data type (nearest for categorical)
Source code in georeader/mosaic.py
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
def spatial_mosaic(data_list:Union[List[GeoData], List[Tuple[GeoData,GeoData]]],
                   polygon:Optional[Polygon]=None,
                   crs_polygon:Optional[str]=None,
                   dst_transform:Optional[rasterio.transform.Affine]=None,
                   bounds:Optional[Tuple[float, float, float, float]]=None,
                   dst_crs:Optional[str]=None,
                   dtype_dst:Optional[str]=None,
                   window_size: Optional[Tuple[int, int]]= None,
                   resampling:rasterio.warp.Resampling=rasterio.warp.Resampling.cubic_spline,
                   masking_function:Optional[Callable[[GeoData], GeoData]]=None,
                   dst_nodata:Optional[int]=None) -> GeoTensor:
    """
    Create a spatial mosaic by filling gaps with data from overlapping rasters.

    Combines multiple rasters into a single output by iteratively filling nodata
    regions with valid data from subsequent rasters. The first raster is used as
    the base, and remaining rasters fill in only where the base has nodata values.

    This function is similar to `rasterio.merge.merge` but with support for:

    - Custom validity masks per raster
    - Masking functions (e.g., cloud masks)
    - Windowed processing for memory efficiency

    Args:
        data_list (Union[List[GeoData], List[Tuple[GeoData, GeoData]]]): Input
            rasters to mosaic. Can be:

            - List of GeoData objects: Each raster's nodata value determines validity
            - List of (data, mask) tuples: mask indicates invalid pixels to fill

            Rasters are processed in order; first valid pixel wins.
        polygon (Optional[Polygon]): Output extent as a shapely Polygon. If provided,
            mosaic is clipped to this polygon. CRS specified by crs_polygon.
        crs_polygon (Optional[str]): CRS of the polygon. If not provided, uses
            the CRS of the first raster.
        dst_transform (Optional[Affine]): Output transform. If not provided, computed
            from bounds or polygon.
        bounds (Optional[Tuple[float, float, float, float]]): Output extent as
            (minx, miny, maxx, maxy). Alternative to polygon.
        dst_crs (Optional[str]): Output CRS. If not provided, uses CRS of first raster.
        dtype_dst (Optional[str]): Output data type. If not provided, uses dtype
            of first raster.
        window_size (Optional[Tuple[int, int]]): Process in tiles of this size
            (height, width) for memory efficiency. Default None (process all at once).
        resampling (Resampling): Resampling method for reprojection.
            Default cubic_spline for continuous data.
        masking_function (Optional[Callable[[GeoData], GeoData]]): Function that
            takes a GeoData and returns a boolean mask of INVALID pixels.
            Applied to each raster before mosaicking.
        dst_nodata (Optional[int]): Output nodata value. If not provided, uses
            fill_value_default of first raster.

    Returns:
        GeoTensor: Mosaic covering the specified extent. Nodata regions are filled
            by iterating through data_list until all pixels are valid or list is
            exhausted.

    Examples:
        Basic mosaic of overlapping Sentinel-2 scenes:

        >>> from georeader import mosaic
        >>> from georeader.rasterio_reader import RasterioReader
        >>>
        >>> # Load overlapping scenes
        >>> scene1 = RasterioReader("scene1.tif")
        >>> scene2 = RasterioReader("scene2.tif")
        >>> scene3 = RasterioReader("scene3.tif")
        >>>
        >>> # Create seamless mosaic
        >>> result = mosaic.spatial_mosaic(
        ...     [scene1, scene2, scene3],
        ...     bounds=(-122.5, 37.0, -121.5, 38.0),
        ...     dst_crs="EPSG:4326"
        ... )

        Mosaic with cloud masks (tuple format):

        >>> # Each tuple is (data, cloud_mask) where cloud_mask=True means cloudy
        >>> result = mosaic.spatial_mosaic(
        ...     [(scene1, cloud1), (scene2, cloud2), (scene3, cloud3)],
        ...     bounds=(-122.5, 37.0, -121.5, 38.0),
        ...     dst_crs="EPSG:4326"
        ... )
        >>> # Cloud-covered pixels in scene1 are filled from scene2, etc.

        Memory-efficient tiled processing:

        >>> result = mosaic.spatial_mosaic(
        ...     large_scene_list,
        ...     bounds=extent,
        ...     window_size=(1024, 1024)  # Process in 1024x1024 tiles
        ... )

    See Also:
        georeader.read.read_reproject: Underlying reprojection function.
        rasterio.merge.merge: Similar functionality in rasterio.

    Note:
        - Processing order matters: earlier rasters have priority
        - Use window_size for large outputs to avoid memory issues
        - Set appropriate resampling for your data type (nearest for categorical)
    """

    assert len(data_list) > 0, f"Expected at least one product found 0 {data_list}"

    if isinstance(data_list[0], tuple):
        first_data_object =  data_list[0][0]
        first_mask_object = data_list[0][1]
    else:
        first_data_object = data_list[0]
        first_mask_object = None

    if dst_transform is None:
        dst_transform = first_data_object.transform

    if dst_crs is None:
        dst_crs = first_data_object.crs

    if polygon is None:
        if bounds is not None:
            polygon = box(*bounds)
        else:
            # Polygon is the Union of the polygons of all the data
            for data in data_list:
                if isinstance(data, tuple):
                    data = data[0]
                polygon_iter = data.footprint(crs=dst_crs)

                if polygon is None:
                    polygon = polygon_iter
                else:
                    polygon = polygon.union(polygon_iter)
    else:
        if crs_polygon is None:
            crs_polygon = dst_crs
        elif not georeader.compare_crs(crs_polygon, dst_crs):
            polygon = window_utils.polygon_to_crs(polygon, crs_polygon, dst_crs)

    GeoDataFake = namedtuple("GeoDataFake", ["transform", "crs"])
    window_polygon = read.window_from_polygon(GeoDataFake(transform=dst_transform, crs=dst_crs),
                                              polygon, crs_polygon=dst_crs)

    window_polygon = window_utils.round_outer_window(window_polygon)

    # Shift transform to window
    dst_transform = rasterio.windows.transform(window_polygon, transform=dst_transform)
    dst_nodata = dst_nodata or first_data_object.fill_value_default

    # Get object to save the results
    data_return = read_reproject(first_data_object,
                                 dst_crs=dst_crs, dst_transform=dst_transform,
                                 resampling=resampling,
                                 dtype_dst=dtype_dst,
                                 window_out=rasterio.windows.Window(row_off=0, col_off=0, width=window_polygon.width,
                                                                    height=window_polygon.height),
                                 dst_nodata=dst_nodata)

    # invalid_values of spatial locations only  -> any
    invalid_values = data_return.values == dst_nodata
    if len(data_return.shape) > 2:
        axis_any = tuple(i for i in range(len(data_return.shape)-2))
        invalid_values = np.any(invalid_values, axis=axis_any) # (H, W)
    else:
        axis_any = None

    if first_mask_object is not None:
        if (masking_function is None) and len(first_mask_object.shape) > 2:
            assert (len(first_mask_object.shape) == 3) and (first_mask_object.shape[0] == 1), f"Expected two dims, found {first_mask_object.shape}"

        invalid_geotensor = read_reproject(first_mask_object,
                                           dst_crs=dst_crs, dst_transform=dst_transform,
                                           resampling=rasterio.warp.Resampling.nearest,
                                           window_out=rasterio.windows.Window(row_off=0, col_off=0,
                                                                              width=window_polygon.width,
                                                                              height=window_polygon.height))
        if masking_function is not None:
            invalid_geotensor = masking_function(invalid_geotensor)

        invalid_geotensor.values = invalid_geotensor.values.astype(bool)
        invalid_geotensor.values =  invalid_geotensor.values.squeeze()
        assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"

        invalid_values|= invalid_geotensor.values
    elif masking_function is not None:
        # Apply masking funtion to the readed data
        invalid_geotensor = masking_function(data_return)

        invalid_geotensor.values = invalid_geotensor.values.astype(bool)
        invalid_geotensor.values = invalid_geotensor.values.squeeze()
        assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
        invalid_values |= invalid_geotensor.values

    # data_return.values[..., invalid_values] = data_return.fill_value_default

    if not np.any(invalid_values):
        return data_return

    if len(data_list) == 1:
        return data_return

    if window_size is not None:
        windows = slices.create_windows(data_return.shape[-2:], window_size)
    else:
        windows = [rasterio.windows.Window(row_off=0, col_off=0, width=data_return.shape[-1],
                                           height=data_return.shape[-2])]

    # Cache of the polygons geodata
    polygons_geodata = [None for _ in range(len(data_list)-1)]

    for window in windows:
        slice_spatial = window.toslices()
        invalid_values_window = invalid_values[slice_spatial]
        if not np.any(invalid_values_window):
            continue

        # Add dims to slice_obj
        slice_obj = tuple(slice(None) for _ in range(len(data_return.shape)-2)) + slice_spatial
        dst_transform_iter = rasterio.windows.transform(window, transform=dst_transform)
        window_reproject_iter = rasterio.windows.Window(row_off=0, col_off=0, width=window.width, height=window.height)
        polygon_iter = window_utils.window_polygon(window, dst_transform)

        for _i, data in enumerate(data_list[1:]):
            if isinstance(data, tuple):
                geodata = data[0]
                geomask = data[1]
            else:
                geodata = data
                geomask = None

            if polygons_geodata[_i] is None:
                polygons_geodata[_i] = geodata.footprint(crs=dst_crs)

            polygon_geodata = polygons_geodata[_i]

            if not polygon_geodata.intersects(polygon_iter):
                continue

            if geomask is not None:
                if (masking_function is None) and len(geomask.shape) > 2:
                    assert (len(geomask.shape) == 3) and (
                                geomask.shape[0] == 1), f"Expected two dims, found {geomask.shape}"

                invalid_geotensor = read_reproject(geomask,
                                                   dst_crs=dst_crs, dst_transform=dst_transform_iter,
                                                   resampling=rasterio.warp.Resampling.nearest,
                                                   window_out=window_reproject_iter)
                if masking_function is not None:
                    invalid_geotensor = masking_function(invalid_geotensor)

                invalid_geotensor.values = invalid_geotensor.values.astype(bool)
                invalid_geotensor.values = invalid_geotensor.values.squeeze()
                assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
                if np.all(invalid_geotensor.values):
                    continue
                invalid_values_iter = invalid_geotensor.values

            data_read = read_reproject(geodata, dst_crs=dst_crs, window_out=window_reproject_iter,
                                       dst_transform=dst_transform_iter, resampling=resampling,
                                       dtype_dst=dtype_dst,
                                       dst_nodata=dst_nodata)

            if (geomask is None) and (masking_function is not None):
                invalid_geotensor = masking_function(data_read)

                invalid_geotensor.values = invalid_geotensor.values.astype(bool)
                invalid_geotensor.values = invalid_geotensor.values.squeeze()
                assert len(invalid_geotensor.shape) == 2, f"Invalid mask expected 2 dims found {invalid_geotensor.shape}"
                if np.all(invalid_geotensor.values):
                    continue
                invalid_values_iter = invalid_geotensor.values

            # data_read could have more dims -> any
            masked_values_read = data_read.values == dst_nodata
            if axis_any is not None:
                masked_values_read = np.any(masked_values_read, axis=axis_any)  # (H, W)

            if (geomask is not None) or (masking_function is not None):
                invalid_values_iter |= masked_values_read
            else:
                invalid_values_iter = masked_values_read

            # Copy values invalids in window and valids in iter
            mask_values_copy_out = invalid_values_window & ~invalid_values_iter
            data_return.values[slice_obj][..., mask_values_copy_out] = data_read.values[...,mask_values_copy_out]

            invalid_values_window &= invalid_values_iter

            if not np.any(invalid_values_window):
                break


    return data_return

Algorithm Details

The spatial_mosaic function uses an iterative fill algorithm:

  1. Initialize output: Load the first image, reprojecting to target CRS/bounds
  2. Track invalid pixels: Identify pixels with no-data values
  3. Iterative fill: For each subsequent image:
  4. Skip if footprint doesn't intersect remaining invalid regions
  5. Load only the overlapping region
  6. Fill invalid pixels with valid data from the new image
  7. Update the invalid pixel mask
  8. Early termination: Stop when all pixels are valid
    Iteration 1              Iteration 2              Final Result
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ β–ˆβ–ˆβ–ˆβ–ˆ β–‘β–‘β–‘β–‘ β”‚     +      β”‚ β–‘β–‘β–‘β–‘ β–ˆβ–ˆβ–ˆβ–ˆ β”‚     =      β”‚ β–ˆβ–ˆβ–ˆβ–ˆ β–ˆβ–ˆβ–ˆβ–ˆ β”‚
    β”‚ β–ˆβ–ˆβ–ˆβ–ˆ β–‘β–‘β–‘β–‘ β”‚            β”‚ β–‘β–‘β–‘β–‘ β–ˆβ–ˆβ–ˆβ–ˆ β”‚            β”‚ β–ˆβ–ˆβ–ˆβ–ˆ β–ˆβ–ˆβ–ˆβ–ˆ β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    (from image 1)           (from image 2)           (complete)

    β–ˆβ–ˆβ–ˆβ–ˆ = valid data        β–‘β–‘β–‘β–‘ = no-data

Working with Masks

The module supports custom masking functions to exclude invalid pixels beyond simple no-data values:

def cloud_mask_function(geotensor):
    """Custom mask function that returns True for invalid pixels."""
    # Example: mask pixels where any band exceeds threshold
    return geotensor.values.max(axis=0) > 10000

# Use with spatial_mosaic
result = mosaic.spatial_mosaic(
    images,
    masking_function=cloud_mask_function
)

# Or provide explicit masks as tuples
images_with_masks = [
    (image1, mask1),
    (image2, mask2),
]
result = mosaic.spatial_mosaic(images_with_masks)

Memory-Efficient Processing

For large mosaics, use the window_size parameter to process in tiles:

# Process in 512x512 windows
result = mosaic.spatial_mosaic(
    images,
    window_size=(512, 512),
    polygon=large_aoi,
    crs_polygon="EPSG:4326"
)

Resampling Methods

The resampling parameter controls interpolation during reprojection:

Method Use Case
nearest Categorical data (land cover, masks)
bilinear Continuous data, faster
cubic Continuous data, smoother
cubic_spline Default, best quality for most imagery
import rasterio.warp

result = mosaic.spatial_mosaic(
    images,
    resampling=rasterio.warp.Resampling.bilinear
)

See Also