georeader.vectorize

This module provides functions to convert raster data (binary masks, segmentation outputs) into vector geometries (polygons).

Overview

Vectorization is essential for:

  • Converting segmentation masks to GeoJSON/Shapefile
  • Extracting object boundaries from classification results
  • Creating vector features from raster analysis
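As a rough illustration of the first use case, the sketch below serializes a list of shapely polygons to a GeoJSON FeatureCollection with `shapely.geometry.mapping` and the standard `json` module. The helper name `polygons_to_geojson` and the `properties` layout are hypothetical and not part of georeader; the example polygon stands in for the output of `get_polygons`.

```python
import json
from shapely.geometry import Polygon, mapping

def polygons_to_geojson(polygons):
    """Hypothetical helper: wrap shapely polygons in a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "properties": {"id": i, "area": poly.area},
            "geometry": mapping(poly),  # shapely geometry -> GeoJSON-like dict
        }
        for i, poly in enumerate(polygons)
    ]
    return {"type": "FeatureCollection", "features": features}

# Stand-in for the output of vectorize.get_polygons(...)
example_polygons = [Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])]
geojson_text = json.dumps(polygons_to_geojson(example_polygons))
```

GeoJSON readers generally expect longitude/latitude coordinates (EPSG:4326), so polygons in a projected CRS would normally be reprojected before export.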

Quick Start

from georeader import vectorize
from georeader.geotensor import GeoTensor
import numpy as np
import rasterio

# Example 1: Get polygons from GeoTensor (automatically in the GeoTensor's CRS)
mask_data = np.zeros((100, 100), dtype=np.uint8)
mask_data[20:80, 20:80] = 1  # A square region
transform = rasterio.Affine(10.0, 0, 500000, 0, -10.0, 4500000)
gt_mask = GeoTensor(mask_data, transform, crs="EPSG:32610")

# Polygons are automatically in the GeoTensor's CRS (EPSG:32610)
polygons = vectorize.get_polygons(gt_mask, min_area=100)

# For CRS reprojection, use window_utils.polygon_to_crs
from georeader import window_utils
polygon_wgs84 = window_utils.polygon_to_crs(polygons[0], 
                                             crs_polygon="EPSG:32610",
                                             dst_crs="EPSG:4326")

Key Functions

  • get_polygons: Extract polygons from a binary mask with optional area filtering

Parameters

get_polygons

  • binary_mask: Input mask (GeoTensor or numpy array)
  • min_area: Minimum polygon area in square pixels (default: 25.5); the filter is applied before the affine transform
  • Returns: List of shapely Polygon objects (in CRS coordinates if the input is a GeoTensor or a transform is provided, else in pixel coordinates)
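Because `min_area` is compared against the polygon area before the affine transform is applied, it is expressed in square pixels rather than CRS units. The sketch below converts a threshold between the two, assuming the transform is given by its coefficients in rasterio order (a, b, c, d, e, f); both function names are hypothetical.

```python
def pixel_area_in_crs_units(a, b, d, e):
    """Area of one pixel in CRS units: |det| of the linear part of the affine transform."""
    return abs(a * e - b * d)

def min_area_crs_to_pixels(min_area_crs, a, b, d, e):
    """Convert an area threshold in square CRS units to square pixels."""
    return min_area_crs / pixel_area_in_crs_units(a, b, d, e)

# 10 m pixels, as in Affine(10.0, 0, 500000, 0, -10.0, 4500000)
a, b, d, e = 10.0, 0.0, 0.0, -10.0
default_threshold_m2 = 25.5 * pixel_area_in_crs_units(a, b, d, e)  # default min_area in m^2
one_hectare_px = min_area_crs_to_pixels(10_000.0, a, b, d, e)      # 1 ha in square pixels
```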

Vectorize Module: Convert raster masks to vector polygons.

This module provides functions to extract polygon geometries from binary raster masks. Essential for converting classification results, segmentation outputs, and masks into GIS-compatible vector formats.

Vectorization Process

Converting binary rasters to polygon geometries:

VECTORIZATION PROCESS

  Raster (Binary Mask)                Vector (Polygons)
  ────────────────────                ─────────────────

  ┌─┬─┬─┬─┬─┬─┬─┬─┐                       ╔═══════════╗
  │0│0│0│1│1│1│0│0│                      ╔╝           ╚╗
  ├─┼─┼─┼─┼─┼─┼─┼─┤                     ╔╝             ╚╗
  │0│0│1│1│1│1│1│0│   ═══════════►     ╔╝               ╚╗
  ├─┼─┼─┼─┼─┼─┼─┼─┤   Vectorize        ║    Polygon 1    ║
  │0│1│1│1│1│1│1│0│                    ╚╗               ╔╝
  ├─┼─┼─┼─┼─┼─┼─┼─┤                     ╚╗             ╔╝
  │0│0│1│1│1│1│0│0│                      ╚╗           ╔╝
  └─┴─┴─┴─┴─┴─┴─┴─┘                       ╚═══════════╝

  1 = foreground (vectorized)
  0 = background (ignored)
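To make the raster-to-vector step concrete, the sketch below rebuilds the outline of the mask in the diagram by merging one unit square per foreground pixel with shapely. This is a deliberately naive stand-in for illustration only; the module itself relies on `rasterio.features.shapes`, which is far more efficient.

```python
from shapely.geometry import box
from shapely.ops import unary_union

# The 4x8 mask from the diagram (1 = foreground, 0 = background)
mask = [
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]

# One unit square per foreground pixel, in (col, row) pixel coordinates
pixel_squares = [
    box(col, row, col + 1, row + 1)
    for row, values in enumerate(mask)
    for col, value in enumerate(values)
    if value
]

# Merging the squares dissolves shared edges and leaves the region outline
region = unary_union(pixel_squares)
n_foreground = sum(sum(values) for values in mask)
```

The merged region's area in square pixels equals the number of foreground pixels, which is why `min_area` can be read as a pixel count.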

Polygon Simplification

Reducing vertex count while preserving shape:

POLYGON SIMPLIFICATION (tolerance parameter)

  Raw (pixelated)              Simplified (tolerance=1)
  ───────────────              ────────────────────────

  ┌─┐                                ╭───────╮
  │ └─┐                             ╱         ╲
  │   └─┐                          ╱           ╲
  │     └─┐   ────────────►       │             │    Fewer vertices,
  │       │   simplify            │             │    smoother edges
  │     ┌─┘                        ╲           ╱
  │   ┌─┘                           ╲         ╱
  └───┘                              ╰───────╯

  tolerance=0: Keep all vertices (staircase pattern)
  tolerance=1: Simplify within ~1 pixel (DEFAULT)
  tolerance>1: More aggressive simplification
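The effect of `tolerance` can be reproduced with shapely alone. The sketch below builds a small staircase polygon by hand (an invented shape, not library output) and compares vertex counts before and after `simplify`.

```python
from shapely.geometry import Polygon

# Staircase edge, similar to what tracing pixel boundaries along a diagonal produces
staircase = Polygon([
    (0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2),
    (3, 3), (4, 3), (4, 4), (5, 4), (5, 5), (0, 5),
])

raw = staircase.simplify(tolerance=0)         # no simplification requested
simplified = staircase.simplify(tolerance=1)  # steps deviate less than 1 px from the diagonal

n_raw = len(raw.exterior.coords)
n_simplified = len(simplified.exterior.coords)
```

Each step corner lies about 0.71 px from the diagonal, so a tolerance of 1 px is enough to replace the staircase with a straight edge.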

Filtering Options

Removing small or unwanted polygons:

POLYGON FILTERING

  Parameters:
  ───────────

  min_area=25.5 (default)    Remove polygons smaller than ~5x5 pixels
                             Helps filter noise and artifacts

  polygon_buffer=0           Buffer/erode polygons by N pixels
                             Positive: expand
                             Negative: shrink (erode)

  Before (min_area=0):                After (min_area=25):
  ┌────────────────────┐              ┌────────────────────┐
  │  ■   ┌───────┐     │              │      ┌───────┐     │
  │ ■ ■  │       │  ■  │   ═══════►   │      │       │     │
  │      │       │     │   Filter     │      │       │     │
  │ ■    └───────┘     │              │      └───────┘     │
  └────────────────────┘              └────────────────────┘
     ↑ small polygons removed
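The two filtering options can be tried out on hand-made shapely squares. The sketch below is illustrative only: the shapes are invented, and the list comprehension merely imitates the area check rather than calling the library.

```python
from shapely.geometry import box

min_area = 25.5  # default threshold, in square pixels

small = box(0, 0, 4, 4)      # 4x4 px -> area 16, below the threshold
large = box(10, 10, 16, 16)  # 6x6 px -> area 36, above the threshold

kept = [p for p in (small, large) if p.area >= min_area]

# A negative buffer erodes: the 6x6 square shrinks by 1 px on every side
eroded = large.buffer(-1)
# A positive buffer dilates; rounded corners make the area slightly less than 8x8 = 64
dilated = large.buffer(1)
```

Note that the eroded 6x6 square has an area of about 16 square pixels, so a region that passes the default threshold on its own can fall below it when the buffer is applied before the area check.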

Module Functions Overview

Vectorization
  • get_polygons: Extract polygons from binary mask
  • vectorize_raster: Vectorize with geographic coordinates
Utilities
  • Automatic CRS handling from GeoData inputs
  • Integration with shapely for geometry operations

Quick Start

Extract polygons from a binary mask:

from georeader import vectorize
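import numpy as np
import rasterio

# Binary mask (e.g., from classification). `classified_image` is a placeholder
# label raster; replace it with your own classification output.
classified_image = np.zeros((100, 100), dtype=np.uint8)
classified_image[20:80, 20:80] = 1
mask = (classified_image == 1).astype(np.uint8)

# Affine transform of the raster (placeholder: 10 m pixels, UTM origin)
transform = rasterio.Affine(10.0, 0, 500000, 0, -10.0, 4500000)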

# Extract polygons
polygons = vectorize.get_polygons(
    mask,
    min_area=100,        # Minimum 10x10 pixel area
    tolerance=1.5,       # Simplification tolerance
    transform=transform  # Affine transform for georeferencing
)

# Polygons are in CRS units (georeferenced)
for poly in polygons:
    print(f"Area: {poly.area} sq. CRS units")

Vectorize a GeoTensor mask:

# GeoTensor carries its own transform
polygons = vectorize.get_polygons(
    water_mask_geotensor,  # GeoTensor with transform
    min_area=50,
    polygon_buffer=-1      # Erode by 1 pixel
)

See Also

  • georeader.rasterize: Inverse operation (vector → raster)
  • georeader.geotensor: Input format with transform
  • rasterio.features.shapes: Underlying implementation

References

  • Rasterio shapes: https://rasterio.readthedocs.io/en/latest/api/rasterio.features.html
  • Shapely simplify: https://shapely.readthedocs.io/en/latest/manual.html#object.simplify

get_polygons(binary_mask, min_area=25.5, polygon_buffer=0, tolerance=1.0, transform=None)

Convert a binary raster mask to vector polygons.

Extracts connected regions of True/nonzero pixels as polygon geometries. Includes options for filtering small polygons, buffering/eroding boundaries, and simplifying vertex counts.

If binary_mask is a GeoTensor (has .transform attribute), polygons are returned in geographic coordinates. If it's a numpy array, polygons are in pixel coordinates unless a transform is provided.
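The choice between pixel and geographic output comes down to a duck-typed check for a `.transform` attribute. The sketch below mirrors that decision with a hypothetical helper, `resolve_transform`, and a stand-in class; it raises `ValueError` for readability, whereas the source listing uses an `assert`.

```python
def resolve_transform(mask, transform=None):
    """Hypothetical helper mirroring how the transform is chosen."""
    if hasattr(mask, "transform"):
        # Georeferenced input carries its own transform; an explicit one is rejected
        if transform is not None:
            raise ValueError("pass transform only with plain arrays")
        return mask.transform
    # Plain array: use the explicit transform, or None for pixel coordinates
    return transform

class FakeGeoMask:
    """Minimal stand-in for a GeoTensor-like object (illustration only)."""
    transform = ("a", "b", "c", "d", "e", "f")

plain_mask = [[0, 1], [1, 1]]
```

A return value of `None` corresponds to polygons being left in pixel coordinates.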

Parameters:

  • binary_mask (Union[ndarray, GeoData], required): 2D binary mask where nonzero pixels represent features to vectorize. Accepts GeoTensor (uses its transform) or numpy array.
  • min_area (float, default 25.5): Minimum polygon area in square pixels to include. Polygons smaller than this are filtered out. The default of 25.5 corresponds to roughly 5x5 px. Set to 0 to keep all polygons.
  • polygon_buffer (int, default 0): Buffer distance in pixels to apply to polygons; 0 means no buffering. Positive values expand (dilate) polygons; negative values shrink (erode) them, which is useful to remove edge noise.
  • tolerance (float, default 1.0): Simplification tolerance in pixels. Higher values produce simpler polygons with fewer vertices. Set to 0 for no simplification (keeps staircase pixel edges).
  • transform (Optional[Affine], default None): Affine transform to convert pixel coordinates to geographic coordinates. Pass it only when binary_mask is a numpy array; a GeoTensor supplies its own transform, and passing transform together with a GeoTensor raises an AssertionError.

Returns:

  • List[Polygon]: List of shapely Polygon objects. Coordinates are in geographic CRS units if a transform is provided or binary_mask is a GeoTensor, and in pixel coordinates otherwise.

Examples:

Vectorize a classification result:

>>> import numpy as np
>>> from georeader import vectorize
>>>
>>> # Binary mask from classification
>>> water_mask = (classification == 1).astype(np.uint8)
>>>
>>> # Extract water body polygons
>>> polygons = vectorize.get_polygons(
...     water_mask,
...     min_area=100,  # Filter out tiny artifacts
...     tolerance=2.0  # Simplify boundaries
... )
>>> print(f"Found {len(polygons)} water bodies")

Vectorize with erosion to remove edge noise:

>>> polygons = vectorize.get_polygons(
...     noisy_mask,
...     min_area=50,
...     polygon_buffer=-2  # Erode 2 pixels
... )

Vectorize GeoTensor (auto-uses transform):

>>> from georeader.geotensor import GeoTensor
>>> # mask_gt is a GeoTensor with transform
>>> polygons = vectorize.get_polygons(mask_gt, min_area=200)
>>> # Polygons are in geographic coordinates
>>> for poly in polygons:
...     print(f"Area: {poly.area:.2f} sq. CRS units")

Get pixel-coordinate polygons from numpy array:

>>> polygons_px = vectorize.get_polygons(
...     mask_array,
...     transform=None  # No transform = pixel coordinates
... )
Note
  • Each polygon is buffered first, then filtered by area, then simplified, and finally transformed
  • For very large masks, consider processing in tiles
  • MultiPolygon results are returned as separate Polygon objects
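For the tiling suggestion, one possible approach is to iterate over fixed-size windows and vectorize each slice separately. The generator below is a minimal sketch; `iter_tiles` and its `overlap` parameter are hypothetical and not part of georeader.

```python
def iter_tiles(height, width, tile_size, overlap=0):
    """Yield (row_slice, col_slice) windows covering a height x width mask."""
    step = tile_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile_size")
    for row in range(0, height, step):
        for col in range(0, width, step):
            yield (
                slice(row, min(row + tile_size, height)),
                slice(col, min(col + tile_size, width)),
            )

tiles = list(iter_tiles(height=250, width=100, tile_size=100))
```

Polygons extracted from a tile are expressed relative to that tile's origin, so each tile needs its own shifted transform (for instance the raster transform composed with a translation by the tile's column and row offsets). Regions that cross tile borders come out split and would need merging afterwards, for example with `shapely.ops.unary_union`.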
See Also

  • transform_polygon: Transform polygon between coordinate systems.
  • georeader.rasterize: Inverse operation (vector → raster).

Source code in georeader/vectorize.py
def get_polygons(binary_mask: Union[np.ndarray, GeoData], min_area:float=25.5,
                 polygon_buffer:int=0, tolerance:float=1., transform: Optional[rasterio.Affine]=None) -> List[Polygon]:
    """
    Convert a binary raster mask to vector polygons.

    Extracts connected regions of True/nonzero pixels as polygon geometries.
    Includes options for filtering small polygons, buffering/eroding boundaries,
    and simplifying vertex counts.

    If `binary_mask` is a GeoTensor (has .transform attribute), polygons are
    returned in geographic coordinates. If it's a numpy array, polygons are in
    pixel coordinates unless a transform is provided.

    Args:
        binary_mask (Union[np.ndarray, GeoData]): 2D binary mask where nonzero
            pixels represent features to vectorize. Accepts GeoTensor (uses its
            transform) or numpy array.
        min_area (float): Minimum polygon area in square pixels to include.
            Polygons smaller than this are filtered out. Default 25.5 (~5x5 px).
            Set to 0 to keep all polygons.
        polygon_buffer (int): Buffer distance in pixels to apply to polygons.
            Default 0 (no buffering).

            - Positive: Expand/dilate polygons
            - Negative: Shrink/erode polygons (useful to remove edge noise)

        tolerance (float): Simplification tolerance in pixels. Higher values
            produce simpler polygons with fewer vertices. Default 1.0.
            Set to 0 for no simplification (keeps staircase pixel edges).
        transform (Optional[rasterio.Affine]): Affine transform to convert pixel
            coordinates to geographic coordinates. Only used if binary_mask is
            a numpy array. If binary_mask is GeoTensor, uses its transform.

    Returns:
        List[Polygon]: List of shapely Polygon objects. Coordinates are in:
            - Geographic CRS units if transform provided or binary_mask is GeoTensor
            - Pixel coordinates otherwise

    Examples:
        Vectorize a classification result:

        >>> import numpy as np
        >>> from georeader import vectorize
        >>>
        >>> # Binary mask from classification
        >>> water_mask = (classification == 1).astype(np.uint8)
        >>>
        >>> # Extract water body polygons
        >>> polygons = vectorize.get_polygons(
        ...     water_mask,
        ...     min_area=100,  # Filter out tiny artifacts
        ...     tolerance=2.0  # Simplify boundaries
        ... )
        >>> print(f"Found {len(polygons)} water bodies")

        Vectorize with erosion to remove edge noise:

        >>> polygons = vectorize.get_polygons(
        ...     noisy_mask,
        ...     min_area=50,
        ...     polygon_buffer=-2  # Erode 2 pixels
        ... )

        Vectorize GeoTensor (auto-uses transform):

        >>> from georeader.geotensor import GeoTensor
        >>> # mask_gt is a GeoTensor with transform
        >>> polygons = vectorize.get_polygons(mask_gt, min_area=200)
        >>> # Polygons are in geographic coordinates
        >>> for poly in polygons:
        ...     print(f"Area: {poly.area:.2f} sq. CRS units")

        Get pixel-coordinate polygons from numpy array:

        >>> polygons_px = vectorize.get_polygons(
        ...     mask_array,
        ...     transform=None  # No transform = pixel coordinates
        ... )

    Note:
        - Each polygon is buffered first, then filtered by area, then simplified, and finally transformed
        - For very large masks, consider processing in tiles
        - MultiPolygon results are returned as separate Polygon objects

    See Also:
        transform_polygon: Transform polygon between coordinate systems.
        georeader.rasterize: Inverse operation (vector → raster).
    """

    if not hasattr(binary_mask, "transform"):
        binary_mask_np = binary_mask
    else:
        binary_mask_np = np.array(binary_mask)

        assert transform is None, "transform must only be passed when the input is a np.ndarray"
        transform = binary_mask.transform

    shape_ = binary_mask_np.shape
    if len(shape_) != 2:
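        # squeeze() returns a new array, so assign it to actually drop singleton dimensions
        binary_mask_np = binary_mask_np.squeeze()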

    assert len(binary_mask_np.shape) == 2, f"Expected mask with 2 dim found {binary_mask_np.shape}"

    geoms_polygons = []
    polygon_generator = features.shapes(binary_mask_np.astype(np.int16),
                                        binary_mask_np)

    for polygon, _ in polygon_generator:
        p = shape(polygon)
        if polygon_buffer != 0:  # positive values dilate, negative values erode
            p = p.buffer(polygon_buffer)
        if p.area >= min_area:
            p = p.simplify(tolerance=tolerance)
            if transform is not None:
                p = transform_polygon(p, transform)  # Convert from pixel to CRS coordinates
            geoms_polygons.append(p)

    return geoms_polygons

transform_polygon(polygon, transform, relative=False, shape_raster=None)

Transform polygon coordinates using an affine transformation.

Applies a rasterio Affine transform to all vertices of a polygon, converting between pixel and geographic coordinate systems. Handles both simple Polygons and MultiPolygons, including holes.

Common use cases:

  • Pixel coordinates → Geographic coordinates (using raster transform)
  • Geographic coordinates → Relative coordinates (0-1 range for ML)

Parameters:

  • polygon (Union[Polygon, MultiPolygon], required): Shapely geometry to transform. Coordinates can be any numeric type.
  • transform (Affine, required): 2D affine transformation matrix. Common sources: raster.transform, rasterio.Affine.scale(), etc.
  • relative (bool, default False): If True, output normalized coordinates in [0, 1] range relative to the raster dimensions. Useful for ML model inputs. Requires shape_raster.
  • shape_raster (Optional[Tuple[int, int]], default None): Raster dimensions as (height, width). Required if relative=True. Ignored otherwise.

Returns:

  • Union[Polygon, MultiPolygon]: Transformed geometry with same type as input. All coordinates are transformed by the affine matrix.

Raises:

  • AssertionError: If relative=True but shape_raster is not provided.

Examples:

Convert pixel polygon to geographic coordinates:

>>> from shapely.geometry import Polygon
>>> import rasterio
>>> from georeader.vectorize import transform_polygon
>>>
>>> # Polygon in pixel coordinates
>>> poly_px = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
>>>
>>> # Transform from a raster
>>> transform = rasterio.Affine(10.0, 0, 500000, 0, -10.0, 4500000)
>>>
>>> # Convert to UTM coordinates
>>> poly_geo = transform_polygon(poly_px, transform)
>>> print(poly_geo.bounds)
(500000.0, 4499000.0, 501000.0, 4500000.0)

Get relative coordinates for ML input:

>>> poly_rel = transform_polygon(
...     poly_px,
...     transform=rasterio.Affine.identity(),
...     relative=True,
...     shape_raster=(1000, 1000)  # 1000x1000 image
... )
>>> print(poly_rel.bounds)  # Values in [0, 1]
(0.0, 0.0, 0.1, 0.1)
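The normalization behind `relative=True` divides x by the raster width and y by the raster height. The sketch below spells that out for a non-square raster so the axis order of `shape_raster=(height, width)` is visible; `to_relative` is a hypothetical helper that works on plain coordinate tuples.

```python
def to_relative(coords, shape_raster):
    """Scale (x, y) pixel coordinates to [0, 1] using shape_raster=(height, width)."""
    height, width = shape_raster
    # x is divided by the width (columns), y by the height (rows)
    return [(x / width, y / height) for x, y in coords]

square_px = [(0, 0), (100, 0), (100, 100), (0, 100)]
square_rel = to_relative(square_px, shape_raster=(500, 1000))
```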

Transform MultiPolygon:

>>> from shapely.geometry import MultiPolygon
>>> multi = MultiPolygon([poly_px, poly_px.buffer(10)])
>>> multi_geo = transform_polygon(multi, transform)
>>> print(type(multi_geo))
<class 'shapely.geometry.multipolygon.MultiPolygon'>
Note
  • The transform is applied as: (x_out, y_out) = transform * (x_in, y_in)
  • For pixel-to-geo conversion, use the raster's transform directly
  • For geo-to-pixel conversion, use ~transform (inverse)
  • Holes in polygons are preserved after transformation
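The forward and inverse mappings mentioned in the notes can be written out by hand. The sketch below uses plain tuples of coefficients in rasterio order (a, b, c, d, e, f) instead of `rasterio.Affine`, and the helper names are hypothetical; with rasterio the same results come from `transform * (x, y)` and `~transform * (x, y)`.

```python
def apply_affine(coeffs, x, y):
    """Apply (a, b, c, d, e, f): x' = a*x + b*y + c, y' = d*x + e*y + f."""
    a, b, c, d, e, f = coeffs
    return (a * x + b * y + c, d * x + e * y + f)

def invert_affine(coeffs):
    """Coefficients of the inverse transform (geo -> pixel)."""
    a, b, c, d, e, f = coeffs
    det = a * e - b * d
    if det == 0:
        raise ValueError("transform is not invertible")
    ia, ib = e / det, -b / det
    id_, ie = -d / det, a / det
    return (ia, ib, -(ia * c + ib * f), id_, ie, -(id_ * c + ie * f))

pixel_to_geo = (10.0, 0.0, 500000.0, 0.0, -10.0, 4500000.0)
geo_to_pixel = invert_affine(pixel_to_geo)

corner_geo = apply_affine(pixel_to_geo, 100, 100)    # pixel -> geo
corner_px = apply_affine(geo_to_pixel, *corner_geo)  # geo -> pixel (round trip)
```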
Source code in georeader/vectorize.py
def transform_polygon(polygon:Union[Polygon, MultiPolygon], 
                      transform: rasterio.Affine, relative:bool=False,
                      shape_raster:Optional[Tuple[int,int]] = None) -> Union[Polygon, MultiPolygon]:
    """
    Transform polygon coordinates using an affine transformation.

    Applies a rasterio Affine transform to all vertices of a polygon,
    converting between pixel and geographic coordinate systems. Handles
    both simple Polygons and MultiPolygons, including holes.

    Common use cases:
    - Pixel coordinates → Geographic coordinates (using raster transform)
    - Geographic coordinates → Relative coordinates (0-1 range for ML)

    Args:
        polygon (Union[Polygon, MultiPolygon]): Shapely geometry to transform.
            Coordinates can be any numeric type.
        transform (rasterio.Affine): 2D affine transformation matrix.
            Common sources: raster.transform, rasterio.Affine.scale(), etc.
        relative (bool): If True, output normalized coordinates in [0, 1] range
            relative to the raster dimensions. Useful for ML model inputs.
            Requires shape_raster. Default False.
        shape_raster (Optional[Tuple[int, int]]): Raster dimensions as
            (height, width). Required if relative=True. Ignored otherwise.

    Returns:
        Union[Polygon, MultiPolygon]: Transformed geometry with same type as
            input. All coordinates are transformed by the affine matrix.

    Raises:
        AssertionError: If relative=True but shape_raster not provided.

    Examples:
        Convert pixel polygon to geographic coordinates:

        >>> from shapely.geometry import Polygon
        >>> import rasterio
        >>> from georeader.vectorize import transform_polygon
        >>>
        >>> # Polygon in pixel coordinates
        >>> poly_px = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
        >>>
        >>> # Transform from a raster
        >>> transform = rasterio.Affine(10.0, 0, 500000, 0, -10.0, 4500000)
        >>>
        >>> # Convert to UTM coordinates
        >>> poly_geo = transform_polygon(poly_px, transform)
        >>> print(poly_geo.bounds)
        (500000.0, 4499000.0, 501000.0, 4500000.0)

        Get relative coordinates for ML input:

        >>> poly_rel = transform_polygon(
        ...     poly_px,
        ...     transform=rasterio.Affine.identity(),
        ...     relative=True,
        ...     shape_raster=(1000, 1000)  # 1000x1000 image
        ... )
        >>> print(poly_rel.bounds)  # Values in [0, 1]
        (0.0, 0.0, 0.1, 0.1)

        Transform MultiPolygon:

        >>> from shapely.geometry import MultiPolygon
        >>> multi = MultiPolygon([poly_px, poly_px.buffer(10)])
        >>> multi_geo = transform_polygon(multi, transform)
        >>> print(type(multi_geo))
        <class 'shapely.geometry.multipolygon.MultiPolygon'>

    Note:
        - The transform is applied as: (x_out, y_out) = transform * (x_in, y_in)
        - For pixel-to-geo conversion, use the raster's transform directly
        - For geo-to-pixel conversion, use ~transform (inverse)
        - Holes in polygons are preserved after transformation
    """
    if relative:
        assert shape_raster is not None, "shape_raster must be provided if relative is True"
        transform = rasterio.Affine.scale(1/shape_raster[1], 1/shape_raster[0]) * transform

    geojson_dict = mapping(polygon)
    if geojson_dict["type"] == "Polygon":
        geojson_dict["coordinates"] = [geojson_dict["coordinates"]]

    multipol_coords = []
    for pol in geojson_dict["coordinates"]:
        pol_coords = []
        for shell_or_holes in pol:
            pol_out = []
            for coords in shell_or_holes:
                pol_out.append(transform * coords)

            pol_coords.append(pol_out)

        multipol_coords.append(pol_coords)

    if geojson_dict["type"] == "Polygon":
        geojson_dict["coordinates"] = multipol_coords[0]
    else:
        geojson_dict["coordinates"] = multipol_coords

    return shape(geojson_dict)
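As a cross-check of the vertex-by-vertex loop above, the same mapping can be obtained with shapely's own `affine_transform`. This is an alternative shown for comparison, not what the module uses; note the different coefficient order that shapely expects.

```python
from shapely.affinity import affine_transform
from shapely.geometry import Polygon

# Pixel-space polygon with one hole
shell = [(0, 0), (100, 0), (100, 100), (0, 100)]
hole = [(40, 40), (60, 40), (60, 60), (40, 60)]
poly_px = Polygon(shell, holes=[hole])

# rasterio order is (a, b, c, d, e, f); shapely expects [a, b, d, e, xoff, yoff]
a, b, c, d, e, f = 10.0, 0.0, 500000.0, 0.0, -10.0, 4500000.0
poly_geo = affine_transform(poly_px, [a, b, d, e, c, f])
```

The hole survives the transform, and the area scales by the pixel area (100 square CRS units per pixel in this example).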