MODIS Pipeline¶
In this tutorial, we will walk through how to download MODIS data and prepare it for further machine learning work. We will:
- download the data
- harmonize the data
- create patches that are ready for ML consumption.
import autoroot
import os
from dotenv import load_dotenv
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import rasterio
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
xr.set_options(
keep_attrs=True,
display_expand_data=False,
display_expand_coords=False,
display_expand_data_vars=False,
display_expand_indexes=False
)
np.set_printoptions(threshold=10, edgeitems=2)
import seaborn as sns
sns.reset_defaults()
sns.set_context(context="talk", font_scale=1.0)
%matplotlib inline
Save Directory¶
This is arguably the most important part. We need to define where we want to save the data.
We use the autoroot package to automatically resolve the project root directory.
root_dir = autoroot.root
Note: the data is quite large, so make sure you have adequate disk space.
load_dotenv()  # load environment variables from the local .env file
save_dir = os.getenv("ITI_DATA_SAVEDIR")
Account¶
The data are hosted on the NASA Earthdata registry. We use the earthaccess API, which enables us to easily download the data from Python.
Warning: the user must have an account for the NASA EarthData API.
Please follow the link to register for an account.
There are different ways to authenticate your account.
We recommend you log in once and store your credentials in your local ~/.netrc file, or alternatively set the EARTHDATA_USERNAME and EARTHDATA_PASSWORD variables in your .env file.
See these instructions for more information.
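If you want to confirm that your credentials are picked up, a minimal check with the earthaccess package looks like this (assuming your credentials are in ~/.netrc or exported as EARTHDATA_USERNAME / EARTHDATA_PASSWORD):
import earthaccess
# log in using credentials from ~/.netrc or the EARTHDATA_* environment variables
auth = earthaccess.login()
auth.authenticated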
Config¶
We have a configuration file which features some of the options available for downloading data. One can take a peek using the command below.
!cat $autoroot.root/config/example/download.yaml
# PERIOD
period:
  start_date: '2020-10-01'
  start_time: '00:00:00'
  end_date: '2020-10-31'
  end_time: '23:59:00'
# CLOUD MASK
cloud_mask: True
# PATH FOR SAVING DATA
save_dir: data
defaults:
  - _self_
There are also some satellite-specific options we can change.
We can see them using the command below.
!cat $autoroot.root/config/example/satellite/aqua.yaml
download:
_target_: rs_tools._src.data.modis.downloader_aqua.download
save_dir: ${save_dir}/aqua/
start_date: ${period.start_date}
start_time: ${period.start_time}
end_date: ${period.end_date}
end_time: ${period.end_time}
region: "-130 -15 -90 5" # "lon_min lat_min lon_max lat_max"
For this tutorial, we will change the save directory and the start/end dates and times.
Notice that some of these options live in the download.yaml file while others live in the satellite-specific file, in this case aqua.yaml.
python rs_tools \
satellite=aqua \
stage=download \
save_dir="/path/to/savedir" \
period.start_date="2020-10-01" \
period.end_date="2020-10-02" \
period.start_time="09:00:00" \
period.end_time="21:00:00"
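Once the command finishes, it is worth sanity-checking that files actually landed on disk. The path below is an assumption based on the layout used later by the geoprocessing config (raw granules under <save_dir>/aqua/raw); adjust it to your own save_dir.
# hypothetical check: list a few of the downloaded granules
!ls $save_dir/aqua/raw | head -n 5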
GeoProcessing¶
We have an extensive set of geoprocessing steps for harmonizing the raw granules into analysis-ready files.
We can peek into the configuration files under rs_tools/config/example/ to see some of the options we can modify.
!cat $autoroot.root/config/example/satellite/aqua.yaml
geoprocess:
_target_: rs_tools._src.geoprocessing.modis.geoprocessor_modis.geoprocess
read_path: ${read_path}/aqua/raw
save_path: ${save_path}/aqua/geoprocessed
satellite: aqua
In particular, we will focus on the geoprocess step within the configuration.
The most important options are the resolution and the region.
The resolution is a float or integer that is measured in km.
Below is an example of the command we can run.
python rs_tools \
satellite=aqua \
stage=geoprocess \
read_path=$save_dir/data/iti \
save_path=$save_dir/data/iti
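If you also want to override the resolution or the region from the command line, Hydra-style overrides like the ones below should work. Note that the key names geoprocess.resolution and geoprocess.region are assumptions (they do not appear in the config snippet above), so check your aqua.yaml before relying on them.
# hypothetical overrides; verify the key names in your own config first
python rs_tools \
satellite=aqua \
stage=geoprocess \
read_path=$save_dir/data/iti \
save_path=$save_dir/data/iti \
geoprocess.resolution=1.0 \
geoprocess.region="-130 -15 -90 5"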
We can see that the saved files are cleanly named by acquisition timestamp and satellite, e.g.:
/path/to/savedir/aqua/geoprocessed/20201001195500_aqua.nc
# !ls $save_dir/aqua/geoprocessed
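The filenames encode the acquisition timestamp followed by the satellite name. As a small illustration (the snippet below is hypothetical, not part of rs_tools), we can recover a datetime object from a filename:
from datetime import datetime
# hypothetical snippet: parse the acquisition time out of a geoprocessed filename
fname = "20201001195500_aqua.nc"
timestamp = datetime.strptime(fname.split("_")[0], "%Y%m%d%H%M%S")
timestamp  # datetime.datetime(2020, 10, 1, 19, 55)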
ds = xr.open_dataset(f"{save_dir}/aqua/geoprocessed/20201001195500_aqua.nc", engine="netcdf4")
ds
<xarray.Dataset> Size: 453MB
Dimensions:  (y: 2040, x: 1354, band: 38, time: 1, band_wavelength: 38)
Coordinates: (6)
Dimensions without coordinates: y, x
Data variables: (1)
Attributes:
    calibration:    radiance
    standard_name:  toa_outgoing_radiance_per_unit_wavelength
    platform_name:  EOS-Aqua
    sensor:         modis
    units:          Watts/m^2/micrometer/steradian
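Before plotting, a couple of quick sanity checks can be helpful; the dimension and attribute names below are taken directly from the repr above.
# quick sanity checks on the geoprocessed scene
print(dict(ds.sizes))
print(ds.attrs["platform_name"], ds.attrs["units"])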
# plot the first band of the geoprocessed radiances on a map
fig = plt.figure(figsize=(8, 8))
ax = plt.axes(projection=ccrs.PlateCarree())
cbar_kwargs = {
    "fraction": 0.06,
    "pad": 0.1,
    "orientation": "horizontal",
}
ds.isel(band=0).Rad.plot.pcolormesh(
    x="longitude", y="latitude", transform=ccrs.PlateCarree(),
    cbar_kwargs=cbar_kwargs
)
ax.set(xlim=[-140, -70], ylim=[-40, 10])
ax.coastlines()
# Plot lat/lon grid
gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
linewidth=0.1, color='k', alpha=1,
linestyle='--')
gl.top_labels = False
gl.right_labels = False
gl.xformatter = LONGITUDE_FORMATTER
gl.yformatter = LATITUDE_FORMATTER
gl.xlabel_style = {'size': 12}
gl.ylabel_style = {'size': 12}
plt.tight_layout()
plt.show()
Patching¶
We can take a peek at the patching configuration using the command below.
!cat $autoroot.root/config/example/patch.yaml
The most important arguments are the patch_size and stride_size arguments.
The patch_size dictates how big the patches are and the stride_size dictates how far apart consecutive patches start.
For maximum overlap, the stride size should be 1 (consecutive patches then share patch_size - 1 pixels).
For no overlap, the stride size should be equal to the patch_size.
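To make the relationship concrete, here is a small sketch (plain Python arithmetic, not an rs_tools function) that counts how many patches the 2040 x 1354 geoprocessed scene above yields for a given patch_size and stride_size:
def num_patches(dim_size: int, patch_size: int, stride_size: int) -> int:
    # number of full patches that fit along one dimension
    return (dim_size - patch_size) // stride_size + 1

# the geoprocessed scene above is 2040 x 1354 pixels
for stride in [16, 8, 1]:
    ny = num_patches(2040, patch_size=16, stride_size=stride)
    nx = num_patches(1354, patch_size=16, stride_size=stride)
    print(f"stride={stride:2d} -> {ny} x {nx} = {ny * nx} patches")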
python rs_tools \
satellite=aqua \
stage=patch \
read_path=$save_dir \
save_path=$save_dir \
nan_cutoff=0.5 \
patch_size=16 \
stride_size=16
Demo Visualization¶
ds = xr.open_dataset(f"{save_dir}/aqua/analysis/20201001195500_patch_0.nc", engine="netcdf4")
ds
<xarray.Dataset> Size: 169kB
Dimensions:  (y: 32, x: 32, band: 38, time: 1, band_wavelength: 38)
Coordinates: (6)
Dimensions without coordinates: y, x
Data variables: (1)
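Each patch keeps its 2D longitude/latitude coordinates, so we can check the geographic footprint of this particular patch before plotting it:
# approximate geographic bounding box of this patch
print(float(ds.longitude.min()), "to", float(ds.longitude.max()), "degrees longitude")
print(float(ds.latitude.min()), "to", float(ds.latitude.max()), "degrees latitude")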
# plot the first band of a single patch on a map
fig = plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())
ds.isel(band=0).Rad.plot.pcolormesh(x="longitude", y="latitude", transform=ccrs.PlateCarree())
ax.coastlines()
# Plot lat/lon grid
gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
linewidth=0.1, color='k', alpha=1,
linestyle='--')
gl.top_labels = False
gl.right_labels = False
gl.xformatter = LONGITUDE_FORMATTER
gl.yformatter = LATITUDE_FORMATTER
gl.xlabel_style = {'size': 8}
gl.ylabel_style = {'size': 8}
plt.tight_layout()
plt.show()
DataLoading¶
We can start using any dataloader framework right away. In this example, we will use PyTorch.
from rs_tools._src.utils.io import get_list_filenames
from rs_tools._src.datamodule.utils import load_nc_file
from rs_tools._src.datamodule.editor import StackDictEditor, CoordNormEditor
from toolz import compose_left
We will create a very simple demo dataloader.
from torch.utils.data import Dataset, DataLoader
from typing import Optional, Callable
class NCDataReader(Dataset):
    """Minimal PyTorch Dataset that reads patch files (.nc) from a directory."""

    def __init__(self, data_dir: str, ext: str = ".nc", transforms: Optional[Callable] = None):
        self.data_dir = data_dir
        # gather all files in the directory with the given extension
        self.data_filenames = get_list_filenames(data_dir, ext)
        self.transforms = transforms

    def __getitem__(self, ind) -> np.ndarray:
        nc_path = self.data_filenames[ind]
        # load a single netCDF patch as a dictionary of arrays
        x = load_nc_file(nc_path)
        if self.transforms is not None:
            x = self.transforms(x)
        return x

    def __len__(self):
        return len(self.data_filenames)
ds = NCDataReader(f"{save_dir}/aqua/analysis")
dl = DataLoader(ds, batch_size=8)
out = next(iter(dl))
list(out.keys())
['data', 'wavelengths', 'coords', 'cloud_mask']
out["data"].shape, out["coords"].shape
(torch.Size([8, 38, 32, 32]), torch.Size([8, 2, 32, 32]))
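The batch is already in channels-first layout, so it can be fed straight into a convolutional model. Below is a minimal sanity check with a toy torch.nn.Conv2d layer (purely illustrative, not part of rs_tools); the .float() cast is there in case the loader returns float64 tensors.
import torch.nn as nn
# toy convolution over the 38 spectral bands, just to confirm the tensors are model-ready
conv = nn.Conv2d(in_channels=38, out_channels=16, kernel_size=3, padding=1)
features = conv(out["data"].float())
features.shape  # expected: torch.Size([8, 16, 32, 32])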
Transforms/Editors¶
We can also use custom transformations within the dataset (just like in standard PyTorch) to preprocess each sample.
transforms = compose_left(
CoordNormEditor(),
StackDictEditor(),
)
# initialize dataset with transforms
ds = NCDataReader(f"{save_dir}/aqua/analysis", transforms=transforms)
# initialize dataloader
dl = DataLoader(ds, batch_size=8)
# do one iteration
out = next(iter(dl))
# inspect a batch
out.shape