Abstract

Due to the nature of their pathways, the NASA Terra and NASA Aqua satellites capture imagery containing "swath gaps", which are areas of no data. Swath gaps can overlap the region of interest (ROI) completely, often rendering the entire image unusable by Machine Learning (ML) models. The problem is exacerbated when the ROI rarely occurs (e.g. a hurricane) and, when it does occur, is partially overlapped by a swath gap. With annotated data as supervision, a model can learn to differentiate between the area of focus and the swath gap. However, annotation is expensive, and the vast majority of existing data is currently unannotated. Hence, we propose an augmentation technique that considerably reduces the presence of swath gaps so that CNNs can focus on the ROI and thus successfully use data with swath gaps for training. We experiment on the UC Merced Land Use Dataset, where we add swath gaps through empty polygons (covering up to 20% of the image area) and then apply augmentation techniques to fill the swath gaps. We compare the model trained with our augmentation techniques on the swath gap-filled data with the model trained on the original swath gap-less data and note highly augmented performance. Additionally, we perform a qualitative analysis using activation maps that visualizes how effectively our trained network ignores the swath gaps. We also evaluate our results against a human baseline and show that, in certain cases, the filled swath gaps look so realistic that even a human evaluator did not distinguish between original satellite images and swath gap-filled images. Since this method is aimed at unlabeled data, it is widely generalizable and impactful for large-scale unannotated datasets from various space data domains.

Esther Cao

Bibtex

@article{caochen2020swathgaps,
  title={Reducing Effects of Swath Gaps in Unsupervised Machine Learning},
  author={Chen, Sarah and Cao, Esther and Koul, Anirudh and Ganju, Siddha 
  and Praveen, Satyarth and Kasam, Meher Anand},
  journal={Committee on Space Research Machine Learning for Space Sciences Workshop,
  Cross-Disciplinary Workshop on Cloud Computing},
  year={2021}
}

What are Swath Gaps?

Swath gaps are empty, no-data regions that occur in MODIS imagery. They exist because the MODIS instruments aboard the Terra and Aqua satellites have a swath width of only 2330 km, so consecutive orbits leave strips of the equator uncovered. These regions of missing data appear primarily as nine spindle shapes around the equator; however, they also occur at the North and South Poles during seasons of minimal sunlight. NASA's Earth Observing System Data and Information System renders this uncollected data as black spindles, or swath gaps, in NASA Worldview.
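As a rough back-of-the-envelope check on why a 2330 km swath leaves gaps at the equator, the sketch below compares the swath width with the spacing between consecutive ground tracks. Only the swath width comes from the text above. The orbital period (about 98.9 minutes) and the equatorial circumference are assumed approximate values, and the calculation ignores details such as viewing geometry at the swath edges.

```python
# Rough estimate of the equatorial gap between consecutive MODIS swaths.
# SWATH_WIDTH_KM is taken from the text; the other constants are
# approximate values assumed for illustration only.
SWATH_WIDTH_KM = 2330.0
EQUATORIAL_CIRCUMFERENCE_KM = 40_075.0  # assumed
ORBITAL_PERIOD_MIN = 98.9               # assumed, sun-synchronous orbit
SOLAR_DAY_MIN = 24 * 60.0


def equatorial_track_spacing_km(period_min=ORBITAL_PERIOD_MIN):
    """Distance the equator moves under the orbit plane during one orbit."""
    return EQUATORIAL_CIRCUMFERENCE_KM * period_min / SOLAR_DAY_MIN


def equatorial_gap_km(swath_km=SWATH_WIDTH_KM, period_min=ORBITAL_PERIOD_MIN):
    """Uncovered width between two consecutive swaths (0 if they overlap)."""
    return max(0.0, equatorial_track_spacing_km(period_min) - swath_km)


if __name__ == "__main__":
    print(f"track spacing: {equatorial_track_spacing_km():.0f} km")
    print(f"gap per orbit: {equatorial_gap_km():.0f} km")
```

Under these assumptions the ground tracks are spaced farther apart than the swath is wide, leaving a gap of a few hundred kilometres per orbit at the equator. The gaps close toward higher latitudes, where the ground tracks converge.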

Why are Swath Gaps important?

Through data exploration experiments and input from the NASA IMPACT team, we noted that similarity search experiments meant to find the most similar image based on regions of interest (ROIs), such as hurricanes or beaches, instead return images with similarly placed swath gaps. These search engines focus on the swath gaps rather than on the ROI. We therefore conclude that when notable swath gaps are present in satellite imagery used to train Earth science machine learning (ML) models, these areas of missing data render the entire image unusable by unsupervised training models. Further, ML pattern recognition algorithms begin to treat the image's swath gap as its main feature, rather than the features of its primary ROI. This is an issue because events of interest in satellite imagery are already sparse, especially rare events such as tornadoes, wildfires, and volcanic eruptions, and thus every piece of data is valuable. With data already limited, a swath gap overlapping the ROI further reduces what is available. We address the problem that swath gaps create when present in unlabeled image datasets that are used to train unsupervised machine learning models. The following is an example of swath gaps present due to NASA Terra and NASA Aqua satellite pathways.





Our Filling Methods

We resolve the problem that swath gaps pose by experimenting with three different methods: Random RGB (filling the swath gap with randomly selected RGB pixels), Pixel RGB (filling the swath gap with randomly selected RGB pixels from the image), and Neighbor RGB (filling the swath gap with randomly selected RGB pixels from a dynamic radius of the given pixel to be filled).
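The three fill methods can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in the paper. It assumes an `H x W x 3` `uint8` image and a boolean `gap_mask` marking swath-gap pixels, and all function names are hypothetical. In particular, the "dynamic radius" of Neighbor RGB is interpreted here as a square window that doubles in size until it contains at least one non-gap pixel; the original method may grow or shape the neighborhood differently.

```python
import numpy as np


def fill_random_rgb(image, gap_mask, rng):
    """Random RGB: fill each gap pixel with a uniformly random color."""
    out = image.copy()
    n_gap = int(gap_mask.sum())
    out[gap_mask] = rng.integers(0, 256, size=(n_gap, 3), dtype=np.uint8)
    return out


def fill_pixel_rgb(image, gap_mask, rng):
    """Pixel RGB: fill each gap pixel with a random non-gap pixel of the image."""
    out = image.copy()
    valid_pixels = image[~gap_mask]  # shape (n_valid, 3)
    if len(valid_pixels) == 0:
        raise ValueError("image contains no valid pixels to sample from")
    n_gap = int(gap_mask.sum())
    out[gap_mask] = valid_pixels[rng.integers(0, len(valid_pixels), size=n_gap)]
    return out


def fill_neighbor_rgb(image, gap_mask, rng, start_radius=1):
    """Neighbor RGB: fill each gap pixel from valid pixels in a nearby window.

    Assumption: the window radius doubles until a valid pixel is found.
    """
    if not (~gap_mask).any():
        raise ValueError("image contains no valid pixels to sample from")
    out = image.copy()
    height, width = gap_mask.shape
    for y, x in zip(*np.nonzero(gap_mask)):
        radius = start_radius
        while True:
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window_valid = ~gap_mask[y0:y1, x0:x1]
            if window_valid.any():
                candidates = image[y0:y1, x0:x1][window_valid]
                out[y, x] = candidates[rng.integers(len(candidates))]
                break
            radius *= 2  # no valid neighbor yet: widen the search window
    return out
```

A call might look like `fill_neighbor_rgb(image, gap_mask, np.random.default_rng(0))`. Sampling only from original non-gap pixels, rather than from pixels already filled, keeps the result independent of the order in which gap pixels are visited.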





Evaluation Activation Maps

One way in which we assess our fill methods is by analyzing the effects they have on activation maps, as seen below. We further evaluated our fill methods by using an autoencoder to perform similarity searches: given an input image, we ask the autoencoder to return the four most similar images from the dataset. We trained a separate autoencoder model for each augmentation policy. We then tested the three fill methods by querying the autoencoder with images filled using each method. Results improved greatly from method one (Random RGB) to method two (Pixel RGB) to method three (Neighbor RGB), with successively more images categorized correctly per fill method. Repeating this experiment indicated that fill method three, Neighbor RGB, was the most effective, with three or four of the four "most similar" images consistently being categorized correctly.
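The similarity search evaluation can be sketched as a nearest-neighbor lookup over embeddings. The snippet below is an illustration under stated assumptions, not the paper's code: it assumes each image has already been encoded into a fixed-length vector by the trained autoencoder, uses cosine similarity as the comparison (the source does not name a distance metric), and uses hypothetical function names.

```python
import numpy as np


def top_k_similar(query_embedding, dataset_embeddings, k=4):
    """Return indices of the k dataset embeddings most similar to the query.

    Similarity is cosine similarity (an assumption, see the note above).
    """
    query = query_embedding / np.linalg.norm(query_embedding)
    dataset = dataset_embeddings / np.linalg.norm(
        dataset_embeddings, axis=1, keepdims=True
    )
    scores = dataset @ query
    return np.argsort(-scores)[:k]


def correct_among_top_k(query_label, dataset_labels, neighbor_indices):
    """Count how many retrieved images share the query image's category."""
    retrieved = np.asarray(dataset_labels)[neighbor_indices]
    return int(np.sum(retrieved == query_label))
```

With `k=4`, `correct_among_top_k` returns a number from 0 to 4, which corresponds to the "three or four out of four" figure reported for Neighbor RGB.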