This guide addresses a core topic in remote sensing: the automatic pixelwise labeling of aerial imagery. Analytics on satellite data has many use cases, from object tracking to segmenting buildings, roads, and other infrastructure, and today deep learning approaches are at the core of many production solutions. Applying machine learning (ML) to satellite data raises a few typical challenges:
The size of images: source images have high resolution, so a dataset can occupy tens of gigabytes, and with continuous data updates an image collection can quickly grow to multiple terabytes.
Custom machine learning: many out-of-the-box solutions do not achieve acceptable accuracy without customization, and unconventional pre- and post-processing steps are sometimes required.
In this tutorial we demonstrate how to build a custom neural network for building segmentation inside Supervisely.
We will show how to prepare the training data (a DTL query with sliding-window crops), train a neural network, apply it to test images in a sliding-window manner, and then postprocess the predictions (split the single prediction mask into separate objects).
For this tutorial we will use the Inria Aerial Image Labeling Dataset.
You can download the data and reproduce the entire experiment yourself; note that the data may only be used for non-commercial purposes.
Import the data into Supervisely as follows:
1. Download the original dataset.
2. Unzip the archive, open the "train" folder and rename the "gt" folder to "ann".
3. Choose the "binary_masks" import option.
4. Drag the "images" and "ann" directories to the upload window.
5. Name the project "INRIA".
This DTL query splits the original dataset into train-validation and test subsets, taking approximately the same number of images from each town (around 5 images per town go to train-validation).
Layer #1 ("action": "data"
) takes all data from project INRIA
and keeps classes as they are. Then there are five indentical branches for each town (Austin, Chicago, Kitsap County, Vienna and West Tyrol), so we will describe only one of them for Austin.
Layer #2 ("action": "if"
) selects the data related to austin
, other data is sent to $null1
.
Layer #3 ("action": "tag"
) adds tag austin
to each input image.
Layer #4 ("action": "if"
) randomly splits the data into two branches: first branch - 14% (will be tagged as trainval
) and second branch - 86% (will be tagged as test
). We decreased the number of training images to demonstrate that entire pipline can be passed with a small number of annotated images.
Layer #5 ("action": "tag"
) adds tag train
to all input images.
Layer #6 ("action": "tag"
) adds tag val
to all input images .
Layer #7 ("action": "supervisely"
) saves results to the new project INRIA_TAGGED
.
[{"dst": "$sample","src": ["INRIA/*"],"action": "data","settings": {"classes_mapping": "default"}},{"dst": ["$1","null"],"src": ["$sample"],"action": "if","settings": {"condition": {"regex_names": ["^austin[0-9]+$"]}}},{"dst": "$tag_1","src": ["$1"],"action": "tag","settings": {"tag": "austin","action": "add"}},{"dst": ["$to_train1","$to_val1"],"src": ["$tag_1"],"action": "if","settings": {"condition": {"probability": 0.14}}},{"dst": "$train1","src": ["$to_train1"],"action": "tag","settings": {"tag": "trainval","action": "add"}},{"dst": "$val1","src": ["$to_val1"],"action": "tag","settings": {"tag": "test","action": "add"}},{"dst": ["$2","null"],"src": ["$sample"],"action": "if","settings": {"condition": {"regex_names": ["^chicago[0-9]+$"]}}},{"dst": "$tag_2","src": ["$2"],"action": "tag","settings": {"tag": "chicago","action": "add"}},{"dst": ["$to_train2","$to_val2"],"src": ["$tag_2"],"action": "if","settings": {"condition": {"probability": 0.14}}},{"dst": "$train2","src": ["$to_train2"],"action": "tag","settings": {"tag": "trainval","action": "add"}},{"dst": "$val2","src": ["$to_val2"],"action": "tag","settings": {"tag": "test","action": "add"}},{"dst": ["$3","null"],"src": ["$sample"],"action": "if","settings": {"condition": {"regex_names": ["^kitsap[0-9]+$"]}}},{"dst": "$tag_3","src": ["$3"],"action": "tag","settings": {"tag": "kitsap","action": "add"}},{"dst": ["$to_train3","$to_val3"],"src": ["$tag_3"],"action": "if","settings": {"condition": {"probability": 0.14}}},{"dst": "$train3","src": ["$to_train3"],"action": "tag","settings": {"tag": "trainval","action": "add"}},{"dst": "$val3","src": ["$to_val3"],"action": "tag","settings": {"tag": "test","action": "add"}},{"dst": ["$4","null"],"src": ["$sample"],"action": "if","settings": {"condition": {"regex_names": ["^tyrol-w[0-9]+$"]}}},{"dst": "$tag_4","src": ["$4"],"action": "tag","settings": {"tag": "tyrol-w","action": "add"}},{"dst": ["$to_train4","$to_val4"],"src": ["$tag_4"],"action": "if","settings": {"condition": {"probability": 0.14}}},{"dst": "$train4","src": ["$to_train4"],"action": "tag","settings": {"tag": "trainval","action": "add"}},{"dst": "$val4","src": ["$to_val4"],"action": "tag","settings": {"tag": "test","action": "add"}},{"dst": ["$5","null"],"src": ["$sample"],"action": "if","settings": {"condition": {"regex_names": ["^vienna[0-9]+$"]}}},{"dst": "$tag_5","src": ["$5"],"action": "tag","settings": {"tag": "vienna","action": "add"}},{"dst": ["$to_train5","$to_val5"],"src": ["$tag_5"],"action": "if","settings": {"condition": {"probability": 0.14}}},{"dst": "$train5","src": ["$to_train5"],"action": "tag","settings": {"tag": "trainval","action": "add"}},{"dst": "$val5","src": ["$to_val5"],"action": "tag","settings": {"tag": "test","action": "add"}},{"dst": "INRIA_TAGGED","src": ["$train1","$val1","$train2","$val2","$train3","$val3","$train4","$val4","$train5","$val5"],"action": "supervisely","settings": {}}]
The next DTL query filters the tagged images, applies transformations, and splits the results into train and validation sets.
Layer #1 ("action": "data"
) takes all data from project INRIA_TAGGED
and keeps classes as they are.
Layer #2 ("action": "if"
) splits the data into two branches based on tags. Further we will work only with the images which have tag trainval
, other images will be sent to null
.
Layer #3 ("action": "sliding_window"
) creates crops of size 512x512 from the images using sliding window approach.
Layer #4 ("action": "flip"
) flips data horisontally.
Layer #5 ("action": "flip"
) flips data vertically.
Layer #6 ("action": "if"
) randomly splits the data into two branches: first branch - 95% (will be tagged as train
) and second branch - 5% (will be tagged as val
).
Layer #7 ("action": "tag"
) adds the tag train
to all input images .
Layer #8 ("action": "tag"
) adds the tag val
to all input images.
Layer #9 ("action": "if"
) filters the data (images without objects will be skipped).
Layer #10 ("action": "supervisely"
) saves results to the new project INRIA_train
. As a result we got around 7000 images for training.
[{"dst": "$sample","src": ["INRIA_TAGGED/*"],"action": "data","settings": {"classes_mapping": "default"}},{"dst": ["$sample1","null"],"src": ["$sample"],"action": "if","settings": {"condition": {"tags": ["trainval"]}}},{"dst": "$crops","src": ["$sample1"],"action": "sliding_window","settings": {"window": {"width": 512,"height": 512},"min_overlap": {"x": 0,"y": 0}}},{"dst": "$vflip","src": ["$crops"],"action": "flip","settings": {"axis": "vertical"}},{"dst": "$hflip","src": ["$crops","$vflip"],"action": "flip","settings": {"axis": "horizontal"}},{"dst": ["$to_train","$to_val"],"src": ["$crops","$hflip","$vflip"],"action": "if","settings": {"condition": {"probability": 0.95}}},{"dst": "$train","src": ["$to_train"],"action": "tag","settings": {"tag": "train","action": "add"}},{"dst": "$val","src": ["$to_val"],"action": "tag","settings": {"tag": "val","action": "add"}},{"dst": ["$res","null"],"src": ["$train","$val"],"action": "if","settings": {"condition": {"min_objects_count": 1}}},{"dst": "INRIA_train","src": ["$res"],"action": "supervisely","settings": {}}]
A basic step-by-step training guide is available here; it is the same for all models inside Supervisely. Detailed information about training configs is available here.
The UNetV2 encoder weights were initialized from the corresponding model in the Model Zoo (UNetV2 with VGG weights pretrained on ImageNet).
Training config:
{"lr": 0.001,"epochs": 5,"val_every": 1,"batch_size": {"val": 3,"train": 3},"input_size": {"width": 512,"height": 512},"gpu_devices": [0,1,2],"data_workers": {"val": 3,"train": 3},"dataset_tags": {"val": "val","train": "train"},"special_classes": {"neutral": "neutral","background": "bg"},"weights_init_type": "transfer_learning"}
Training took ~ 1 hour.
Here are the training charts:
For inference, the trained model is applied to the full-size test images in a sliding-window manner. Sliding-window inference config:
{"mode": {"save": false,"source": "sliding_window","window": {"width": 512,"height": 512},"min_overlap": {"x": 128,"y": 128}},"gpu_devices": [0],"model_classes": {"add_suffix": "_dl","save_classes": "__all__"},"existing_objects": {"add_suffix": "","save_classes": []}}
Here is an example of the NN predictions:
The network outputs all buildings of a class as one mask, so to work with individual buildings we postprocess the predictions with the split_masks DTL layer. Before splitting, all buildings are in a single mask:
After the DTL query, the buildings are split into separate masks:
Here is a simple example of the DTL query:
[{"dst": "$sample","src": ["inf_INRIA_test_part/*"],"action": "data","settings": {"classes_mapping": "default"}},{"dst": "$splitted","src": ["$sample"],"action": "split_masks","settings": {"classes": ["untitled_dl"]}},{"dst": "inf_splitted","src": ["$splitted"],"action": "supervisely","settings": {}}]
Here is the visualization of the DTL query:
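Conceptually, the split_masks layer performs connected-component labeling: pixels of the class mask that touch each other form one object. Here is a minimal sketch of the equivalent operation, assuming SciPy is available (the actual DTL layer operates on Supervisely annotations rather than raw arrays):

import numpy as np
from scipy import ndimage

def split_mask(binary_mask):
    # Label connected regions, then return one boolean mask per region.
    labeled, num_objects = ndimage.label(binary_mask)
    return [labeled == i for i in range(1, num_objects + 1)]

# Two separate buildings in one class mask become two object masks.
mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 1:3] = True
mask[5:7, 5:7] = True
print(len(split_mask(mask)))  # -> 2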