Automatic road segmentation

Introduction

We understand that life is too short to annotate thousands of images manually. In this blog post we describe how to apply a human-in-the-loop approach to automatic image segmentation. We used this approach to create a large, high-quality training dataset for person segmentation:

  • The dataset consists of 5711 images with 6884 high-quality annotated person instances

  • The annotation team consisted of two members, and the whole process took only 4 days

This tutorial is a step-by-step guide to applying the human-in-the-loop approach to your own task. As an example we consider road segmentation, but you can change the target classes and reproduce the results on your custom data.

Task diversity

Of course, this approach cannot be applied to every possible segmentation and detection task, but we believe it covers 90% of common scenarios. For some "exotic" or "hard" cases we have a number of tricks and workarounds that will be discussed in future tutorials.

Reproduce the results

We added this data to the Datasets Library, so you can reproduce our experiment yourself.

Task description

One of our partners has a huge number of videos with road scenes, from both country and city roads. They annotate a lot of object types, and the road is one of them. They have collected tens of millions of images, and all of these images have to be annotated, which is impossible to do manually. Sometimes images from neighboring frames look the same; sometimes they are quite diverse (e.g. various weather and lighting conditions). Here are a few examples:

So we are going to train a neural network that will help our annotation team with road segmentation.

Pipeline

We slightly simplified the real use case for tutorial purposes: we took only one video file (~11k images). Each frame was captured every two meters, so this video file corresponds to ~20 kilometers of real road.

High-level pipeline:

  1. We randomly selected 156 images from the video. These images will be used for testing. We put them in the roads_test project.

  2. We manually selected 10 other images from the video. None of these images appear in our test set, and they were chosen to cover the variety of the test images as much as possible. These images will be annotated and then used for training. We put them in the roads_annotated project.

  3. Perform a DTL query to apply data augmentations to our 10 annotated images (project roads_annotated). This gives us a training set of 220 images. We name the resulting project train_01.

  4. Train a neural network for road segmentation on the train_01 project. We take the UNetV2 architecture and initialize its weights from VGG. You can choose any other neural network; we chose UNetV2 because it is fast to train and quite accurate. As a result we get the nn_road_01 model.

  5. Apply the nn_road_01 model to the test images and put the neural network predictions in the inf_01_roads_test project. On many images the road is pre-segmented really well, but there are cases where the NN fails.

  6. We clone the inf_01_roads_test project to inf_01_annotated (just to preserve the original predictions), choose 5 images, and correct the NN predictions manually. We mark the corrected images with the corrected tag.

  7. Apply DTL to combine the 5 newly annotated images (corrected NN predictions) with the 10 images annotated manually from scratch, and apply the same augmentations as in step 3. This gives us a training set of 440 images. We name the resulting project train_02.

  8. We take the nn_road_01 model and continue its training on the new train_02 project. As a result we get the nn_road_02 model.

  9. Apply the nn_road_02 model to the test images and put the predictions in the inf_02_roads_test project. Comparing them with the previous predictions, we see that the NN has become smarter: the pre-segmentation is better, and on most images it is nearly ideal.

All steps took 45 minutes:

  • NN training time: 23 minutes. Time to ☕

  • Manual annotation of 15 images: 12 minutes.

  • Moving the mouse cursor and clicking the necessary buttons inside Supervisely: 8 minutes.

The method described above is iterative: if you cannot achieve the necessary performance, just run a few more iterations. We recommend keeping iterations fast: annotate a relatively small portion of new images, train a new model, and test it on the big test set to monitor quality. After a few iterations you will get both an accurate neural network and a big training dataset.
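
To make the iteration explicit, here is a toy Python sketch of the loop. All helpers are simplified placeholders, not Supervisely API calls, and the counts are illustrative only.

# Toy sketch of the human-in-the-loop iteration described above.
# All helpers are simplified placeholders, not Supervisely API calls.

def augment(images):                      # stands in for the DTL query (flip, multiply, crop)
    return images * 22                    # 10 annotated images -> 220, as in step 3

def train(train_set, init=None):          # stands in for NN training / continued training
    return {"trained_on": len(train_set), "continued": init is not None}

def predict(model, test_images):          # stands in for applying the model to the test set
    return [f"mask_for_{img}" for img in test_images]

def correct(predictions, k=5):            # stands in for manually fixing k bad predictions
    return predictions[:k]

annotated = [f"img_{i}" for i in range(10)]
test = [f"test_{i}" for i in range(156)]

train_set = augment(annotated)            # steps 1-3: train_01
model = train(train_set)                  # step 4: nn_road_01
for iteration in range(2):                # iterate until the quality is sufficient
    predictions = predict(model, test)    # steps 5 and 9: pre-segment the test images
    train_set += augment(correct(predictions))  # steps 6-7: correct 5 images, re-augment
    model = train(train_set, init=model)  # step 8: continue training
print(model)                              # the training set grows with every iteration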

Scale it

As we said at the beginning, this use case was simplified, but you can scale it easily: just take 50 videos or more and apply the described steps to all of them. The resulting neural network will generalize better.

Steps 1-2. Initial data preparation

We uploaded the test images to the roads_test project and the images we are going to annotate to the roads_annotated project. The roads_test project consists of 156 images; roads_annotated consists of 10 images.

A few examples of annotated images:

A few examples of test images:

Step 3. First training set with DTL

So we have 10 annotated images, and we have to augment them before training. We apply a DTL query that creates a training set of 220 images from the 10 annotated ones.

Here is the raw DTL config:

[
  {
    "dst": "$sample",
    "src": [
      "roads_annotated/*"
    ],
    "action": "data",
    "settings": {
      "classes_mapping": "default"
    }
  },
  {
    "dst": "$fv",
    "src": [
      "$sample"
    ],
    "action": "flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "dst": "$data",
    "src": [
      "$fv",
      "$sample"
    ],
    "action": "dummy",
    "settings": {}
  },
  {
    "dst": "$data2",
    "src": [
      "$data"
    ],
    "action": "multiply",
    "settings": {
      "multiply": 10
    }
  },
  {
    "dst": "$data3",
    "src": [
      "$data2"
    ],
    "action": "crop",
    "settings": {
      "random_part": {
        "width": {
          "max_percent": 90,
          "min_percent": 70
        },
        "height": {
          "max_percent": 90,
          "min_percent": 70
        },
        "keep_aspect_ratio": false
      }
    }
  },
  {
    "dst": [
      "$totrain",
      "$toval"
    ],
    "src": [
      "$data3",
      "$data"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "probability": 0.95
      }
    }
  },
  {
    "dst": "$train",
    "src": [
      "$totrain"
    ],
    "action": "tag",
    "settings": {
      "tag": "train",
      "action": "add"
    }
  },
  {
    "dst": "$val",
    "src": [
      "$toval"
    ],
    "action": "tag",
    "settings": {
      "tag": "val",
      "action": "add"
    }
  },
  {
    "dst": "$data_with_bg",
    "src": [
      "$train",
      "$val"
    ],
    "action": "background",
    "settings": {
      "class": "bg"
    }
  },
  {
    "dst": "train_01",
    "src": [
      "$data_with_bg"
    ],
    "action": "supervisely",
    "settings": {}
  }
]

Explanation:

The query takes the 10 images from roads_annotated, creates a flipped copy of each (the flip layer), merges the originals with the flipped copies (20 images), then multiplies the result 10 times and takes a random crop of 70-90% of the width and height from each copy (200 crops). The 200 crops together with the 20 uncropped images are randomly split into train (~95%) and val (~5%) sets via tags, a bg background class is added, and the result is saved as the train_01 project.
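
For intuition, here is a minimal numpy sketch of the same transforms. This is an illustration, not the DTL implementation, and treating axis: vertical as a left-right mirror is an assumption.

import numpy as np

rng = np.random.default_rng(0)

def augment_like_dtl(image):
    variants = [image, np.fliplr(image)]       # "flip" layer: a mirrored copy of each image
    crops = []
    for img in variants * 10:                  # "multiply" layer: 10 copies of each variant
        h, w = img.shape[:2]
        ch = int(h * rng.uniform(0.7, 0.9))    # "crop" layer: random 70-90% of the height...
        cw = int(w * rng.uniform(0.7, 0.9))    # ...and 70-90% of the width
        y = rng.integers(0, h - ch + 1)
        x = rng.integers(0, w - cw + 1)
        crops.append(img[y:y + ch, x:x + cw])
    return variants + crops                    # the "if" split receives crops and uncropped images

print(len(augment_like_dtl(np.zeros((512, 512, 3)))))  # 22, so 10 images -> 220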

Here are a few examples of images after augmentations:

Step 4. Train first NN

The basic step-by-step training guide is here; it is the same for all models inside Supervisely. Detailed information about training configs is here.

UNetV2 weights were initialized from the corresponding model in the Model Zoo (UNetV2 with VGG weights pretrained on ImageNet).

The resulting model will be named nn_road_01. The train_01 project is used for training.

Training configuration:

{
  "lr": 0.001,
  "epochs": 20,
  "val_every": 1,
  "batch_size": {
    "val": 1,
    "train": 4
  },
  "input_size": {
    "width": 512,
    "height": 512
  },
  "gpu_devices": [
    0,
    1,
    2
  ],
  "data_workers": {
    "val": 0,
    "train": 3
  },
  "dataset_tags": {
    "val": "val",
    "train": "train"
  },
  "special_classes": {
    "neutral": "neutral",
    "background": "bg"
  },
  "weights_init_type": "transfer_learning"
}
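
As a rough sanity check of the config: assuming the train batch size applies per GPU (an assumption, since the config does not state this), the number of iterations per epoch works out as follows.

import math

train_images = round(220 * 0.95)      # ~209 images get the "train" tag from the 0.95 split
effective_batch = 4 * 3               # batch_size.train x len(gpu_devices), if per-GPU
print(math.ceil(train_images / effective_batch))  # ~18 iterations per epoch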

Training takes 8 minutes on three GPUs. Here is the loss chart during training:

If we set a lot of epochs for training, the chart becomes cluttered. So we can set the range of epochs we want to show and monitor local changes in accuracy and loss. Here is an example:

Step 5. Apply first NN to test data

The basic step-by-step inference guide is here; it is the same for all models inside Supervisely. Detailed information about inference configs is here.

We apply the nn_road_01 model to the roads_test project. The resulting project with neural network predictions is saved as inf_01_roads_test.

Here are the predictions. Most of them look great, but there are also mistakes:

Step 6. Correct NN predictions

First we clone the inf_01_roads_test project to inf_01_annotated, just to preserve the original predictions and compare them with future ones. Then we choose 5 images, correct them, and assign the "corrected" tag to them. Here they are:

Step 7. Second training set with DTL

This is probably the most interesting part of the guide. We need to combine the first batch of annotated images (10 images) with the new ones (5 images) and then apply the same augmentations we used in Step 3.

Here is the raw DTL query:

[
  {
    "dst": "$sample_inf",
    "src": [
      "inf_01_annotated/*"
    ],
    "action": "data",
    "settings": {
      "classes_mapping": {
        "__other__": "__ignore__",
        "road_unet": "road"
      }
    }
  },
  {
    "dst": [
      "$corrected",
      "$to_skip"
    ],
    "src": [
      "$sample_inf"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "tags": [
          "corrected"
        ]
      }
    }
  },
  {
    "dst": "$inf_annotated",
    "src": [
      "$corrected"
    ],
    "action": "multiply",
    "settings": {
      "multiply": 2
    }
  },
  {
    "dst": "$inf_annotated_ds",
    "src": [
      "$inf_annotated"
    ],
    "action": "dataset",
    "settings": {
      "name": "iteration_01"
    }
  },
  {
    "dst": "$sample_bitmap_ds",
    "src": [
      "$sample_bitmap"
    ],
    "action": "dataset",
    "settings": {
      "name": "iteration_00"
    }
  },
  {
    "dst": "$annotated_data",
    "src": [
      "$inf_annotated_ds",
      "$sample_bitmap_ds"
    ],
    "action": "dummy",
    "settings": {}
  },
  {
    "dst": "$fv",
    "src": [
      "$annotated_data"
    ],
    "action": "flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "dst": "$sample",
    "src": [
      "roads_annotated/*"
    ],
    "action": "data",
    "settings": {
      "classes_mapping": {
        "road": "road_poly"
      }
    }
  },
  {
    "dst": "$sample_bitmap",
    "src": [
      "$sample"
    ],
    "action": "poly2bitmap",
    "settings": {
      "classes_mapping": {
        "road_poly": "road"
      }
    }
  },
  {
    "dst": "$data",
    "src": [
      "$fv",
      "$annotated_data"
    ],
    "action": "dummy",
    "settings": {}
  },
  {
    "dst": "$data2",
    "src": [
      "$data"
    ],
    "action": "multiply",
    "settings": {
      "multiply": 10
    }
  },
  {
    "dst": "$data3",
    "src": [
      "$data2"
    ],
    "action": "crop",
    "settings": {
      "random_part": {
        "width": {
          "max_percent": 90,
          "min_percent": 70
        },
        "height": {
          "max_percent": 90,
          "min_percent": 70
        },
        "keep_aspect_ratio": false
      }
    }
  },
  {
    "dst": [
      "$totrain",
      "$toval"
    ],
    "src": [
      "$data3",
      "$data"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "probability": 0.95
      }
    }
  },
  {
    "dst": "$train",
    "src": [
      "$totrain"
    ],
    "action": "tag",
    "settings": {
      "tag": "train",
      "action": "add"
    }
  },
  {
    "dst": "$val",
    "src": [
      "$toval"
    ],
    "action": "tag",
    "settings": {
      "tag": "val",
      "action": "add"
    }
  },
  {
    "dst": "$data_with_bg",
    "src": [
      "$train",
      "$val"
    ],
    "action": "background",
    "settings": {
      "class": "bg"
    }
  },
  {
    "dst": "train_02",
    "src": [
      "$data_with_bg"
    ],
    "action": "supervisely",
    "settings": {}
  }
]

Explanation:

The query reads the NN predictions from inf_01_annotated, renaming the road_unet class to road and ignoring all other classes, and keeps only the images marked with the corrected tag (the if layer); these 5 corrections are multiplied by 2 so they carry more weight. The original 10 images are read from roads_annotated, and their polygon annotations are converted to bitmaps (poly2bitmap) to match the format of the NN predictions. Both sources are placed into separate datasets (iteration_01 and iteration_00), merged, and then passed through the same flip, multiply, crop, train/val split, and background steps as in Step 3. The result is saved as the train_02 project.

As a result we get the train_02 project with 440 images: the 20 source images (10 original annotations plus the 5 corrections counted twice) and their flipped copies make 40 images, and 10 random crops of each add another 400.
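
One detail worth noting: the manual annotations in roads_annotated are polygons, while the NN predictions are bitmaps, which is why the query runs them through the poly2bitmap layer. A minimal sketch of what such a conversion does, using PIL (an illustration, not Supervisely's implementation):

# Rasterize a polygon annotation into a binary bitmap mask.
# Illustrative only; not Supervisely's actual poly2bitmap implementation.
import numpy as np
from PIL import Image, ImageDraw

def poly_to_bitmap(points, height, width):
    """points: list of (x, y) polygon vertices in pixel coordinates."""
    mask = Image.new("L", (width, height), 0)             # blank single-channel image
    ImageDraw.Draw(mask).polygon(points, outline=1, fill=1)
    return np.array(mask, dtype=bool)                     # True inside the polygon

road_mask = poly_to_bitmap([(0, 300), (640, 300), (640, 479), (0, 479)], 480, 640)
print(road_mask.sum())                                    # number of road pixels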

Step 8. Train second NN

We take the nn_road_01 model and continue its training on the new train_02 project. As a result we get the nn_road_02 model.

Here is the training config:

{
  "lr": 0.001,
  "epochs": 10,
  "val_every": 1,
  "batch_size": {
    "val": 1,
    "train": 4
  },
  "input_size": {
    "width": 512,
    "height": 512
  },
  "gpu_devices": [
    0,
    1,
    2
  ],
  "data_workers": {
    "val": 0,
    "train": 3
  },
  "dataset_tags": {
    "val": "val",
    "train": "train"
  },
  "special_classes": {
    "neutral": "neutral",
    "background": "bg"
  },
  "weights_init_type": "continue_training"
}

Here are the loss and accuracy charts; the model becomes slightly better. Note that, compared to the first config, epochs is reduced to 10 and weights_init_type is set to continue_training, so training resumes from the nn_road_01 weights instead of the VGG initialization.

Step 9. Apply second NN and compare results

We apply the nn_road_02 model to the roads_test project and save the predictions to inf_02_roads_test. Let's compare the predictions (project inf_01_roads_test vs. project inf_02_roads_test). The new model should be "smarter" than the previous one. Here we show only the images where the predictions became better; we reviewed all the images and could not find any examples where they became worse.

Be careful with overfitting

If the model predictions become worse after the next iteration, there is a good chance that the NN is overfitted. Try to restore the model from an earlier checkpoint. Read more here.
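
A simple guard, sketched generically below (not tied to Supervisely's checkpoint format), is to track the checkpoint with the lowest validation loss during training and roll back to it when a later iteration makes predictions worse.

# Track the checkpoint with the lowest validation loss and restore it
# when later epochs (or a whole iteration) degrade the predictions.
best = {"val_loss": float("inf"), "checkpoint": None}

def on_epoch_end(val_loss, checkpoint):
    if val_loss < best["val_loss"]:
        best.update(val_loss=val_loss, checkpoint=checkpoint)

# toy validation losses per epoch: the model starts overfitting after epoch 1
for epoch, val_loss in enumerate([0.40, 0.31, 0.35, 0.42]):
    on_epoch_end(val_loss, f"epoch_{epoch}.pt")

print(best)  # {'val_loss': 0.31, 'checkpoint': 'epoch_1.pt'} -- restore this one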

Before / after comparisons:

Conclusion

As you can see, with Supervisely and the human-in-the-loop approach we can iterate until we reach the necessary performance. This saves our annotation team a lot of time. We have worked on many different segmentation tasks, and in most cases this approach works. Of course, there are complicated scenarios too (e.g. segmenting curbs, poles, or other tricky objects, or working with very wide images). In future tutorials we will share a few workarounds for such cases.