NNs and irregular image size

Introduction

There are many real-world applications where images have irregular shapes: wide images from self-driving cars, huge resolutions in satellite imagery, and so on. In such cases, feeding the entire image to a neural network without loss of quality is almost impossible, especially when very small objects must be segmented carefully.

This guide demonstrates how to train a neural network on wide images and then apply it to similar images.

Customize for your case

This tutorial is a set of basic rules and procedures you can adapt and apply to your custom task.

Here is an example of a wide image. If we want to segment persons, we cannot simply resize the image to 512 px × 512 px and feed it to the NN, because we will obtain low-quality segmentations.

Here is an example of a satellite image. If we want to segment cars, we cannot simply resize the image and feed it to the NN, because the cars are too small.

Task description

We want to build a car segmentation model. For this purpose we take the KITTI semantic segmentation benchmark. There are a few reasons to use this dataset: you can reproduce the experiment, the data is already annotated, and the images are wide. The data consists of 200 semantically annotated training images and 200 test images. Almost all images have a resolution of 1242 px × 375 px, so the images are indeed wide. Also, having only 200 training images lets us demonstrate the power of DTL queries for image augmentation.

Train image with annotations:

Test image without annotations:

Reproduce results

You can download the tutorial data and reproduce the entire experiment yourself.

Combine described approaches with other tutorials

We recommend combining these techniques with those from other tutorials to obtain better results.

Pipeline

  1. Upload data. As a result we have the project kitti-semseg-train with annotated data and kitti-semseg-test with test images.

  2. Apply a DTL query to build the new project kitti-veh-tr. We will merge a few classes (different vehicles) into a single one, drop objects of other classes, and perform special image augmentations to simulate the image scale we will use during inference.

  3. Train a UNetV2 network for the car segmentation task. As a result we will get the model unet-kitti-veh.

  4. Apply the unet-kitti-veh model to the test images and analyze the results.

Step 1. Prepare data for train and test

Just download the archive from the official KITTI site, unpack it, and import its content into the system. Supervisely supports the KITTI format. Read more here.

Let's go to the kitti-semseg-train project -> Statistics -> Objects area. Supervisely automatically calculates all statistics in real time, which allows us to get valuable insights from our data. We can then use this information to create the "right" training set. This dataset contains 34 classes. Let's analyze the classes that cover the largest area.

If we train a binary segmentation model (classes car and background), we face a class imbalance problem. To deal with it, we will use the UNetV2 architecture with an additional Dice loss function (it turns on automatically).
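
The Dice loss measures the overlap between the predicted mask and the ground truth, so the rare vehicle class is not drowned out by the abundant background. Here is a minimal PyTorch-style sketch of the idea (an illustration only, not Supervisely's exact implementation):

import torch

def dice_loss(logits, targets, eps=1.0):
    # logits: raw model outputs, shape (N, 1, H, W)
    # targets: binary ground-truth masks, same shape
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    # Dice coefficient approaches 1 for a perfect overlap; we minimize 1 - Dice.
    dice = (2.0 * intersection + eps) / (union + eps)
    return (1.0 - dice).mean()

Because the loss is a ratio of overlap to total foreground mass, a small vehicle region is weighted by its overlap quality rather than by its tiny share of all image pixels.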

Step 2. Use DTL to prepare training data

We take the project kitti-semseg-train (200 images) and create a new one, kitti-veh-tr (2400 images). A few interesting comments:

  1. The original dataset contains 34 classes. Since we want to train a car segmentation model, we have to merge the corresponding classes into a single one and drop the other classes. We do this with the Data layer (the first layer, with "action": "data"). We map the classes bus, car, train, truck, caravan and trailer to the single class vehicle. The other classes will be dropped (because of the field "__other__": "__ignore__").

  2. We multiply the images together with their flipped versions and then perform crops of specific sizes. We are going to apply the model in a sliding-window manner, which is why we have to construct a training set with similar image shapes (see the size check after the query below). Example:

Here is the entire DTL query:

[
  {
    "dst": "$sample",
    "src": [
      "kitti-semseg-train/*"
    ],
    "action": "data",
    "settings": {
      "classes_mapping": {
        "bus": "vehicle",
        "car": "vehicle",
        "train": "vehicle",
        "truck": "vehicle",
        "caravan": "vehicle",
        "trailer": "vehicle",
        "__other__": "__ignore__"
      }
    }
  },
  {
    "dst": "$sample1",
    "src": [
      "$sample"
    ],
    "action": "dataset",
    "settings": {
      "rule": "save_original"
    }
  },
  {
    "dst": [
      "$100-train",
      "$100-val"
    ],
    "src": [
      "$sample1"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "probability": 0.95
      }
    }
  },
  {
    "dst": "$101-train",
    "src": [
      "$100-train"
    ],
    "action": "tag",
    "settings": {
      "tag": "train",
      "action": "add"
    }
  },
  {
    "dst": "$101-val",
    "src": [
      "$100-val"
    ],
    "action": "tag",
    "settings": {
      "tag": "val",
      "action": "add"
    }
  },
  {
    "dst": "$102",
    "src": [
      "$101-train",
      "$101-val"
    ],
    "action": "dummy",
    "settings": {}
  },
  {
    "dst": "$102-flipv",
    "src": [
      "$102"
    ],
    "action": "flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "dst": "$103",
    "src": [
      "$102",
      "$102-flipv"
    ],
    "action": "multiply",
    "settings": {
      "multiply": 6
    }
  },
  {
    "dst": "$104",
    "src": [
      "$103"
    ],
    "action": "crop",
    "settings": {
      "random_part": {
        "width": {
          "max_percent": 30,
          "min_percent": 27
        },
        "height": {
          "max_percent": 100,
          "min_percent": 90
        },
        "keep_aspect_ratio": false
      }
    }
  },
  {
    "dst": "kitti-veh-tr",
    "src": [
      "$104"
    ],
    "action": "supervisely",
    "settings": {}
  }
]
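
A quick back-of-the-envelope check of what this query produces (plain Python; the numbers come from the query above and the typical 1242 px × 375 px image resolution):

# 200 source images, doubled by the flip layer, then multiplied by 6
n_images = 200 * 2 * 6                # = 2400, the size of kitti-veh-tr

# Random crops: 27-30% of the 1242 px width, 90-100% of the 375 px height
crop_w = (0.27 * 1242, 0.30 * 1242)   # ~ (335, 373) px
crop_h = (0.90 * 375, 1.00 * 375)     # ~ (338, 375) px

The resulting crops are roughly square patches of about 335-375 px, very close to the 370 px × 370 px sliding window we will use at inference time, so the network sees the same image scale during training and testing.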

Detailed description.


Background class

In all previous tutorials, at the end of the DTL query we explicitly create an object of the background class to cover unlabeled pixels. Here we deliberately skip this step to show that every segmentation model will automatically create the class before training if it is missing.

However, we recommend creating it manually to keep experiments clearer for your teammates.
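
For reference, in earlier tutorials this is done by appending a background layer to the end of the DTL query. Written as a Python dict for illustration (a hedged sketch: check the exact layer name and settings against the DTL documentation):

# A DTL layer that covers all unlabeled pixels with a "bg" object
# (assumed syntax; verify against the DTL docs before using).
background_layer = {
    "dst": "$with_bg",
    "src": ["$104"],
    "action": "background",
    "settings": {"class": "bg"},
}

Note that "bg" matches the "background" entry of "special_classes" in the training config below.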

Step 3. Train neural network

The basic step-by-step training guide is here; it is the same for all models inside Supervisely. Detailed information regarding training configs is here.

UNetV2 weights were initialized from the corresponding model in the Model Zoo (UNetV2 with VGG weights pretrained on ImageNet).

UNetV2 binary segmentation

For binary segmentation we prefer the UNetV2 architecture because it handles class imbalance problems like the one in this tutorial well. The problem is partially solved by the additional Dice loss we use during training. The architecture is also fast to train and produces accurate predictions.

Deeplab v3

DeepLab v3 is a deep, complex, and "capricious" neural network. With a small training dataset, it can be hard to train this architecture to high accuracy. Usually this problem is solved with longer training and more complex augmentations.

The resulting model will be named unet-kitti-veh. The project kitti-veh-tr is used for training.

Training configuration:

{
  "lr": 0.001,
  "epochs": 15,
  "val_every": 0.5,
  "batch_size": {
    "val": 6,
    "train": 14
  },
  "input_size": {
    "width": 256,
    "height": 256
  },
  "gpu_devices": [
    0,
    1,
    2
  ],
  "data_workers": {
    "val": 0,
    "train": 6
  },
  "dataset_tags": {
    "val": "val",
    "train": "train"
  },
  "special_classes": {
    "neutral": "neutral",
    "background": "bg"
  },
  "weights_init_type": "transfer_learning"
}

Training takes 17 minutes on three GPU devices. Here is the loss chart during training:

After training, the last model checkpoint is saved to the "My models" list. But we can see from the training chart that the last checkpoint is not the best in terms of loss and accuracy (red arrow).

To guard against overfitting, we took the checkpoint from epoch 13.5 (green arrow). For epoch 13.5 the checkpoint number is 27 because a checkpoint is saved automatically every 0.5 epoch (field "val_every": 0.5 in the training config), and 13.5 / 0.5 = 27. You can find out how to restore a checkpoint here. We assigned the name unet-kitti-veh (ckpt 27) to this restored model.

Monitor training charts and test various checkpoints

We recommend monitoring the training charts carefully to prevent overfitting or underfitting. This is especially important with a small training dataset. In this case, restoring checkpoints is a key component of successful research.

Step 4. Apply NN to test images

The basic step-by-step inference guide is here; it is the same for all models inside Supervisely. Detailed information regarding inference configs is here.

We apply the model unet-kitti-veh (ckpt 27) to the project kitti-semseg-test. The resulting project with neural network predictions will be saved as kitti-test-vehunet-inf00.

The inference configuration we used:

{
  "mode": {
    "save": false,
    "source": "sliding_window",
    "window": {
      "width": 370,
      "height": 370
    },
    "min_overlap": {
      "x": 300,
      "y": 0
    }
  },
  "gpu_devices": [
    0
  ],
  "model_classes": {
    "add_suffix": "_dl",
    "save_classes": [
      "vehicle"
    ]
  },
  "existing_objects": {
    "add_suffix": "",
    "save_classes": []
  }
}

What have we done? We applied inference in sliding-window mode with a big overlap (300 px). Although most of the images have a height of 375 px, the sliding window height equals 370 px because some test images have a height of 370 px.

We also defined "save_classes": ["vehicle"] to automatically drop the background object after inference.
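
To make the sliding-window mode concrete, here is a minimal sketch of the window placement and prediction merging (plain Python/NumPy; the averaging merge strategy and the predict callable are our assumptions for illustration, not Supervisely's exact implementation):

import numpy as np

def window_starts(total, win, min_overlap):
    # Place windows so that neighbors overlap by at least `min_overlap` px
    # and the last window ends exactly at the image border.
    stride = max(1, win - min_overlap)
    starts = list(range(0, total - win + 1, stride))
    if starts[-1] != total - win:
        starts.append(total - win)
    return starts

def sliding_window_predict(image, predict, win_h=370, win_w=370,
                           overlap_y=0, overlap_x=300):
    # Assumes the image is at least as large as the window in both dimensions.
    h, w = image.shape[:2]
    probs = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in window_starts(h, win_h, overlap_y):
        for x in window_starts(w, win_w, overlap_x):
            patch = image[y:y + win_h, x:x + win_w]
            probs[y:y + win_h, x:x + win_w] += predict(patch)  # per-pixel vehicle probability
            counts[y:y + win_h, x:x + win_w] += 1.0
    return probs / counts  # average the overlapping predictions

For a 1242 × 375 image this gives 14 horizontal positions (stride 370 − 300 = 70 px) and 2 vertical positions, as the illustration below shows.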

Here is an illustration of all the sliding windows that were used:

Here are a few examples of predictions. They look cool 😉

Conclusion

For example, in combination with the human-in-the-loop approach, this allows you to create complex pipelines inside Supervisely out of the box without coding. Thus Supervisely automates annotation work and the routine tasks of data scientists, and organizes experiments with neural networks. Just download the resulting NN weights, get the sources from our repository, and deploy the model to production however you want.

Saves developers time

All neural networks inside Supervisely support all available inference modes thanks to our SDK. The sources are available, so developers can customize them for their needs.