Skip to content

Smart Tool

Introduction

One of the best properties of annotation Smart tool is that it can be tuned to your particular tasks.

If

  • you need an easier annotation of concrete classes, or
  • you work with objects of unusual shape or appearance, or
  • you just want to make Smart tool even smarter

— you can achieve it with training of Smart tool.

Hooray (Original ^^)

The tool is a neural network model, and in Supervisely it may be trained like other models. This tutorial will show you how to train Smart tool to obtain simple class-agnostic model which is smart enough.

Data preparation

Datasets

To get class-agnostic model it's needed to train it with large amount of objects and part of objects from different classes. We will use the following datasets as source:

Class variety

To achieve even more class ignorance one may include other datasets with different classes, or even generate some artifical datasets.

Training samples

Our goal is to construct training samples with the following figures:

  • Target object or area to recognize:
    • We'll name it as obj class and represent as a bitmap
    • Each training sample must contain one target object
  • Positive and negative points, which are imitating human feedback during annotation:
    • pos and neg classes correspondingly, with point type

TODO: add screenshot of some sample with obj, pos & neg

We wish the model "understand" different objects, edges and shapes. It shall perform reasonable segmentation of dominant object on input image. However, the model must be obedient and correct the segmentation considering user feedback, i.e. positive and negative zones.

Pascal VOC

To generate training samples from Pascal VOC use the config below.

The following actions will be performed:

  1. Crop each source object (excluding the tiny ones) with large padding. Note that the object may be placed not in the center of result image due to image borders.
  2. Mark samples randomly with train and val tags.
  3. Get some number of random crops from each obtained sample (maybe flipped). This is performed because we don't really want to get many samples with clearly distinguishable dominating object. Instead of this we generate "hard" samples on which target objects are shifted randomly and are placed side-by-side with other objects.
  4. Exclude samples where target objects are small or missing. Resize images to required size (model input size)
  5. Generate positive and negative points randomly for each sample. Positive points will be placed on target obj, and negative ones on the rest of sample image.
[
  {
    "src": [
      "PascalVOC/*"
    ],
    "dst": "$sample-pascal",
    "action": "data",
    "settings": {
      "classes_mapping": {
        "neutral": "unused",
        "__other__": "obj"
      }
    }
  },
  {
    "src": [
      "$sample-pascal"
    ],
    "dst": "$100",
    "action": "objects_filter",
    "settings": {
      "filter_by": {
        "polygon_sizes": {
          "filtering_classes": [
            "obj"
          ],
          "comparator": "less",
          "area_size": {
            "percent": 1
          },
          "action": "delete"
        }
      }
    }
  },
  {
    "src": [
      "$100"
    ],
    "dst": "$101",
    "action": "instances_crop",
    "settings": {
      "classes": [
        "obj"
      ],
      "pad": {
        "sides": {
          "left": "200%",
          "right": "200%",
          "top": "200%",
          "bottom": "200%"
        }
      }
    }
  },
  {
    "src": [
      "$101"
    ],
    "dst": [
      "$102-val",
      "$102-train"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "probability": 0.05
      }
    }
  },
  {
    "src": [
      "$102-val"
    ],
    "dst": "$103-val",
    "action": "tag",
    "settings": {
      "tag": "val",
      "action": "add"
    }
  },
  {
    "src": [
      "$102-train"
    ],
    "dst": "$103-train",
    "action": "tag",
    "settings": {
      "tag": "train",
      "action": "add"
    }
  },
  {
    "src": [
      "$103-val",
      "$103-train"
    ],
    "dst": "$104",
    "action": "flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "src": [
      "$103-val",
      "$103-train",
      "$104"
    ],
    "dst": "$105",
    "action": "multiply",
    "settings": {
      "multiply": 3
    }
  },
  {
    "src": [
      "$105"
    ],
    "dst": "$106",
    "action": "crop",
    "settings": {
      "random_part": {
        "height": {
          "min_percent": 15,
          "max_percent": 95
        },
        "width": {
          "min_percent": 15,
          "max_percent": 95
        }
      }
    }
  },
  {
    "src": [
      "$106"
    ],
    "dst": [
      "$107",
      "$107-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "min_height": 50
      }
    }
  },
  {
    "src": [
      "$107"
    ],
    "dst": [
      "$108",
      "$108-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "min_width": 50
      }
    }
  },
  {
    "src": [
      "$108"
    ],
    "dst": "$109",
    "action": "resize",
    "settings": {
      "width": 256,
      "height": 256,
      "aspect_ratio": {
        "keep": false
      }
    }
  },
  {
    "src": [
      "$109"
    ],
    "dst": "$110",
    "action": "objects_filter",
    "settings": {
      "filter_by": {
        "polygon_sizes": {
          "filtering_classes": [
            "obj"
          ],
          "comparator": "less",
          "area_size": {
            "percent": 15
          },
          "action": "delete"
        }
      }
    }
  },
  {
    "src": [
      "$110"
    ],
    "dst": [
      "$111",
      "$111-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "min_objects_count": 1
      }
    }
  },
  {
    "src": [
      "$111"
    ],
    "dst": "$112",
    "action": "generate_hints",
    "settings": {
      "class": "obj",
      "positive_class": "pos",
      "negative_class": "neg",
      "min_points_number": 1
    }
  },
  {
    "src": [
      "$112"
    ],
    "dst": "smtool_dataset",
    "action": "supervisely",
    "settings": {}
  }
]

Mapillary

To generate training samples from Mapillary use the config below.

The data transformation acts like one for Pascal VOC with the two distinctions:

  1. Not all source classes are used. Some classes in Mapillary (e.g. Terrain) represent "image-level" segmentation figures, and other may be too complicated for our training. The mapping is defined in Data layer.
  2. Heavy class imbalance should be eliminated, otherwise the model will be trained mostly on images of cars and traffic signs. Separate "branch" in DTL config is needed to crop frequent objects and to drop the most part of them by random.
[
  {
    "src": [
      "Mapillary/*"
    ],
    "dst": "$sample-mapill",
    "action": "data",
    "settings": {
      "classes_mapping": {
        "Trailer": "obj",
        "Guard Rail": "obj",
        "Bench": "obj",
        "Trash Can": "obj",
        "Fire Hydrant": "obj",
        "Junction Box": "obj",
        "On Rails": "obj",
        "Bike Rack": "obj",
        "Ground Animal": "obj",
        "Phone Booth": "obj",
        "Bus": "obj",
        "Bicyclist": "obj",
        "Mailbox": "obj",
        "Manhole": "obj",
        "Traffic Sign (Back)": "obj_Freq",
        "Banner": "obj",
        "Other Vehicle": "obj",
        "Other Rider": "obj",
        "Motorcyclist": "obj",
        "Street Light": "obj",
        "Catch Basin": "obj",
        "Motorcycle": "obj",
        "Person": "obj",
        "Traffic Sign (Front)": "obj_Freq",
        "Car": "obj_Freq",
        "Caravan": "obj",
        "Billboard": "obj_Freq",
        "Truck": "obj",
        "Boat": "obj",
        "Wheeled Slow": "obj",
        "__other__": "unused"
      }
    }
  },
  {
    "src": [
      "$sample-mapill"
    ],
    "dst": "$100",
    "action": "objects_filter",
    "settings": {
      "filter_by": {
        "polygon_sizes": {
          "filtering_classes": [
            "obj",
            "obj_Car"
          ],
          "comparator": "less",
          "area_size": {
            "percent": 0.1
          },
          "action": "delete"
        }
      }
    }
  },
  {
    "src": [
      "$100"
    ],
    "dst": "$101-obj",
    "action": "instances_crop",
    "settings": {
      "classes": [
        "obj"
      ],
      "pad": {
        "sides": {
          "left": "200%",
          "right": "200%",
          "top": "200%",
          "bottom": "200%"
        }
      }
    }
  },
  {
    "src": [
      "$100"
    ],
    "dst": "$101-obj_Freq",
    "action": "instances_crop",
    "settings": {
      "classes": [
        "obj_Freq"
      ],
      "pad": {
        "sides": {
          "left": "200%",
          "right": "200%",
          "top": "200%",
          "bottom": "200%"
        }
      }
    }
  },
  {
    "src": [
      "$101-obj_Freq"
    ],
    "dst": [
      "$101-obj_Freq-selected",
      "$102-obj_Freq-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "probability": 0.1
      }
    }
  },
  {
    "src": [
      "$101-obj_Freq-selected"
    ],
    "dst": "$101-obj_Freq-renamed",
    "action": "rename",
    "settings": {
      "classes_mapping": {
        "obj_Freq": "obj"
      }
    }
  },
  {
    "src": [
      "$101-obj",
      "$101-obj_Freq-renamed"
    ],
    "dst": [
      "$102-val",
      "$102-train"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "probability": 0.05
      }
    }
  },
  {
    "src": [
      "$102-val"
    ],
    "dst": "$103-val",
    "action": "tag",
    "settings": {
      "tag": "val",
      "action": "add"
    }
  },
  {
    "src": [
      "$102-train"
    ],
    "dst": "$103-train",
    "action": "tag",
    "settings": {
      "tag": "train",
      "action": "add"
    }
  },
  {
    "src": [
      "$103-val",
      "$103-train"
    ],
    "dst": "$104",
    "action": "flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "src": [
      "$103-val",
      "$103-train",
      "$104"
    ],
    "dst": "$105",
    "action": "multiply",
    "settings": {
      "multiply": 3
    }
  },
  {
    "src": [
      "$105"
    ],
    "dst": "$106",
    "action": "crop",
    "settings": {
      "random_part": {
        "height": {
          "min_percent": 15,
          "max_percent": 95
        },
        "width": {
          "min_percent": 15,
          "max_percent": 95
        }
      }
    }
  },
  {
    "src": [
      "$106"
    ],
    "dst": [
      "$107",
      "$107-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "min_height": 50
      }
    }
  },
  {
    "src": [
      "$107"
    ],
    "dst": [
      "$108",
      "$108-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "min_width": 50
      }
    }
  },
  {
    "src": [
      "$108"
    ],
    "dst": "$109",
    "action": "resize",
    "settings": {
      "width": 256,
      "height": 256,
      "aspect_ratio": {
        "keep": false
      }
    }
  },
  {
    "src": [
      "$109"
    ],
    "dst": "$110",
    "action": "objects_filter",
    "settings": {
      "filter_by": {
        "polygon_sizes": {
          "filtering_classes": [
            "obj"
          ],
          "comparator": "less",
          "area_size": {
            "percent": 15
          },
          "action": "delete"
        }
      }
    }
  },
  {
    "src": [
      "$110"
    ],
    "dst": [
      "$111",
      "$111-unused"
    ],
    "action": "if",
    "settings": {
      "condition": {
        "min_objects_count": 1
      }
    }
  },
  {
    "src": [
      "$111"
    ],
    "dst": "$112",
    "action": "generate_hints",
    "settings": {
      "class": "obj",
      "positive_class": "pos",
      "negative_class": "neg",
      "min_points_number": 1
    }
  },
  {
    "src": [
      "$112"
    ],
    "dst": "smtool_dataset",
    "action": "supervisely",
    "settings": {}
  }
]

Training

Train

With prepared data we may train Smart tool like any other model.

Consider using pre-trained model to continue training. One may be found in Model Zoo.

Here is the example of training config:

{
  "input_size": {
    "width": 256,
    "height": 256
  },
  "batch_size": {
    "train": 16,
    "val": 16
  },
  "data_workers": {
    "train": 3,
    "val": 0
  },
  "epochs": 30,
  "val_every": 1,
  "lr": 0.1,
  "momentum": 0.9,
  "lr_decreasing": {
      "patience": 10,
      "lr_divisor": 5
  },
  "loss_weights": {
    "main_class": 1,
    "points": 0.5
  },
  "special_classes": {
    "foreground": "obj",
    "background": "bg",
    "neutral": "neutral",
    "pos": "pos",
    "neg": "neg"
  },
  "weights_init_type": "continue_training",
  "gpu_devices": [
    0,
    1,
    2,
    3
  ]
}

Note that there are special_classes for SmartTool:

  • foreground means name of "main" class in input project (model performs binary segmentation, object-vs-background)
  • pos and neg must be points

Test

Inference can be run on some existing project/dataset like another model. Provide project with points of corresponding "pos" and "neg" classes to look at model reaction.

Here is the example of inference config:

{
  "gpu_devices": [
    0
  ],
  "model_classes": {
    "save_classes": [
      "obj"
    ],
    "add_suffix": "_smtool"
  },
  "existing_objects": {
    "save_classes": [
      "obj",
      "neg",
      "pos"
    ],
    "add_suffix": ""
  }
}

Test interactively

Please see the following guide to know how to use the trained SmartTool.