
Introduction

Preparing data for training takes up most of a data scientist's time, and the process is highly error-prone.

We designed a special language, the Data Transformation Language (DTL), that fully automates data manipulation: merging projects and datasets, mapping classes, applying various augmentations to images and annotations, saving to different formats, and more.

The process is defined in a JSON-based config file. More technically, the config is an array of objects, where each object defines one transformation.

When we design a neural network, we think about it in terms of a computational graph. This is the core abstraction behind popular deep learning frameworks. A computational graph consists of math operations and variables.

DTL applies the same idea to data: it lets you configure data manipulation as a computational graph. You define the sequence of operations that will be applied to each image from the selected datasets.

Here is a small example that demonstrates the concept:

[
  {
    "action": "data",
    "src": [
      "my_project/*"
    ],
    "dst": "$data",
    "settings": {
      "classes_mapping": "default"
    }
  },
  {
    "action": "flip",
    "src": [
      "$data"
    ],
    "dst": "$data_flip",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "action": "supervisely",
    "src": [
      "$data",
      "$data_flip"
    ],
    "dst": "result_project",
    "settings": {
    }
  }
]

Here you can see the array of objects. Each object is a "layer" that performs one operation on the data (image + annotation). A sequence of such layers defines the whole data transformation process.

Every layer is a JSON object in the following format:

{
  "action": "...",
  "src": "...",
  "dst": "...",
  "settings": {}
}

Fields description:

  • action: defines the type of operation. For example, "action": "flip" means that every image fed to this layer will be flipped.

  • src: input data

  • dst: output data

  • settings: transformation settings
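The layer format above can be checked mechanically before a config is submitted. The following is a minimal Python sketch, not part of DTL itself: the helper name layer_problems is hypothetical, and treating all four fields as required is an assumption based only on the format shown here.

```python
# Field names taken from the layer format shown above.
REQUIRED_FIELDS = ("action", "src", "dst", "settings")


def layer_problems(layer):
    """Return a list of human-readable problems found in one layer object."""
    if not isinstance(layer, dict):
        return ["layer is not a JSON object"]
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in layer:
            problems.append(f"missing field: {field}")
    # Assumption: settings is always an object, as in every example on this page.
    if "settings" in layer and not isinstance(layer["settings"], dict):
        problems.append("settings is not an object")
    return problems


flip_layer = {
    "action": "flip",
    "src": ["$data"],
    "dst": "$data_flip",
    "settings": {"axis": "vertical"},
}

print(layer_problems(flip_layer))          # no problems for the flip layer
print(layer_problems({"action": "flip"}))  # reports the three missing fields
```

A check like this only covers the shape of a layer. Whether a given action or setting is valid is decided by DTL, not by this sketch.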

Here is the graphical representation of the computational graph from the example:

The first layer in the example is a data layer. It takes all images with their annotations from all datasets in the project my_project. This data is then fed to the flip layer, which produces flipped versions of the images and annotations. The last layer ("action": "supervisely") creates a new project, result_project, and saves its input data (original + flipped) there.
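The graph described above can be derived from the config by matching each layer's dst against the src lists of the other layers. The Python sketch below illustrates the idea on the example config. The function name build_edges is hypothetical, the code assumes src is always a list, and it labels nodes by their action, which would be ambiguous if the same action appeared twice.

```python
import json

# The example config from this page, condensed.
CONFIG = json.loads("""
[
  {"action": "data", "src": ["my_project/*"], "dst": "$data",
   "settings": {"classes_mapping": "default"}},
  {"action": "flip", "src": ["$data"], "dst": "$data_flip",
   "settings": {"axis": "vertical"}},
  {"action": "supervisely", "src": ["$data", "$data_flip"],
   "dst": "result_project", "settings": {}}
]
""")


def build_edges(config):
    """Return (producer action, consumer action, placeholder) triples."""
    # Map each placeholder to the action of the layer that writes it.
    producers = {}
    for layer in config:
        dst = layer["dst"]
        if isinstance(dst, str) and dst.startswith("$"):
            producers[dst] = layer["action"]
    # Every src entry that names a known placeholder becomes one edge.
    edges = []
    for layer in config:
        for name in layer["src"]:
            if name in producers:
                edges.append((producers[name], layer["action"], name))
    return edges


for producer, consumer, name in build_edges(CONFIG):
    print(f"{producer} -> {consumer}  (via {name})")
```

For the example config this yields three edges: data to flip, data to supervisely, and flip to supervisely. Project names such as my_project/* and result_project are not placeholders, so they do not produce edges.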

Placeholders

As the example shows, the src and dst fields hold values like $data or $data_flip. Don't worry if you haven't seen them before: neither had we. They are just temporary names for variables (also known as placeholders). You can choose any name you want, as long as it starts with the $ symbol.

The only exceptions are Data and Save layers: some of these layers use the src and dst fields as the name of an input project or an output directory.
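Because placeholder names are free-form, a typo in one of them is easy to miss. The following Python sketch shows one way to catch placeholders that are read in src but never written by any dst. The function name undefined_placeholders is hypothetical, the code assumes src is always a list, and how DTL itself reports such mistakes is not covered here.

```python
def undefined_placeholders(config):
    """Return placeholders that appear in some src but in no dst."""
    # Collect every placeholder written by a layer. A dst that does not
    # start with "$" is a project or directory name, not a placeholder.
    defined = {
        layer["dst"]
        for layer in config
        if isinstance(layer["dst"], str) and layer["dst"].startswith("$")
    }
    missing = []
    for layer in config:
        for name in layer["src"]:
            if name.startswith("$") and name not in defined and name not in missing:
                missing.append(name)
    return missing


config = [
    {"action": "data", "src": ["my_project/*"], "dst": "$data",
     "settings": {"classes_mapping": "default"}},
    # Typo on purpose: "$dta" is never produced by any layer.
    {"action": "flip", "src": ["$dta"], "dst": "$data_flip",
     "settings": {"axis": "vertical"}},
]

print(undefined_placeholders(config))  # ['$dta']
```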

How to run Data Transformation in Supervisely

Click "DTL" in the top menu. Write the JSON configuration in the editor widget, then click the "Start" button. You will be taken automatically to the "Tasks" tab, where you can view progress and logs, stop the task, download the final archive, or go to the generated project.

Primary controls:

  1. Saved configurations. To save the current configuration, select the "New config..." option from the dropdown menu and give it a name. You can later select it from the same menu to load it into the editor, update it, or remove it.
  2. Node selection.
  3. "Start" button
  4. Graphical representation of the config. You can download it as an .svg file by clicking the icon in the top right, and you can zoom and pan it with the mouse. If you click a purple node (an action), you are taken to its JSON object in the configuration.