

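List of annotation formats supported by CVAT:
  • 1: CVAT
  • 2: Datumaro
  • 3: LabelMe
  • 4: MOT sequence
  • 5: MOTS PNG
  • 6: MS COCO
  • 7: Pascal VOC
  • 8: YOLO
  • 9: TFRecord
  • 10: ImageNet
  • 11: WIDER Face
  • 12: CamVid
  • 13: VGGFace2
  • 14: Market-1501
  • 15: ICDAR13/15
  • 16: Open Images
  • 17: Cityscapes
  • 18: KITTI
  • 19: LFW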

1 - CVAT


This is the native CVAT annotation format. It supports all CVAT annotation features, so it can be used to make data backups.

  • supported annotations (CVAT for images): Rectangles, Polygons, Polylines, Points, Cuboids, Skeletons, Tags, Tracks

  • supported annotations (CVAT for videos): Rectangles, Polygons, Polylines, Points, Cuboids, Skeletons, Tracks

  • attributes are supported

  • Format specification

CVAT for images export

Downloaded file: a ZIP file of the following structure:
├── images/
|   ├── img1.png
|   └── img2.jpg
└── annotations.xml
  • tracks are split by frames

CVAT for videos export

Downloaded file: a ZIP file of the following structure:
├── images/
|   ├── frame_000000.png
|   └── frame_000001.png
└── annotations.xml
  • shapes are exported as single-frame tracks

CVAT loader

Uploaded file: an XML file or a ZIP file of the structures above
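To illustrate what the loader consumes, here is a minimal Python sketch that reads annotations.xml from a CVAT for images ZIP with the standard library and collects rectangles per image. It assumes the usual layout of the native XML (image elements with a name attribute containing box elements with label, xtl, ytl, xbr and ybr attributes); other shape types are skipped, and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_cvat_image_boxes(xml_bytes):
    """Return {image_name: [(label, xtl, ytl, xbr, ybr), ...]}.

    Assumes <image name=...> elements containing <box> children with
    label/xtl/ytl/xbr/ybr attributes; other shape types are ignored.
    """
    root = ET.fromstring(xml_bytes)
    result = {}
    for image in root.iter("image"):
        boxes = []
        for box in image.findall("box"):
            coords = tuple(float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr"))
            boxes.append((box.get("label"),) + coords)
        result[image.get("name")] = boxes
    return result


def read_cvat_zip(path_or_file):
    """Open an exported ZIP and parse its top-level annotations.xml."""
    with zipfile.ZipFile(path_or_file) as archive:
        return read_cvat_image_boxes(archive.read("annotations.xml"))


if __name__ == "__main__":
    sample = b"""<annotations>
  <image id="0" name="img1.png" width="100" height="80">
    <box label="car" occluded="0" xtl="10" ytl="20" xbr="50" ybr="60" z_order="0"/>
  </image>
</annotations>"""
    # Build an in-memory ZIP with the "CVAT for images" structure shown above.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("annotations.xml", sample)
    buffer.seek(0)
    print(read_cvat_zip(buffer))
```

Reading the ZIP in memory keeps the sketch independent of the images/ folder, which the loader does not need.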

2 - Datumaro

Datumaro is a tool that helps with complex dataset and annotation transformations, format conversions, dataset statistics, merging, custom formats, etc. CVAT uses it as its dataset support provider, so everything possible in CVAT is also possible in Datumaro, and Datumaro additionally offers dataset operations.

  • supported annotations: any 2D shapes, labels
  • supported attributes: any

Import annotations in Datumaro format

Uploaded file: a zip archive of the following structure:

└── annotations/
    ├── subset1.json # full description of classes and all dataset items
    └── subset2.json # full description of classes and all dataset items

JSON annotation files in the annotations directory should have a structure similar to the following:

{
  "info": {},
  "categories": {
    "label": {
      "labels": [
        {
          "name": "label_0",
          "parent": "",
          "attributes": []
        },
        {
          "name": "label_1",
          "parent": "",
          "attributes": []
        }
      ],
      "attributes": []
    }
  },
  "items": [
    {
      "id": "img1",
      "annotations": [
        {
          "id": 0,
          "type": "polygon",
          "attributes": {},
          "group": 0,
          "label_id": 1,
          "points": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
          "z_order": 0
        },
        {
          "id": 1,
          "type": "bbox",
          "attributes": {},
          "group": 1,
          "label_id": 0,
          "z_order": 0,
          "bbox": [1.0, 2.0, 3.0, 4.0]
        },
        {
          "id": 2,
          "type": "mask",
          "attributes": {},
          "group": 1,
          "label_id": 0,
          "rle": {
            "counts": "d0d0:F\\0",
            "size": [10, 10]
          },
          "z_order": 0
        }
      ]
    }
  ]
}

Export annotations in Datumaro format

Downloaded file: a zip archive of the following structure:
├── annotations/
│   └── default.json # full description of classes and all dataset items
└── images/ # if the option `save images` was selected
    └── default
        ├── image1.jpg
        ├── image2.jpg
        └── ...
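A quick way to sanity-check a Datumaro annotation file is to resolve each label_id against the label list and count annotations. The Python sketch below assumes a JSON document shaped like the import example above and treats label_id as a zero-based index into categories.label.labels; the function name is illustrative, not part of Datumaro's API.

```python
import json
from collections import Counter


def summarize_datumaro(doc):
    """Count annotations per (label name, annotation type) in a Datumaro JSON dict.

    Assumes label_id is a zero-based index into categories.label.labels.
    Annotations without a label_id are counted under the label None.
    """
    names = [entry["name"] for entry in doc["categories"]["label"]["labels"]]
    counts = Counter()
    for item in doc.get("items", []):
        for ann in item.get("annotations", []):
            label_id = ann.get("label_id")
            label = names[label_id] if label_id is not None else None
            counts[(label, ann["type"])] += 1
    return counts


if __name__ == "__main__":
    # Inline sample shaped like the example above; in practice you would
    # json.load() a file such as annotations/default.json instead.
    sample = json.loads("""
    {"info": {},
     "categories": {"label": {"labels": [{"name": "label_0"}, {"name": "label_1"}]}},
     "items": [{"id": "img1", "annotations": [
         {"id": 0, "type": "polygon", "label_id": 1},
         {"id": 1, "type": "bbox", "label_id": 0}]}]}
    """)
    print(summarize_datumaro(sample))
```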

3 - LabelMe


LabelMe export

Downloaded file: a zip archive of the following structure:
├── img1.jpg
└── img1.xml
  • supported annotations: Rectangles, Polygons (with attributes)

LabelMe import

Uploaded file: a zip archive of the following structure:
├── Masks/
|   ├── img1_mask1.png
|   └── img1_mask2.png
├── img1.xml
├── img2.xml
└── img3.xml
  • supported annotations: Rectangles, Polygons, Masks (as polygons)

4 - MOT sequence

MOT export

Downloaded file: a zip archive of the following structure:
├── img1/
|   ├── image1.jpg
|   └── image2.jpg
└── gt/
    ├── labels.txt
    └── gt.txt

# labels.txt

# gt.txt
# frame_id, track_id, x, y, w, h, "not ignored", class_id, visibility, <skipped>

  • supported annotations: Rectangle shapes and tracks
  • supported attributes: visibility (number), ignored (checkbox)
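The gt.txt column order listed above can be read with the csv module. This Python sketch assumes comma-separated values in exactly that order and drops any trailing <skipped> columns; the dictionary keys and the function name are illustrative.

```python
import csv
import io


def parse_mot_gt(text):
    """Parse MOT gt.txt content into a list of dicts.

    Column order: frame_id, track_id, x, y, w, h, "not ignored", class_id,
    visibility; any further columns are ignored.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # csv.reader yields [] for blank lines
        frame_id, track_id, x, y, w, h, not_ignored, class_id, visibility = fields[:9]
        rows.append({
            "frame_id": int(frame_id),
            "track_id": int(track_id),
            "bbox": (float(x), float(y), float(w), float(h)),
            # "not ignored" == 0 corresponds to the CVAT "ignored" checkbox being set
            "ignored": int(float(not_ignored)) == 0,
            "class_id": int(class_id),
            "visibility": float(visibility),
        })
    return rows
```

Usage: parse_mot_gt(open("gt/gt.txt").read()) returns one dict per box, which can then be grouped by track_id.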

MOT import

Uploaded file: a zip archive of the structure above or:
├── labels.txt # optional, mandatory for non-official labels
└── gt.txt
  • supported annotations: Rectangle tracks

5 - MOTS PNG


MOTS PNG export

Downloaded file: a zip archive of the following structure:
└── <any_subset_name>/
    ├── images/
    |   ├── image1.jpg
    |   └── image2.jpg
    └── instances/
        ├── labels.txt
        ├── image1.png
        └── image2.png

# labels.txt
  • supported annotations: Rectangle and Polygon tracks

MOTS PNG import

Uploaded file: a zip archive of the structure above

  • supported annotations: Polygon tracks

6 - MS COCO

MS COCO Object Detection

COCO export

Downloaded file: a zip archive with the structure described here

  • supported annotations: Polygons, Rectangles
  • supported attributes:
    • is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, the largest one defines all the object properties
    • score (number) - the annotation score field
    • arbitrary attributes - will be stored in the attributes annotation section

Support for COCO tasks via Datumaro is described here. For example, support for COCO keypoints over Datumaro:

  1. Install Datumaro: pip install datumaro
  2. Export the task in the Datumaro format and unzip it
  3. Export the Datumaro project in the coco / coco_person_keypoints formats:
     datum export -f coco -p path/to/project [-- --save-images]

This way, one can export CVAT points as single keypoints or keypoint lists (without the visibility COCO flag).

COCO import

Uploaded file: a single unpacked *.json or a zip archive with the structure described here (without images).

  • supported annotations: Polygons, Rectangles (if the segmentation field is empty)

MS COCO Keypoint Detection

COCO export

Downloaded file: a zip archive with the structure described here

  • supported annotations: Skeletons
  • supported attributes:
    • is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, the largest one defines all the object properties
    • score (number) - the annotation score field
    • arbitrary attributes - will be stored in the attributes annotation section

COCO import

Uploaded file: a single unpacked *.json or a zip archive with the structure described here (without images).

  • supported annotations: Skeletons

How to create a task from MS COCO dataset

  1. Download the MS COCO dataset.

    For example, the val images and the instances annotations.

  2. Create a CVAT task with the following labels:

    person bicycle car motorcycle airplane bus train truck boat "traffic light" "fire hydrant" "stop sign" "parking meter" bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard "sports ball" kite "baseball bat" "baseball glove" skateboard surfboard "tennis racket" bottle "wine glass" cup fork knife spoon bowl banana apple sandwich orange broccoli carrot "hot dog" pizza donut cake chair couch "potted plant" bed "dining table" toilet tv laptop mouse remote keyboard "cell phone" microwave oven toaster sink refrigerator book clock vase scissors "teddy bear" "hair drier" toothbrush
  3. Select the downloaded val images archive as data (see the Creating an annotation task guide for details)

  4. Unpack the downloaded annotations archive

  5. Click the Upload annotation button, choose COCO 1.1 and select the instances_val2017.json annotation file. It can take some time.
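The label list in step 2 corresponds to the category names in the COCO annotation file. As an alternative to typing it, this Python sketch builds the same kind of space-separated string from the categories section (sorted by category id), double-quoting names that contain spaces as the list above does. It reads only the standard COCO id and name fields; the function name is hypothetical.

```python
def coco_label_string(coco):
    """Build a space-separated CVAT label list from a COCO 'categories' section.

    Categories are sorted by id; names containing spaces are double-quoted.
    """
    names = [c["name"] for c in sorted(coco["categories"], key=lambda c: c["id"])]
    return " ".join(f'"{n}"' if " " in n else n for n in names)
```

Usage, assuming the annotation file from step 5 is in the working directory: coco_label_string(json.load(open("instances_val2017.json"))). For the 2017 instances file this should reproduce the list from step 2, but compare the result before creating the task.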

7 - Pascal VOC

  • Format specification

  • supported annotations:

    • Rectangles (detection and layout tasks)
    • Tags (action- and classification tasks)
    • Polygons (segmentation task)
  • supported attributes:

    • occluded (both UI option and a separate attribute)
    • truncated and difficult (should be defined for labels as checkboxes)
    • action attributes (import only, should be defined as checkboxes)
    • arbitrary attributes (in the attributes section of XML files)

Pascal VOC export

Downloaded file: a zip archive of the following structure:
├── JPEGImages/
│   ├── <image_name1>.jpg
│   ├── <image_name2>.jpg
│   └── <image_nameN>.jpg
├── Annotations/
│   ├── <image_name1>.xml
│   ├── <image_name2>.xml
│   └── <image_nameN>.xml
├── ImageSets/
│   └── Main/
│       └── default.txt
└── labelmap.txt

# labelmap.txt
# label : color_rgb : 'body' parts : actions

Pascal VOC import

Uploaded file: a zip archive of the structure declared above or the following:
├── <image_name1>.xml
├── <image_name2>.xml
└── <image_nameN>.xml

It must be possible for CVAT to match the frame name and the file name from the annotation .xml file (the filename tag, e.g. <filename>2008_004457.jpg</filename>).

There are 2 options:

  1. full match between the frame name and the file name from the annotation .xml (when the task was created from images or an image archive).

  2. match by frame number. The file name should be <number>.jpg or frame_000000.jpg. It should be used when the task was created from a video.

Segmentation mask export

Downloaded file: a zip archive of the following structure:
├── labelmap.txt # optional, required for non-VOC labels
├── ImageSets/
│   └── Segmentation/
│       └── default.txt # list of image names without extension
├── SegmentationClass/ # merged class masks
│   ├── image1.png
│   └── image2.png
└── SegmentationObject/ # merged instance masks
    ├── image1.png
    └── image2.png

# labelmap.txt
# label : color (RGB) : 'body' parts : actions

A mask is a PNG image with 1 or 3 channels where each pixel has its own color, which corresponds to a label. Colors are generated following the Pascal VOC algorithm. (0, 0, 0) is used for the background by default.

  • supported shapes: Rectangles, Polygons
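The Pascal VOC color generation is commonly implemented by bit interleaving: the low bits of the label index are spread over the high bits of the R, G and B channels. This is a sketch of that widely used colormap function, not CVAT's own code; index 0 yields (0, 0, 0), which matches the default background color.

```python
def voc_colormap(num_labels):
    """Return the Pascal VOC colormap as a list of (r, g, b) tuples.

    Bits 0, 1 and 2 of the label index go to the most significant bit of
    R, G and B; the next three bits go one position lower, and so on.
    """
    colormap = []
    for index in range(num_labels):
        r = g = b = 0
        c = index
        for shift in range(7, -1, -1):
            r |= (c & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        colormap.append((r, g, b))
    return colormap
```

For example, voc_colormap(21)[15] is (192, 128, 128), the color of "person" in the original VOC masks.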

Segmentation mask import

Uploaded file: a zip archive of the following structure:
  ├── labelmap.txt # optional, required for non-VOC labels
  ├── ImageSets/
  │   └── Segmentation/
  │       └── <any_subset_name>.txt
  ├── SegmentationClass/
  │   ├── image1.png
  │   └── image2.png
  └── SegmentationObject/
      ├── image1.png
      └── image2.png

It is also possible to import grayscale (1-channel) PNG masks. For grayscale masks, provide a list of labels with one line per color index, from 0 up to the maximum color index used in the images. The lines must be in the right order, so that the line index equals the color index. Lines can have arbitrary, but distinct, colors. If there are gaps in the color indices used in the annotations, they must be filled with arbitrary dummy labels. Example:

q:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
the last label:12,28,0:: # color index 200
  • supported shapes: Polygons
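Writing such a list by hand is error-prone when the color indices are sparse. The hypothetical helper below builds the lines in Python from a {color_index: (name, (r, g, b))} mapping and inserts _dummyN:N,N,N:: fillers for unused indices, in the same style as the example. It does not check that filler colors differ from the real label colors, and gray fillers only make sense for indices up to 255, so adjust them if needed.

```python
def grayscale_labelmap(labels):
    """Build labelmap.txt lines for grayscale masks.

    `labels` maps color index -> (name, (r, g, b)). Each line follows the
    `name:r,g,b::` layout used above; gaps are filled with dummy labels.
    """
    lines = []
    for index in range(max(labels) + 1):
        if index in labels:
            name, (r, g, b) = labels[index]
        else:
            # Filler label with a gray color derived from its index
            name, (r, g, b) = f"_dummy{index}", (index, index, index)
        lines.append(f"{name}:{r},{g},{b}::")
    return lines
```

Usage: write "\n".join(grayscale_labelmap(...)) to labelmap.txt at the root of the archive.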

How to create a task from Pascal VOC dataset

  1. Download the Pascal VOC dataset (it can be downloaded from the PASCAL VOC website)

  2. Create a CVAT task with the following labels:

    aeroplane bicycle bird boat bottle bus car cat chair cow diningtable
    dog horse motorbike person pottedplant sheep sofa train tvmonitor

    You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.

    Select the image files you are interested in (see the Creating an annotation task guide for details)

  3. Zip the corresponding annotation files

  4. Click the Upload annotation button, choose Pascal VOC ZIP 1.1 and select the zip file with annotations from the previous step. It may take some time.

8 - YOLO


YOLO export

Downloaded file: a zip archive with the following structure:
├── obj.data
├── obj.names
├── obj_<subset>_data
│   ├── image1.txt
│   └── image2.txt
└── train.txt # list of subset image paths

# the only valid subsets are: train, valid
# train.txt and valid.txt: lists of subset image paths, one per line

# obj.data:
classes = 3 # optional
names = obj.names
train = train.txt
valid = valid.txt # optional
backup = backup/ # optional

# obj.names: ordered list of label names, one per line

# image_name.txt:
# label_id - id from obj.names
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1

Each annotation *.txt file has a name that corresponds to the name of the image file (e.g. frame_000001.txt is the annotation for the frame_000001.jpg image). The *.txt file structure: each line describes a label and a bounding box in the format label_id cx cy rw rh. obj.names contains the ordered list of label names.
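Because all values are relative to the image size, converting between pixel boxes and YOLO lines needs the image width and height. This Python sketch shows both directions for the label_id cx cy rw rh layout; the function names are illustrative. For a 100x100 image, the box from (25, 65) to (35, 95) produces the first example line above, 1 0.3 0.8 0.1 0.3.

```python
def box_to_yolo(label_id, x1, y1, x2, y2, img_w, img_h):
    """Convert absolute corner coordinates to a YOLO line: label_id cx cy rw rh."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    rw = (x2 - x1) / img_w
    rh = (y2 - y1) / img_h
    return f"{label_id} {cx:.6g} {cy:.6g} {rw:.6g} {rh:.6g}"


def yolo_to_box(line, img_w, img_h):
    """Convert a YOLO line back to (label_id, x1, y1, x2, y2) in pixels."""
    label_id, cx, cy, rw, rh = line.split()
    cx, rw = float(cx) * img_w, float(rw) * img_w
    cy, rh = float(cy) * img_h, float(rh) * img_h
    return int(label_id), cx - rw / 2, cy - rh / 2, cx + rw / 2, cy + rh / 2
```

The round trip is exact only up to floating-point and formatting precision (6 significant digits here).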

YOLO import

Uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and the annotation file name. There are 2 options:

  1. full match between image name and name of annotation *.txt file (in cases when a task was created from images or archive of images).

  2. match by frame number (if CVAT cannot match by name). The file name should be in the format <number>.jpg. It should be used when the task was created from a video.

How to create a task from YOLO formatted dataset (from VOC for example)

  1. Follow the official guide (see the Training YOLO on VOC section) and prepare the YOLO formatted annotation files.

  2. Zip train images

    zip images.zip -j -@ < train.txt
  3. Create a CVAT task with the following labels:

    aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog
    horse motorbike person pottedplant sheep sofa train tvmonitor

    Select images.zip as data. You will most likely need to use the share functionality, because images.zip is larger than 500 MB. See the Creating an annotation task guide for details.

  4. Create obj.names with the labels listed above, one name per line, in the same order.

  5. Zip all label files together (add only the label files that correspond to the train subset):

    cat train.txt | while read p; do f=${p##*/}; echo ${p%/*/*}/labels/${f%%.*}.txt; done | zip <labels archive>.zip -j -@ obj.names
  6. Click the Upload annotation button, choose YOLO 1.1 and select the zip file with labels from the previous step.

9 - TFRecord


TFRecord is a very flexible format, but we try to match the format used in the TF Object Detection API with minimal modifications.

Used feature description:

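import tensorflow as tf

image_feature_description = {
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/source_id': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    # Object boxes and classes.
    # ...
}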

TFRecord export

Downloaded file: a zip archive with the following structure:
├── default.tfrecord
└── label_map.pbtxt

# label_map.pbtxt
item {
	id: 1
	name: 'label_0'
}
item {
	id: 2
	name: 'label_1'
}
  • supported annotations: Rectangles, Polygons (as masks, manually over Datumaro)
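For larger label sets, label_map.pbtxt can be generated instead of typed. This Python sketch writes item blocks in the same layout as the example above, assigning ids from 1 in list order (id 0 is conventionally reserved for the background in the TF Object Detection API). The helper name is hypothetical.

```python
def make_label_map(names):
    """Return label_map.pbtxt text with ids starting at 1, in list order."""
    blocks = []
    for label_id, name in enumerate(names, start=1):
        # Escape backslashes and single quotes so the quoted name stays valid
        escaped = name.replace("\\", "\\\\").replace("'", "\\'")
        blocks.append(f"item {{\n\tid: {label_id}\n\tname: '{escaped}'\n}}\n")
    return "".join(blocks)
```

Usage: open("label_map.pbtxt", "w").write(make_label_map(["aeroplane", "bicycle", "bird"])) writes three items with ids 1 to 3.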

How to export masks:

  1. Export annotations in Datumaro format
  2. Apply polygons_to_masks and boxes_to_masks transforms:
     datum transform -t polygons_to_masks -p path/to/proj -o ptm
     datum transform -t boxes_to_masks -p ptm -o btm
  3. Export in the TF Detection API format:
     datum export -f tf_detection_api -p btm [-- --save-images]

TFRecord import

Uploaded file: a zip archive of the following structure:
└── <any name>.tfrecord
  • supported annotations: Rectangles

How to create a task from TFRecord dataset (from VOC2007 for example)

  1. Create label_map.pbtxt file with the following content:
item {
    id: 1
    name: 'aeroplane'
}
item {
    id: 2
    name: 'bicycle'
}
item {
    id: 3
    name: 'bird'
}
item {
    id: 4
    name: 'boat'
}
item {
    id: 5
    name: 'bottle'
}
item {
    id: 6
    name: 'bus'
}
item {
    id: 7
    name: 'car'
}
item {
    id: 8
    name: 'cat'
}
item {
    id: 9
    name: 'chair'
}
item {
    id: 10
    name: 'cow'
}
item {
    id: 11
    name: 'diningtable'
}
item {
    id: 12
    name: 'dog'
}
item {
    id: 13
    name: 'horse'
}
item {
    id: 14
    name: 'motorbike'
}
item {
    id: 15
    name: 'person'
}
item {
    id: 16
    name: 'pottedplant'
}
item {
    id: 17
    name: 'sheep'
}
item {
    id: 18
    name: 'sofa'
}
item {
    id: 19
    name: 'train'
}
item {
    id: 20
    name: 'tvmonitor'
}
  2. Use the Pascal VOC conversion script from the TF Object Detection API to convert the VOC2007 dataset to TFRecord format. For example:

python <conversion script> --data_dir <path to VOCdevkit> --set train --year VOC2007 --output_path pascal.tfrecord --label_map_path label_map.pbtxt
  3. Zip train images

    cat <path to VOCdevkit>/VOC2007/ImageSets/Main/train.txt | while read p; do echo <path to VOCdevkit>/VOC2007/JPEGImages/${p}.jpg  ; done | zip images.zip -j -@
  4. Create a CVAT task with the following labels:

    aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor

    Select images.zip as data. See the Creating an annotation task guide for details.

  5. Zip pascal.tfrecord and label_map.pbtxt files together

    zip <archive name>.zip -j <path to pascal.tfrecord> <path to label_map.pbtxt>
  6. Click the Upload annotation button, choose TFRecord 1.0 and select the zip file with labels from the previous step. It may take some time.

10 - ImageNet


ImageNet export

Downloaded file: a zip archive of the following structure:

# if we save images:
├── label1/
|   ├── label1_image1.jpg
|   └── label1_image2.jpg
└── label2/
    ├── label2_image1.jpg
    ├── label2_image3.jpg
    └── label2_image4.jpg

# if we keep only annotations:
├── <any_subset_name>.txt
└── synsets.txt

  • supported annotations: Labels

ImageNet import

Uploaded file: a zip archive of the structure above

  • supported annotations: Labels

11 - WIDER Face


WIDER Face export

Downloaded file: a zip archive of the following structure:
├── labels.txt # optional
├── wider_face_split/
│   └── wider_face_<any_subset_name>_bbx_gt.txt
└── WIDER_<any_subset_name>/
    └── images/
        ├── 0--label0/
        │   └── 0_label0_image1.jpg
        └── 1--label1/
            └── 1_label1_image2.jpg
  • supported annotations: Rectangles (with attributes), Labels
  • supported attributes:
    • blur, expression, illumination, pose, invalid
    • occluded (both the annotation property & an attribute)

WIDER Face import

Uploaded file: a zip archive of the structure above

  • supported annotations: Rectangles (with attributes), Labels
  • supported attributes:
    • blur, expression, illumination, occluded, pose, invalid

12 - CamVid


CamVid export

Downloaded file: a zip archive of the following structure:
├── labelmap.txt # optional, required for non-CamVid labels
├── <any_subset_name>/
|   ├── image1.png
|   └── image2.png
├── <any_subset_name>annot/
|   ├── image1.png
|   └── image2.png
└── <any_subset_name>.txt

# labelmap.txt
# color (RGB) label
0 0 0 Void
64 128 64 Animal
192 0 128 Archway
0 128 192 Bicyclist
0 128 64 Bridge

A mask is a PNG image with 1 or 3 channels where each pixel has its own color, which corresponds to a label. (0, 0, 0) is used for the background by default.

  • supported annotations: Rectangles, Polygons
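The CamVid labelmap.txt entries above are plain r g b label records. This Python sketch parses them into a color-to-label dictionary and skips # comment lines. It treats everything after the third number as the label name, which is an assumption about the file rather than a documented rule; the function name is illustrative.

```python
def parse_camvid_labelmap(text):
    """Map (r, g, b) tuples to label names from CamVid labelmap.txt content."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        r, g, b, name = line.split(maxsplit=3)
        colors[(int(r), int(g), int(b))] = name
    return colors
```

The resulting dictionary can be used to look up the label of any pixel color read from a <any_subset_name>annot/ mask.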

CamVid import

Uploaded file: a zip archive of the structure above

  • supported annotations: Polygons

13 - VGGFace2


VGGFace2 export

Downloaded file: a zip archive of the following structure:
├── labels.txt # optional
├── <any_subset_name>/
|   ├── label0/
|   |   └── image1.jpg
|   └── label1/
|       └── image2.jpg
└── bb_landmark/
    ├── loose_bb_<any_subset_name>.csv
    └── loose_landmark_<any_subset_name>.csv
# labels.txt
# n000001 car
label0 <class0>
label1 <class1>
  • supported annotations: Rectangles, Points (landmarks - groups of 5 points)

VGGFace2 import

Uploaded file: a zip archive of the structure above

  • supported annotations: Rectangles, Points (landmarks - groups of 5 points)

14 - Market-1501


Market-1501 export

Downloaded file: a zip archive of the following structure:
├── bounding_box_<any_subset_name>/
│   └── image_name_1.jpg
└── query
    ├── image_name_2.jpg
    └── image_name_3.jpg
# if we keep only annotations:
└── images_<any_subset_name>.txt
# images_<any_subset_name>.txt
# image_name = 0001_c1s1_000015_00.jpg
0001 - person id
c1 - camera id (there are 6 cameras in total)
s1 - sequence
000015 - frame number in the sequence
00 - means that this bounding box is the first one among several in the frame
  • supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
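The image name convention above can be split with a regular expression. This Python sketch assumes names of the form 0001_c1s1_000015_00.jpg; it also accepts a negative person id such as -1, which the original Market-1501 release uses for junk images (treat that as an assumption if your data differs). The function and key names are hypothetical.

```python
import re

# person id, camera, sequence, frame, bbox index, as in 0001_c1s1_000015_00.jpg
MARKET_NAME = re.compile(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)")


def parse_market1501_name(file_name):
    """Split a Market-1501 image name into its components."""
    match = MARKET_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a Market-1501 image name: {file_name}")
    person_id, camera, sequence, frame, bbox = match.groups()
    return {
        "person_id": person_id,  # kept as text to preserve leading zeros
        "camera_id": int(camera),
        "sequence": int(sequence),
        "frame": int(frame),
        "bbox_index": int(bbox),
    }
```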

Market-1501 import

Uploaded file: a zip archive of the structure above

  • supported annotations: Label market-1501 with attributes (query, person_id, camera_id)

15 - ICDAR13/15


ICDAR13/15 export

Downloaded file: a zip archive of the following structure:

# word recognition task
└── word_recognition/
    └── <any_subset_name>/
        ├── images
        |   ├── word1.png
        |   └── word2.png
        └── gt.txt
# text localization task
└── text_localization/
    └── <any_subset_name>/
        ├── images
        |   ├── img_1.png
        |   └── img_2.png
        ├── gt_img_1.txt
        └── gt_img_2.txt
# text segmentation task
└── text_segmentation/
    └── <any_subset_name>/
        ├── images
        |   ├── 1.png
        |   └── 2.png
        ├── 1_GT.bmp
        ├── 1_GT.txt
        ├── 2_GT.bmp
        └── 2_GT.txt

Word recognition task:

  • supported annotations: Label icdar with attribute caption

Text localization task:

  • supported annotations: Rectangles and Polygons with label icdar and attribute text

Text segmentation task:

  • supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center

ICDAR13/15 import

Uploaded file: a zip archive of the structure above

Word recognition task:

  • supported annotations: Label icdar with attribute caption

Text localization task:

  • supported annotations: Rectangles and Polygons with label icdar and attribute text

Text segmentation task:

  • supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center

16 - Open Images

  • Format specification

  • Supported annotations:

    • Rectangles (detection task)
    • Tags (classification task)
    • Polygons (segmentation task)
  • Supported attributes:

    • Labels

      • score (should be defined for labels as text or number). The confidence level from 0 to 1.
    • Bounding boxes

      • score (should be defined for labels as text or number). The confidence level from 0 to 1.
      • occluded (both UI option and a separate attribute). Whether the object is occluded by another object.
      • truncated (should be defined for labels as checkboxes). Whether the object extends beyond the boundary of the image.
      • is_group_of (should be defined for labels as checkboxes). Whether the object represents a group of objects of the same class.
      • is_depiction (should be defined for labels as checkboxes). Whether the object is a depiction (such as a drawing) rather than a real object.
      • is_inside (should be defined for labels as checkboxes). Whether the object is seen from the inside.
    • Masks

      • box_id (should be defined for labels as text). An identifier for the bounding box associated with the mask.
      • predicted_iou (should be defined for labels as text or number). Predicted IoU value with respect to the ground truth.

Open Images export

Downloaded file: a zip archive of the following structure:

    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── class-descriptions.csv
    |   ├── images.meta  # additional file with information about image sizes
    │   ├── <subset_name>-image_ids_and_rotation.csv
    │   ├── <subset_name>-annotations-bbox.csv
    │   ├── <subset_name>-annotations-human-imagelabels.csv
    │   └── <subset_name>-annotations-object-segmentation.csv
    ├── images/
    │   ├── subset1/
    │   │   ├── <image_name101.jpg>
    │   │   ├── <image_name102.jpg>
    │   │   └── ...
    │   ├── subset2/
    │   │   ├── <image_name201.jpg>
    │   │   ├── <image_name202.jpg>
    │   │   └── ...
    |   ├── ...
    └── masks/
        ├── subset1/
        │   ├── <mask_name101.png>
        │   ├── <mask_name102.png>
        │   └── ...
        ├── subset2/
        │   ├── <mask_name201.png>
        │   ├── <mask_name202.png>
        │   └── ...
        ├── ...

Open Images import

Uploaded file: a zip archive of the following structure:

    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── class-descriptions.csv
    |   ├── images.meta  # optional, file with information about image sizes
    │   ├── <subset_name>-image_ids_and_rotation.csv
    │   ├── <subset_name>-annotations-bbox.csv
    │   ├── <subset_name>-annotations-human-imagelabels.csv
    │   └── <subset_name>-annotations-object-segmentation.csv
    └── masks/
        ├── subset1/
        │   ├── <mask_name101.png>
        │   ├── <mask_name102.png>
        │   └── ...
        ├── subset2/
        │   ├── <mask_name201.png>
        │   ├── <mask_name202.png>
        │   └── ...
        ├── ...

Image ids in the <subset_name>-image_ids_and_rotation.csv file should match the image names in the task.

17 - Cityscapes


  • Format specification

  • Supported annotations

    • Polygons (segmentation task)
  • Supported attributes

    • ‘is_crowd’ (boolean, should be defined for labels as checkboxes). Specifies if the annotation label can distinguish between different instances. If False, the annotation id field encodes the instance id.

Cityscapes export

Downloaded file: a zip archive of the following structure:

├── label_color.txt
├── gtFine
│   ├── <subset_name>
│   │   └── <city_name>
│   │       ├── image_0_gtFine_instanceIds.png
│   │       ├── image_0_gtFine_color.png
│   │       ├── image_0_gtFine_labelIds.png
│   │       ├── image_1_gtFine_instanceIds.png
│   │       ├── image_1_gtFine_color.png
│   │       ├── image_1_gtFine_labelIds.png
│   │       ├── ...
└── imgsFine  # if saving images was requested
    └── leftImg8bit
        ├── <subset_name>
        │   └── <city_name>
        │       ├── image_0_leftImg8bit.png
        │       ├── image_1_leftImg8bit.png
        │       ├── ...
  • label_color.txt: a file that describes the color of each label
# label_color.txt example
# r g b label_name
0 0 0 background
0 255 0 tree
  • *_gtFine_color.png: class labels encoded by color.
  • *_gtFine_labelIds.png: class labels encoded by index.
  • *_gtFine_instanceIds.png: class and instance labels encoded by an instance ID. The pixel values encode the class and the individual instance: the integer part of dividing an ID by 1000 gives the class ID, and the remainder is the instance ID. If a certain annotation describes multiple instances, the pixels have the regular ID of that class.
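The instanceIds encoding described above comes down to integer division by 1000. A minimal Python sketch for a single pixel value (the same arithmetic applies element-wise to a whole mask array); values below 1000 are treated as plain class ids, following the rule for annotations that cover multiple instances. The function name is hypothetical.

```python
def decode_cityscapes_instance_id(pixel_value):
    """Split a *_gtFine_instanceIds.png pixel value into (class_id, instance_id).

    Values of 1000 and above encode class_id * 1000 + instance_id; smaller
    values are plain class ids, so instance_id is None for them.
    """
    if pixel_value >= 1000:
        return pixel_value // 1000, pixel_value % 1000
    return pixel_value, None
```

For example, 26001 decodes to class 26, instance 1, while a pixel value of 26 is class 26 with no individual instance.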

Cityscapes annotations import

Uploaded file: a zip archive with the following structure:

├── label_color.txt # optional
└── gtFine
    └── <city_name>
        ├── image_0_gtFine_instanceIds.png
        ├── image_1_gtFine_instanceIds.png
        ├── ...

Creating task with Cityscapes dataset

Create a task with the labels you need, or use the labels and colors of the original dataset. To work with the Cityscapes format, you must have a black color label for the background.

Original Cityscapes color map:

[
    {"name": "unlabeled", "color": "#000000", "attributes": []},
    {"name": "egovehicle", "color": "#000000", "attributes": []},
    {"name": "rectificationborder", "color": "#000000", "attributes": []},
    {"name": "outofroi", "color": "#000000", "attributes": []},
    {"name": "static", "color": "#000000", "attributes": []},
    {"name": "dynamic", "color": "#6f4a00", "attributes": []},
    {"name": "ground", "color": "#510051", "attributes": []},
    {"name": "road", "color": "#804080", "attributes": []},
    {"name": "sidewalk", "color": "#f423e8", "attributes": []},
    {"name": "parking", "color": "#faaaa0", "attributes": []},
    {"name": "railtrack", "color": "#e6968c", "attributes": []},
    {"name": "building", "color": "#464646", "attributes": []},
    {"name": "wall", "color": "#66669c", "attributes": []},
    {"name": "fence", "color": "#be9999", "attributes": []},
    {"name": "guardrail", "color": "#b4a5b4", "attributes": []},
    {"name": "bridge", "color": "#966464", "attributes": []},
    {"name": "tunnel", "color": "#96785a", "attributes": []},
    {"name": "pole", "color": "#999999", "attributes": []},
    {"name": "polegroup", "color": "#999999", "attributes": []},
    {"name": "trafficlight", "color": "#faaa1e", "attributes": []},
    {"name": "trafficsign", "color": "#dcdc00", "attributes": []},
    {"name": "vegetation", "color": "#6b8e23", "attributes": []},
    {"name": "terrain", "color": "#98fb98", "attributes": []},
    {"name": "sky", "color": "#4682b4", "attributes": []},
    {"name": "person", "color": "#dc143c", "attributes": []},
    {"name": "rider", "color": "#ff0000", "attributes": []},
    {"name": "car", "color": "#00008e", "attributes": []},
    {"name": "truck", "color": "#000046", "attributes": []},
    {"name": "bus", "color": "#003c64", "attributes": []},
    {"name": "caravan", "color": "#00005a", "attributes": []},
    {"name": "trailer", "color": "#00006e", "attributes": []},
    {"name": "train", "color": "#005064", "attributes": []},
    {"name": "motorcycle", "color": "#0000e6", "attributes": []},
    {"name": "bicycle", "color": "#770b20", "attributes": []},
    {"name": "licenseplate", "color": "#00000e", "attributes": []}
]

Upload images when creating a task:
    ├── image_0.jpg
    ├── image_1.jpg
    ├── ...

After creating the task, upload the Cityscapes annotations as described in the previous section.

18 - KITTI


  • Format specification for KITTI detection

  • Format specification for KITTI segmentation

  • supported annotations:

    • Rectangles (detection task)
    • Polygons (segmentation task)
  • supported attributes:

    • occluded (both UI option and a separate attribute). Indicates that a significant portion of the object within the bounding box is occluded by another object
    • truncated, supported only for rectangles (should be defined for labels as checkboxes). Indicates that the bounding box specified for the object does not correspond to the full extent of the object
    • ‘is_crowd’, supported only for polygons (should be defined for labels as checkboxes). Indicates that the annotation covers multiple instances of the same class

KITTI annotations export

Downloaded file: a zip archive of the following structure:

    ├── label_colors.txt # list of pairs r g b label_name
    ├── labels.txt # list of labels
    └── default/
        ├── label_2/ # left color camera label files
        │   ├── <image_name_1>.txt
        │   ├── <image_name_2>.txt
        │   └── ...
        ├── instance/ # instance segmentation masks
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        ├── semantic/ # semantic segmentation masks (labels are encoded by their id)
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        └── semantic_rgb/ # semantic segmentation masks (labels are encoded by their color)
            ├── <image_name_1>.png
            ├── <image_name_2>.png
            └── ...

KITTI annotations import

You can upload KITTI annotations in two ways: rectangles for the detection task and masks for the segmentation task.

For detection tasks, the uploaded archive should have the following structure:

    ├── labels.txt # optional, labels list for non-original detection labels
    └── <subset_name>/
        ├── label_2/ # left color camera label files
        │   ├── <image_name_1>.txt
        │   ├── <image_name_2>.txt
        │   └── ...

For segmentation tasks, the uploaded archive should have the following structure:

    ├── label_colors.txt # optional, color map for non-original segmentation labels
    └── <subset_name>/
        ├── instance/ # instance segmentation masks
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        ├── semantic/ # optional, semantic segmentation masks (labels are encoded by their id)
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        └── semantic_rgb/ # optional, semantic segmentation masks (labels are encoded by their color)
            ├── <image_name_1>.png
            ├── <image_name_2>.png
            └── ...

All annotation files and masks should have structures that are described in the original format specification.

19 - LFW


  • Format specification is available here

  • Supported annotations: tags, points.

  • Supported attributes:

    • negative_pairs (should be defined for labels as text): list of image names with mismatched persons.
    • positive_pairs (should be defined for labels as text): list of image names with matched persons.

Import LFW annotation

The uploaded annotations file should be a zip file with the following structure:

    └── annotations/
        ├── landmarks.txt # list with landmark points for each image
        ├── pairs.txt # list of matched and mismatched pairs of persons
        └── people.txt # optional file with a list of person names

Full information about the content of annotation files is available here

Export LFW annotation

Downloaded file: a zip archive of the following structure:

    ├── images/ # if the option save images was selected
    │    ├── name1/
    │    │   ├── name1_0001.jpg
    │    │   ├── name1_0002.jpg
    │    │   ├── ...
    │    ├── name2/
    │    │   ├── name2_0001.jpg
    │    │   ├── name2_0002.jpg
    │    │   ├── ...
    │    ├── ...
    ├── landmarks.txt
    ├── pairs.txt
    └── people.txt

Example: create task with images and upload LFW annotations into it

This is one of the possible ways to create a task and add LFW annotations for it.

  • On the task creation page:
    • Add labels that correspond to the names of the persons.
    • For each label, define text attributes named positive_pairs and negative_pairs.
    • Add images using a zip archive from the local repository:
    ├── name1_0001.jpg
    ├── name1_0002.jpg
    ├── ...
    ├── name1_<N>.jpg
    ├── name2_0001.jpg
    ├── ...
  • On the annotation page: Upload annotation -> LFW 1.0 -> choose the archive with the structure described in the import section.