Formats
1 - CVAT
This is the native CVAT annotation format. It supports all CVAT annotation features, so it can be used to make data backups.
- supported annotations, CVAT for Images: Rectangles, Polygons, Polylines, Points, Cuboids, Tags, Tracks
- supported annotations, CVAT for Videos: Rectangles, Polygons, Polylines, Points, Cuboids, Tracks
- attributes are supported
CVAT for images export
Downloaded file: a ZIP file of the following structure:
- tracks are split by frames
CVAT for videos export
Downloaded file: a ZIP file of the following structure:
- shapes are exported as single-frame tracks
CVAT loader
Uploaded file: an XML file or a ZIP file of the structures above
2 - Datumaro format
Datumaro is a tool that helps with complex dataset and annotation transformations, format conversions, dataset statistics, merging, custom formats, etc. It is used as a provider of dataset support in CVAT, so basically everything possible in CVAT is possible in Datumaro too, but Datumaro additionally offers dataset operations.
- supported annotations: any 2D shapes, labels
- supported attributes: any
Import annotations in Datumaro format
Uploaded file: a zip archive of the following structure:
JSON annotation files in the annotations directory should have a similar structure:
Export annotations in Datumaro format
Downloaded file: a zip archive of the following structure:
3 - LabelMe
LabelMe export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangles, Polygons (with attributes)
LabelMe import
Uploaded file: a zip archive of the following structure:
- supported annotations: Rectangles, Polygons, Masks (as polygons)
4 - MOT sequence
MOT export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangle shapes and tracks
- supported attributes: visibility (number), ignored (checkbox)
MOT import
Uploaded file: a zip archive of the structure above or:
- supported annotations: Rectangle tracks
5 - MOTS PNG
MOTS PNG export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangle and Polygon tracks
MOTS PNG import
Uploaded file: a zip archive of the structure above
- supported annotations: Polygon tracks
6 - MS COCO Object Detection
COCO export
Downloaded file: a zip archive with the structure described here
- supported annotations: Polygons, Rectangles
- supported attributes:
  - is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, the largest one defines all the object properties
  - score (number) - the annotation score field
  - arbitrary attributes - will be stored in the attributes annotation section
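To make the phrase "RLE-encoded mask" more concrete, here is a minimal sketch of COCO-style uncompressed run-length encoding in pure Python. It assumes the mask is given as a list of rows of 0/1 values. The function name mask_to_uncompressed_rle is hypothetical and is not part of CVAT or pycocotools; real exports may use the compressed string form instead.

```python
def mask_to_uncompressed_rle(mask):
    """Encode a binary mask (list of rows of 0/1) as COCO-style uncompressed RLE.

    Pixels are visited in column-major order. The counts alternate between
    runs of zeros and runs of ones and always start with the zero run,
    which is 0 when the first pixel is set.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    previous = 0  # the first run always counts zeros
    run = 0
    for x in range(width):
        for y in range(height):
            value = mask[y][x]
            if value != previous:
                counts.append(run)
                run = 0
                previous = value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


# A 2x2 mask whose right column is set: two zeros followed by two ones.
print(mask_to_uncompressed_rle([[0, 1], [0, 1]]))
```

The "size" entry follows the COCO convention of [height, width].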
Support for COCO tasks via Datumaro is described here. For example, support for COCO keypoints over Datumaro:
- Install Datumaro: pip install datumaro
- Export the task in the Datumaro format, unzip
- Export the Datumaro project in coco / coco_person_keypoints formats: datum export -f coco -p path/to/project [-- --save-images]
This way, one can export CVAT points as single keypoints or keypoint lists (without the visibility COCO flag).
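If the export step needs to be scripted, the command line quoted above can be assembled in Python. This is only a sketch: the helper name build_datum_export_command is hypothetical, and the flags are copied from the command in the text, so they may differ in other Datumaro versions.

```python
import shlex


def build_datum_export_command(project_dir, fmt="coco", save_images=False):
    """Build the argument list for the `datum export` call quoted in the text."""
    cmd = ["datum", "export", "-f", fmt, "-p", project_dir]
    if save_images:
        # Everything after "--" is passed on to the format converter.
        cmd += ["--", "--save-images"]
    return cmd


print(shlex.join(build_datum_export_command("path/to/project", save_images=True)))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once Datumaro is installed.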
COCO import
Uploaded file: a single unpacked *.json or a zip archive with the structure described here (without images).
- supported annotations: Polygons, Rectangles (if the segmentation field is empty)
MS COCO Keypoint Detection
COCO export
Downloaded file: a zip archive with the structure described here
- supported annotations: Skeletons
- supported attributes:
  - is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, the largest one defines all the object properties
  - score (number) - the annotation score field
  - arbitrary attributes - will be stored in the attributes annotation section
COCO import
Uploaded file: a single unpacked *.json or a zip archive with the structure described here (without images).
- supported annotations: Skeletons
How to create a task from MS COCO dataset
- Download the MS COCO dataset. For example, val images and instances annotations.
- Create a CVAT task with the following labels:
- Select val2017.zip as data (see the Creating an annotation task guide for details)
- Unpack annotations_trainval2017.zip
- Click the Upload annotation button, choose COCO 1.1 and select the instances_val2017.json annotation file. It can take some time.
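The task labels have to match the category names stored in the annotation file. As a rough sketch, the names can be read from the standard COCO "categories" list. The helper name coco_label_names is hypothetical, and the tiny in-memory sample stands in for a real instances_val2017.json.

```python
import json


def coco_label_names(coco):
    """Return category names from a loaded COCO annotation dict, ordered by id."""
    categories = sorted(coco.get("categories", []), key=lambda c: c["id"])
    return [c["name"] for c in categories]


# In-memory stand-in for json.load(open("instances_val2017.json")).
sample = json.loads(
    '{"categories": [{"id": 2, "name": "bicycle"}, {"id": 1, "name": "person"}]}'
)
print(coco_label_names(sample))
```

The printed names can then be entered as labels when creating the task.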
7 - Pascal VOC
- supported annotations:
  - Rectangles (detection and layout tasks)
  - Tags (action- and classification tasks)
  - Polygons (segmentation task)
- supported attributes:
  - occluded (both UI option and a separate attribute)
  - truncated and difficult (should be defined for labels as checkbox-es)
  - action attributes (import only, should be defined as checkbox-es)
  - arbitrary attributes (in the attributes section of XML files)
Pascal VOC export
Downloaded file: a zip archive of the following structure:
Pascal VOC import
Uploaded file: a zip archive of the structure declared above or the following:
It must be possible for CVAT to match the frame name and file name from the annotation .xml file (the filename tag, e.g. <filename>2008_004457.jpg</filename>).
There are 2 options:
- full match between frame name and file name from annotation .xml (in cases when task was created from images or image archive).
- match by frame number. File name should be <number>.jpg or frame_000000.jpg. It should be used when task was created from video.
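The two matching options can be sketched as a small lookup function. This is an illustration of the rule as described above, not CVAT's actual implementation; the name match_frame and the regular expression are assumptions.

```python
import os
import re

# Accepts "<number>" or "frame_<number>" as the file name without extension.
_FRAME_NUMBER = re.compile(r"^(?:frame_)?(\d+)$")


def match_frame(filename, frame_names):
    """Return the index of the task frame that `filename` refers to, or None."""
    # Option 1: full match between frame name and annotation file name.
    if filename in frame_names:
        return frame_names.index(filename)
    # Option 2: match by frame number encoded in the file name.
    stem = os.path.splitext(os.path.basename(filename))[0]
    found = _FRAME_NUMBER.match(stem)
    if found:
        number = int(found.group(1))
        if 0 <= number < len(frame_names):
            return number
    return None


print(match_frame("2008_004457.jpg", ["2008_004457.jpg", "2008_004458.jpg"]))
print(match_frame("frame_000001.jpg", ["video_frame_a", "video_frame_b"]))
```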
Segmentation mask export
Downloaded file: a zip archive of the following structure:
Mask is a png image with 1 or 3 channels where each pixel has its own color which corresponds to a label. Colors are generated according to the Pascal VOC algorithm. (0, 0, 0) is used for background by default.
- supported shapes: Rectangles, Polygons
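For reference, the Pascal VOC color generation mentioned above is commonly implemented by spreading the bits of the label index across the three channels. The sketch below follows that widely used scheme; the function name voc_colormap is chosen here for illustration.

```python
def voc_colormap(size=256):
    """Generate the Pascal VOC color map as a list of (r, g, b) tuples."""
    colors = []
    for index in range(size):
        r = g = b = 0
        value = index
        for shift in range(8):
            # Bits 0, 1, 2 of the index feed red, green, blue, filling each
            # channel from its most significant bit downwards.
            r |= ((value >> 0) & 1) << (7 - shift)
            g |= ((value >> 1) & 1) << (7 - shift)
            b |= ((value >> 2) & 1) << (7 - shift)
            value >>= 3
        colors.append((r, g, b))
    return colors


colormap = voc_colormap()
print(colormap[:4])
```

Index 0 maps to (0, 0, 0), which is why black is the default background color.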
Segmentation mask import
Uploaded file: a zip archive of the following structure:
It is also possible to import grayscale (1-channel) PNG masks. For grayscale masks provide a list of labels with the number of lines equal to the maximum color index on images. The lines must be in the right order so that line index is equal to the color index. Lines can have arbitrary, but different, colors. If there are gaps in the used color indices in the annotations, they must be filled with arbitrary dummy labels. Example:
q:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
...
the last label:12,28,0:: # color index 200
- supported shapes: Polygons
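Writing the filler lines by hand is error-prone, so the label list for grayscale masks can be generated. The sketch below is an assumption-laden helper (build_labelmap_lines is a hypothetical name): it fills every unused color index with a _dummy<N> label whose color is the gray value (N, N, N), which only works for indices up to 255 and only if no real label already uses that exact color.

```python
def build_labelmap_lines(labels):
    """Build label list lines so that the line index equals the color index.

    `labels` maps a color index to a (name, (r, g, b)) pair. Gaps between
    used indices are filled with dummy labels.
    """
    lines = []
    for index in range(max(labels) + 1):
        if index in labels:
            name, (r, g, b) = labels[index]
        else:
            name, (r, g, b) = f"_dummy{index}", (index, index, index)
        lines.append(f"{name}:{r},{g},{b}::")
    return lines


example = {
    0: ("background", (0, 0, 0)),
    1: ("aeroplane", (10, 10, 128)),
    4: ("boat", (108, 0, 100)),
}
print("\n".join(build_labelmap_lines(example)))
```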
How to create a task from Pascal VOC dataset
- Download the Pascal VOC dataset (can be downloaded from the PASCAL VOC website)
- Create a CVAT task with the following labels:
  You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.
  Select interesting image files (see the Creating an annotation task guide for details)
- Zip the corresponding annotation files
- Click the Upload annotation button, choose Pascal VOC ZIP 1.1 and select the zip file with annotations from the previous step. It may take some time.
8 - YOLO
- Format specification
- supported annotations: Rectangles
YOLO export
Downloaded file: a zip archive with the following structure:
Each annotation *.txt file has a name that corresponds to the name of the image file (e.g. frame_000001.txt is the annotation for the frame_000001.jpg image).
The *.txt file structure: each line describes a label and a bounding box in the following format: label_id cx cy w h.
obj.names contains the ordered list of label names.
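As an illustration of the line format, the sketch below converts a pixel-space box into a label_id cx cy w h line. It assumes the usual YOLO convention that the center and size are normalized by the image width and height; the function name to_yolo_line and the six-digit precision are choices made for this example.

```python
def to_yolo_line(label_id, box, image_width, image_height):
    """Format one annotation line from a pixel box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / image_width    # box center, relative to width
    cy = (y_min + y_max) / 2 / image_height   # box center, relative to height
    w = (x_max - x_min) / image_width         # box width, relative to width
    h = (y_max - y_min) / image_height        # box height, relative to height
    return f"{label_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(to_yolo_line(0, (100, 50, 300, 250), image_width=400, image_height=500))
```

Here label_id is the zero-based position of the label in obj.names.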
YOLO import
Uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and annotation file name. There are 2 options:
- full match between image name and name of annotation *.txt file (in cases when a task was created from images or archive of images).
- match by frame number (if CVAT cannot match by name). File name should be in the following format: <number>.jpg. It should be used when task was created from a video.
How to create a task from YOLO formatted dataset (from VOC for example)
- Follow the official guide (see Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
- Zip train images
- Create a CVAT task with the following labels:
  Select images.zip as data. Most likely you should use the share functionality because the size of images.zip is more than 500 MB. See the Creating an annotation task guide for details.
- Create obj.names with the following content:
- Zip all label files together (we need to add only label files that correspond to the train subset)
- Click the Upload annotation button, choose YOLO 1.1 and select the zip file with labels from the previous step.
9 - TFRecord
TFRecord is a very flexible format, but we try to follow the format that is used in TF object detection with minimal modifications.
Used feature description:
TFRecord export
Downloaded file: a zip archive with the following structure:
- supported annotations: Rectangles, Polygons (as masks, manually over Datumaro)
How to export masks:
- Export annotations in Datumaro format
- Apply polygons_to_masks and boxes_to_masks transforms
- Export in the TF Detection API format
TFRecord import
Uploaded file: a zip archive of the following structure:
- supported annotations: Rectangles
How to create a task from TFRecord dataset (from VOC2007 for example)
- Create label_map.pbtxt file with the following content:
- Convert the VOC2007 dataset to TFRecord format. For example:
- Zip train images
- Create a CVAT task with the following labels:
  Select images.zip as data. See the Creating an annotation task guide for details.
- Zip pascal.tfrecord and label_map.pbtxt files together
- Click the Upload annotation button, choose TFRecord 1.0 and select the zip file with labels from the previous step. It may take some time.
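The zipping step can also be done from Python with the standard zipfile module. This is a minimal sketch: the helper name zip_files is hypothetical, and it stores each file at the archive root under its base name, which is an assumption about the layout CVAT expects.

```python
import zipfile
from pathlib import Path


def zip_files(paths, archive_path):
    """Pack the given files into a flat zip archive and return its path."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            # arcname drops the directory part so files sit at the archive root.
            archive.write(path, arcname=path.name)
    return archive_path
```

For the step above, the call would look like zip_files(["pascal.tfrecord", "label_map.pbtxt"], "annotations.zip"), where annotations.zip is an arbitrary output name.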
10 - ImageNet
ImageNet export
Downloaded file: a zip archive of the following structure:
- supported annotations: Labels
ImageNet import
Uploaded file: a zip archive of the structure above
- supported annotations: Labels
11 - WIDER Face
WIDER Face export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangles (with attributes), Labels
- supported attributes:
  - blur, expression, illumination, pose, invalid
  - occluded (both the annotation property & an attribute)
WIDER Face import
Uploaded file: a zip archive of the structure above
- supported annotations: Rectangles (with attributes), Labels
- supported attributes:
  - blur, expression, illumination, occluded, pose, invalid
12 - CamVid
CamVid export
Downloaded file: a zip archive of the following structure:
Mask is a png image with 1 or 3 channels where each pixel has its own color which corresponds to a label. (0, 0, 0) is used for background by default.
- supported annotations: Rectangles, Polygons
CamVid import
Uploaded file: a zip archive of the structure above
- supported annotations: Polygons
13 - VGGFace2
VGGFace2 export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangles, Points (landmarks - groups of 5 points)
VGGFace2 import
Uploaded file: a zip archive of the structure above
- supported annotations: Rectangles, Points (landmarks - groups of 5 points)
14 - Market-1501
Market-1501 export
Downloaded file: a zip archive of the following structure:
- supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
Market-1501 import
Uploaded file: a zip archive of the structure above
- supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
15 - ICDAR13/15
ICDAR13/15 export
Downloaded file: a zip archive of the following structure:
Word recognition task:
- supported annotations: Label icdar with attribute caption
Text localization task:
- supported annotations: Rectangles and Polygons with label icdar and attribute text
Text segmentation task:
- supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center
ICDAR13/15 import
Uploaded file: a zip archive of the structure above
Word recognition task:
- supported annotations: Label icdar with attribute caption
Text localization task:
- supported annotations: Rectangles and Polygons with label icdar and attribute text
Text segmentation task:
- supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center
16 - Open Images
- Supported annotations:
  - Rectangles (detection task)
  - Tags (classification task)
  - Polygons (segmentation task)
- Supported attributes:
  - Labels
    - score (should be defined for labels as text or number). The confidence level from 0 to 1.
  - Bounding boxes
    - score (should be defined for labels as text or number). The confidence level from 0 to 1.
    - occluded (both UI option and a separate attribute). Whether the object is occluded by another object.
    - truncated (should be defined for labels as checkbox-es). Whether the object extends beyond the boundary of the image.
    - is_group_of (should be defined for labels as checkbox-es). Whether the object represents a group of objects of the same class.
    - is_depiction (should be defined for labels as checkbox-es). Whether the object is a depiction (such as a drawing) rather than a real object.
    - is_inside (should be defined for labels as checkbox-es). Whether the object is seen from the inside.
  - Masks
    - box_id (should be defined for labels as text). An identifier for the bounding box associated with the mask.
    - predicted_iou (should be defined for labels as text or number). Predicted IoU value with respect to the ground truth.
Open Images export
Downloaded file: a zip archive of the following structure:
└─ taskname.zip/
    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── class-descriptions.csv
    │   ├── images.meta # additional file with information about image sizes
    │   ├── <subset_name>-image_ids_and_rotation.csv
    │   ├── <subset_name>-annotations-bbox.csv
    │   ├── <subset_name>-annotations-human-imagelabels.csv
    │   └── <subset_name>-annotations-object-segmentation.csv
    ├── images/
    │   ├── subset1/
    │   │   ├── <image_name101.jpg>
    │   │   ├── <image_name102.jpg>
    │   │   └── ...
    │   ├── subset2/
    │   │   ├── <image_name201.jpg>
    │   │   ├── <image_name202.jpg>
    │   │   └── ...
    │   ├── ...
    └── masks/
        ├── subset1/
        │   ├── <mask_name101.png>
        │   ├── <mask_name102.png>
        │   └── ...
        ├── subset2/
        │   ├── <mask_name201.png>
        │   ├── <mask_name202.png>
        │   └── ...
        ├── ...
Open Images import
Uploaded file: a zip archive of the following structure:
└─ upload.zip/
    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── class-descriptions.csv
    │   ├── images.meta # optional, file with information about image sizes
    │   ├── <subset_name>-image_ids_and_rotation.csv
    │   ├── <subset_name>-annotations-bbox.csv
    │   ├── <subset_name>-annotations-human-imagelabels.csv
    │   └── <subset_name>-annotations-object-segmentation.csv
    └── masks/
        ├── subset1/
        │   ├── <mask_name101.png>
        │   ├── <mask_name102.png>
        │   └── ...
        ├── subset2/
        │   ├── <mask_name201.png>
        │   ├── <mask_name202.png>
        │   └── ...
        ├── ...
Image ids in the <subset_name>-image_ids_and_rotation.csv should match with image names in the task.
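Before uploading, it can help to check this requirement with a short script. The sketch below makes two assumptions that should be verified against your data: the CSV has an ImageID column (as in the original Open Images files), and an image ID corresponds to the task image name without its extension. The helper name missing_image_ids is hypothetical.

```python
import csv
import io
import os


def missing_image_ids(csv_text, task_image_names):
    """Return image IDs listed in the CSV that have no matching task image."""
    task_ids = {os.path.splitext(name)[0] for name in task_image_names}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["ImageID"] for row in reader if row["ImageID"] not in task_ids]


csv_text = "ImageID,Subset\nimage_a,train\nimage_b,train\n"
print(missing_image_ids(csv_text, ["image_a.jpg"]))
```

An empty result means every listed image ID has a counterpart in the task.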
17 - Cityscapes
- Supported annotations
  - Polygons (segmentation task)
- Supported attributes
  - ‘is_crowd’ (boolean, should be defined for labels as checkbox-es). Specifies if the annotation label can distinguish between different instances. If False, the annotation id field encodes the instance id.
Cityscapes export
Downloaded file: a zip archive of the following structure:
.
├── label_color.txt
├── gtFine
│   ├── <subset_name>
│   │   └── <city_name>
│   │       ├── image_0_gtFine_instanceIds.png
│   │       ├── image_0_gtFine_color.png
│   │       ├── image_0_gtFine_labelIds.png
│   │       ├── image_1_gtFine_instanceIds.png
│   │       ├── image_1_gtFine_color.png
│   │       ├── image_1_gtFine_labelIds.png
│   │       ├── ...
└── imgsFine # if saving images was requested
    └── leftImg8bit
        ├── <subset_name>
        │   └── <city_name>
        │       ├── image_0_leftImg8bit.png
        │       ├── image_1_leftImg8bit.png
        │       ├── ...
- label_color.txt: a file that describes the color for each label
# label_color.txt example
# r g b label_name
0 0 0 background
0 255 0 tree
...
- *_gtFine_color.png: class labels are encoded by their color.
- *_gtFine_labelIds.png: class labels are encoded by their index.
- *_gtFine_instanceIds.png: class and instance labels are encoded by an instance ID. The pixel values encode the class and the individual instance: the integer part of a division by 1000 of each ID provides the class ID, the remainder is the instance ID. If a certain annotation describes multiple instances, then the pixels have the regular ID of that class.
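The instance ID arithmetic can be written out as a short sketch. The function name decode_instance_id is hypothetical, and treating values below 1000 as plain class IDs follows the rule that multi-instance annotations keep the regular ID of their class.

```python
def decode_instance_id(pixel_value):
    """Split an instanceIds pixel value into (class_id, instance_id).

    Values below 1000 carry only a class ID (no individual instance),
    so instance_id is returned as None for them.
    """
    if pixel_value < 1000:
        return pixel_value, None
    return pixel_value // 1000, pixel_value % 1000


print(decode_instance_id(26003))  # class 26, instance 3
print(decode_instance_id(7))      # class 7, no individual instance
```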
Cityscapes annotations import
Uploaded file: a zip archive with the following structure:
.
├── label_color.txt # optional
└── gtFine
    └── <city_name>
        ├── image_0_gtFine_instanceIds.png
        ├── image_1_gtFine_instanceIds.png
        ├── ...
Creating task with Cityscapes dataset
Create a task with the labels you need or you can use the labels and colors of the original dataset. To work with the Cityscapes format, you must have a black color label for the background.
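The background requirement can be checked against a label_color.txt file before creating the task. The sketch below assumes the "r g b label_name" line layout shown in the export section, with lines starting with # treated as comments; both function names are hypothetical.

```python
def parse_label_colors(text):
    """Parse 'r g b label_name' lines into a {label_name: (r, g, b)} dict."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        r, g, b, name = line.split(maxsplit=3)
        colors[name] = (int(r), int(g), int(b))
    return colors


def has_black_background(colors):
    """Check that some label uses the black color required for the background."""
    return (0, 0, 0) in colors.values()


sample_text = "# r g b label_name\n0 0 0 background\n0 255 0 tree\n"
label_colors = parse_label_colors(sample_text)
print(label_colors, has_black_background(label_colors))
```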
Original Cityscapes color map:
Upload images when creating a task:
images.zip/
├── image_0.jpg
├── image_1.jpg
├── ...
After creating the task, upload the Cityscapes annotations as described in the previous section.
18 - KITTI
- supported annotations:
  - Rectangles (detection task)
  - Polygon (segmentation task)
- supported attributes:
  - occluded (both UI option and a separate attribute). Indicates that a significant portion of the object within the bounding box is occluded by another object
  - truncated, supported only for rectangles (should be defined for labels as checkbox-es). Indicates that the bounding box specified for the object does not correspond to the full extent of the object
  - ‘is_crowd’, supported only for polygons (should be defined for labels as checkbox-es). Indicates that the annotation covers multiple instances of the same class
KITTI annotations export
Downloaded file: a zip archive of the following structure:
└─ annotations.zip/
    ├── label_colors.txt # list of pairs r g b label_name
    ├── labels.txt # list of labels
    └── default/
        ├── label_2/ # left color camera label files
        │   ├── <image_name_1>.txt
        │   ├── <image_name_2>.txt
        │   └── ...
        ├── instance/ # instance segmentation masks
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        ├── semantic/ # semantic segmentation masks (labels are encoded by its id)
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        └── semantic_rgb/ # semantic segmentation masks (labels are encoded by its color)
            ├── <image_name_1>.png
            ├── <image_name_2>.png
            └── ...
KITTI annotations import
You can upload KITTI annotations in two ways: rectangles for the detection task and masks for the segmentation task.
For detection tasks the uploading archive should have the following structure:
└─ annotations.zip/
    ├── labels.txt # optional, labels list for non-original detection labels
    └── <subset_name>/
        ├── label_2/ # left color camera label files
        │   ├── <image_name_1>.txt
        │   ├── <image_name_2>.txt
        │   └── ...
For segmentation tasks the uploading archive should have the following structure:
└─ annotations.zip/
    ├── label_colors.txt # optional, color map for non-original segmentation labels
    └── <subset_name>/
        ├── instance/ # instance segmentation masks
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        ├── semantic/ # optional, semantic segmentation masks (labels are encoded by its id)
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        └── semantic_rgb/ # optional, semantic segmentation masks (labels are encoded by its color)
            ├── <image_name_1>.png
            ├── <image_name_2>.png
            └── ...
All annotation files and masks should have structures that are described in the original format specification.
19 - LFW
- Format specification available here
- Supported annotations: tags, points.
- Supported attributes:
  - negative_pairs (should be defined for labels as text): list of image names with mismatched persons.
  - positive_pairs (should be defined for labels as text): list of image names with matched persons.
Import LFW annotation
The uploaded annotations file should be a zip file with the following structure:
Full information about the content of annotation files is available here
Export LFW annotation
Downloaded file: a zip archive of the following structure:
Example: create task with images and upload LFW annotations into it
This is one of the possible ways to create a task and add LFW annotations for it.
- On the task creation page:
  - Add labels that correspond to the names of the persons.
  - For each label define text attributes with names positive_pairs and negative_pairs
  - Add images using zip archive from local repository:
- On the annotation page: Upload annotation -> LFW 1.0 -> choose an archive with the structure described in the import section.