Dataset Management
- 1: Specification for annotators
- 2: Import datasets and upload annotation
- 3: Export annotations and data from CVAT
- 3.1: CVAT for image
- 3.2: Datumaro
- 3.3: LabelMe
- 3.4: MOT
- 3.5: MOTS
- 3.6: COCO
- 3.7: COCO Keypoints
- 3.8: Pascal VOC
- 3.9: Segmentation Mask
- 3.10: Ultralytics YOLO
- 3.11: Ultralytics-YOLO-Classification
- 3.12: YOLO
- 3.13: ImageNet
- 3.14: Wider Face
- 3.15: CamVid
- 3.16: VGGFace2
- 3.17: Market-1501
- 3.18: ICDAR13/15
- 3.19: Open Images
- 3.20: Cityscapes
- 3.21: KITTI
- 3.22: LFW
1 - Specification for annotators
The Guide feature provides a built-in markdown editor that allows you to create specifications for annotators.
Once you create and submit the specification, it will be accessible from the annotation interface (see below).
You can attach the specification to Projects or to Tasks.
The attachment procedure is the same for individual users and organizations.
See:
- Adding specification to Project
- Adding specification to Task
- Access to specification for annotators
- Markdown editor guide
- Specification for annotators’ video tutorial
Adding specification to Project
To add a specification to a project, do the following:
- Go to the Projects page and click the project to which you want to add a specification.
- Under the project description, click Edit.
- Add instructions in the Markdown editor and click Submit.
Editing rights
- For individual users: only the project owner and the project assignee can edit the specification.
- For organizations: the specification can also be edited by the organization owner and maintainers.

Adding specification to Task
To add a specification to a task, do the following:
- Go to the Tasks page and click the task to which you want to add a specification.
- Under the task description, click Edit.
- Add instructions in the Markdown editor and click Submit.
Editing rights
- For individual users: only the task owner and task assignee can edit the specification.
- For organizations: only the task owner, maintainer, and task assignee can edit the specification.

Access to specification for annotators
The specification opens automatically when the job is in the new annotation state.
This means that it opens when the assigned user begins working on the first
job within a Project or Task.
The specification does not reopen automatically if the user moves to another job within the same Project or Task.
If a Project or Task is reassigned to another annotator, the specification is shown automatically when that annotator opens the first job, but it does not reappear for subsequent jobs.
To make the specification open automatically every time,
append the ?openGuide parameter to the end of the job URL you share with the annotator:
/tasks/<task_id>/jobs/<job_id>?openGuide
For example:
https://app.cvat.ai/tasks/taskID/jobs/jobID?openGuide
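If you share many job links, you can build them in a script. The sketch below only follows the URL pattern shown above; the helper name `job_guide_url` is hypothetical, and the host is whatever CVAT instance you use (for example, `https://app.cvat.ai`).

```python
def job_guide_url(base_url: str, task_id: int, job_id: int) -> str:
    """Build a job URL that always opens the specification (hypothetical helper)."""
    # Strip a trailing slash so the result does not contain '//tasks'.
    base = base_url.rstrip("/")
    return f"{base}/tasks/{task_id}/jobs/{job_id}?openGuide"


if __name__ == "__main__":
    print(job_guide_url("https://app.cvat.ai/", 42, 7))
```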
To open specification manually, do the following:
- Open the job to see the annotation interface.
- In the top right corner, click the Guide button.
Markdown editor guide
The markdown editor for Guide has two panes. Add instructions to the left pane, and the editor will immediately show the formatted result on the right.

You can write in raw markdown or use the toolbar on the top of the editor.

| Element | Description |
|---|---|
| 1 | Text formatting: bold, italic, and strikethrough. |
| 2 | Insert a horizontal rule (horizontal line). |
| 3 | Add a title, heading, or subheading. It provides a drop-down list to select the title level (from 1 to 6). |
| 4 | Add a link. Note: If you left-click on the link, it will open in the same window. |
| 5 | Add a quote. |
| 6 | Add a single line of code. |
| 7 | Add a block of code. |
| 8 | Add a comment. The comment is only visible to Guide editors and remains invisible to annotators. |
| 9 | Add a picture. To use this option, first, upload the picture to an external resource and then add the link in the editor. Alternatively, you can drag and drop a picture into the editor, which will upload it to the CVAT server and add it to the specification. |
| 10 | Add a list: bullet list, numbered list, and checklist. |
| 11 | Hide the editor pane: options to hide the right pane, show both panes or hide the left pane. |
| 12 | Enable full-screen mode. |
Specification for annotators’ video tutorial
Video tutorial on how to use the Guide feature.
2 - Import datasets and upload annotation
Export dataset
You can export a dataset from a project, task, or job.
- To download the latest annotations, save all changes first. Click the Save button, or press Ctrl+S to save annotations quickly.
- Then click the Menu button. Task and project datasets are exported and imported through the Actions menu.
- Click the Export task dataset button.
- Choose the format for exporting the dataset. Exporting and importing are available in:
  - Standard CVAT formats:
    - CVAT for video: choose this format if the task was created in interpolation mode.
    - CVAT for images: choose this format if the task was created in annotation mode.
  - Other formats from the list of annotation formats supported by CVAT.
  - For 3D tasks, the following formats are available:
    - Kitti Raw Format 1.0
    - Sly Point Cloud Format 1.0 (Supervisely Point Cloud dataset)
- To download images with the dataset, enable the Save images option.
- (Optional) To name the resulting archive, use the Custom name field.
- You can choose the target storage for the dataset export: Local or Cloud storage. The default is the storage that was selected when the project was created (for example, if you specified local storage when you created the project, you will be prompted by default to export the dataset to your PC). To see the default value, hover over the ? icon. Learn more about attaching cloud storage.
Import dataset
You can import a dataset only into a project. In this case, the data is split into subsets.
To import a dataset, do the following on the Project page:

- Open the Actions menu.
- Click the Import dataset button.
- Select the dataset format (if you did not specify a custom name during export, the format is included in the archive name).
- Drag the file to the file upload area, or click the upload area to select the file in the file browser.
- You can also import a dataset from an attached cloud storage.
  To do this, select the annotation format, then select a cloud storage from the list, or use the default settings
  if you have already specified the required cloud storage for the task or project,
  and enter the name of the zip archive in the File name field.
During the import process, you will be able to track the progress of the import.
Upload annotations

You can upload annotations to a task or a job. To do this, select Upload annotation
in the Actions menu of the task, or in the job Menu on the top panel. Then select the format of the annotations
and choose the annotation file or archive in the file browser.

You can also upload the annotation file from an attached cloud storage.
3 - Export annotations and data from CVAT
In CVAT, you have the option to export data in various formats. The choice of export format depends on the type of annotation as well as the intended future use of the dataset.
Data export formats
The table below outlines the available formats for data export in CVAT.
| Format | Type | Computer Vision Task | Models | Shapes | Attributes | Video Tracks |
|---|---|---|---|---|---|---|
| CamVid 1.0 | .txt .png | Semantic Segmentation | U-Net, SegNet, DeepLab, PSPNet, FCN, Mask R-CNN, ICNet, ERFNet, HRNet, V-Net, and others. | Polygons, Masks | Not supported | Not supported |
| Cityscapes 1.0 | .txt .png | Semantic Segmentation | U-Net, SegNet, DeepLab, PSPNet, FCN, ERFNet, ICNet, Mask R-CNN, HRNet, ENet, and others. | Polygons, Masks | Specific attributes | Not supported |
| COCO 1.0 | .json | Detection, Semantic Segmentation | YOLO (You Only Look Once), Faster R-CNN, Mask R-CNN, SSD (Single Shot MultiBox Detector), RetinaNet, EfficientDet, UNet, DeepLabv3+, CenterNet, Cascade R-CNN, and others. | Bounding Boxes, Polygons, Masks | All attributes | Supported |
| COCO Keypoints 1.0 | .json | Keypoints | OpenPose, PoseNet, AlphaPose, SPM (Single Person Model), Mask R-CNN with Keypoint Detection, and others. | Skeletons | All attributes | Supported |
| CVAT for images 1.1 | .xml | Any in 2D except for Video Tracking | Any model that can decode the format. | Tags, Bounding Boxes, Polygons, Polylines, Points, Cuboids, Skeletons, Ellipses, Masks | All attributes | Supported |
| CVAT for video 1.1 | .xml | Any in 2D except for Classification | Any model that can decode the format. | Bounding Boxes, Polygons, Polylines, Points, Cuboids, Skeletons, Ellipses, Masks | All attributes | Supported |
| Datumaro 1.0 | .json | Any | Any model that can decode the format. Main format in the Datumaro framework. | Tags, Bounding Boxes, Polygons, Polylines, Points, Cuboids, Skeletons, Ellipses, Masks | All attributes | Supported |
| ICDAR (includes ICDAR Recognition 1.0, ICDAR Detection 1.0, and ICDAR Segmentation 1.0 descriptions) | .txt | Text recognition, Text detection, Text segmentation | EAST: Efficient and Accurate Scene Text Detector, CRNN, Mask TextSpotter, TextSnake, and others. | Tags, Bounding Boxes, Polygons, Masks | Specific attributes | Not supported |
| ImageNet 1.0 | .jpg .txt | Semantic Segmentation, Classification, Detection | VGG (VGG16, VGG19), Inception, YOLO, Faster R-CNN, U-Net, and others. | Tags | No attributes | Not supported |
| KITTI 1.0 | .txt .png | Semantic Segmentation, Detection, 3D | PointPillars, SECOND, AVOD, YOLO, DeepSORT, PWC-Net, ORB-SLAM, and others. | Bounding Boxes, Polygons, Masks | Specific attributes | Not supported |
| LabelMe 3.0 | .xml | Compatibility, Semantic Segmentation | U-Net, Mask R-CNN, Fast R-CNN, Faster R-CNN, DeepLab, YOLO, and others. | Bounding Boxes, Polygons, Masks | Supported (Polygons) | Not supported |
| LFW 1.0 | .txt | Verification, Face recognition | OpenFace, VGGFace & VGGFace2, FaceNet, ArcFace, and others. | Tags, Skeletons | Specific attributes | Not supported |
| Market-1501 1.0 | .txt | Re-identification | Triplet Loss Networks, Deep ReID models, and others. | Bounding Boxes | Specific attributes | Not supported |
| MOT 1.0 | .txt | Video Tracking, Detection | SORT, MOT-Net, IOU Tracker, and others. | Bounding Boxes | Specific attributes | Supported |
| MOTS PNG 1.0 | .png .txt | Video Tracking, Detection | SORT, MOT-Net, IOU Tracker, and others. | Bounding Boxes, Masks | Specific attributes | Supported |
| Open Images 1.0 | .csv | Detection, Classification, Semantic Segmentation | Faster R-CNN, YOLO, U-Net, CornerNet, and others. | Tags, Bounding Boxes, Polygons, Masks | Specific attributes | Not supported |
| PASCAL VOC 1.0 | .xml, .png | Classification, Detection | Faster R-CNN, SSD, YOLO, AlexNet, and others. | Tags, Bounding Boxes, Polygons, Masks | Specific attributes | Not supported |
| Segmentation Mask 1.0 | .png | Semantic Segmentation | Faster R-CNN, SSD, YOLO, AlexNet, and others. | Polygons, Masks | No attributes | Not supported |
| VGGFace2 1.0 | .csv | Face recognition | VGGFace, ResNet, Inception, and others. | Bounding Boxes, Points | No attributes | Not supported |
| WIDER Face 1.0 | .txt | Detection | SSD (Single Shot MultiBox Detector), Faster R-CNN, YOLO, and others. | Tags, Bounding Boxes | Specific attributes | Not supported |
| YOLO 1.0 | .txt | Detection | YOLOv1, YOLOv2 (YOLO9000), YOLOv3, YOLOv4, and others. | Bounding Boxes | No attributes | Not supported |
| Ultralytics YOLO Detection 1.0 | .txt | Detection | YOLOv8 | Bounding Boxes | No attributes | Supported |
| Ultralytics YOLO Segmentation 1.0 | .txt | Instance Segmentation | YOLOv8 | Polygons, Masks | No attributes | Supported |
| Ultralytics YOLO Pose 1.0 | .txt | Keypoints | YOLOv8 | Skeletons | No attributes | Supported |
| Ultralytics YOLO Oriented Bounding Boxes 1.0 | .txt | Detection | YOLOv8 | Bounding Boxes | No attributes | Supported |
| Ultralytics YOLO Classification 1.0 | .jpg | Classification | YOLOv8 | Tags | No attributes | Not supported |
Exporting dataset in CVAT
Exporting dataset from Task
To export the dataset from the task, follow these steps:
- Open the task.
- Go to Actions > Export task dataset.
- Choose the desired format from the list of available options.
- (Optional) Toggle the Save images switch if you wish to include images in the export.
  Note: The Save images option is a paid feature.
- Enter a name for the resulting .zip archive.
- Click OK to initiate the export.
Exporting dataset from Job
To export a dataset from a job, follow these steps:
- Navigate to Menu > Export job dataset.
- Choose the desired format from the list of available options.
- (Optional) Toggle the Save images switch if you wish to include images in the export.
  Note: The Save images option is a paid feature.
- Enter a name for the resulting .zip archive.
- Click OK to initiate the export.
Data export video tutorial
For more information on the process, see the following tutorial:
3.1 - CVAT for image
This is CVAT’s native annotation format, which fully supports all of CVAT’s annotation features. It is ideal for creating data backups.
CVAT for image export
Applicable for all computer vision tasks in 2D except for Video Tracking.
- Supported annotations: Tags, Bounding Boxes, Polygons, Polylines, Points, Cuboids, Ellipses, Skeletons, Masks.
- Attributes: Supported.
- Tracks: Supported (via the extra track_id attribute).
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── images/
| ├── img1.png
| └── img2.jpg
└── annotations.xml
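To use annotations.xml outside CVAT, a standard XML parser is enough. The sketch below assumes the usual CVAT for images 1.1 layout, where each `<image>` element carries `name`, `width`, and `height` attributes and contains shape elements such as `<box>` (corners in `xtl`, `ytl`, `xbr`, `ybr`) and `<polygon>` (vertices in `points` as `x,y;x,y;...`). The sample XML and the function name are made up, and other shape types are simply ignored.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed CVAT for images 1.1 layout.
SAMPLE = """<annotations>
  <version>1.1</version>
  <image id="0" name="img1.png" width="640" height="480">
    <box label="car" occluded="0" xtl="10.0" ytl="20.0" xbr="110.0" ybr="70.0" z_order="0">
      <attribute name="color">red</attribute>
    </box>
    <polygon label="road" occluded="0" points="0.0,400.0;640.0,400.0;640.0,480.0" z_order="0"/>
  </image>
</annotations>"""


def read_cvat_images(xml_text: str) -> dict:
    """Map image name -> list of (shape type, label, coordinates, attributes)."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        shapes = []
        for box in image.findall("box"):
            coords = [float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr")]
            attrs = {a.get("name"): a.text for a in box.findall("attribute")}
            shapes.append(("box", box.get("label"), coords, attrs))
        for poly in image.findall("polygon"):
            # Vertices are stored as "x1,y1;x2,y2;..."
            coords = [tuple(map(float, p.split(","))) for p in poly.get("points").split(";")]
            attrs = {a.get("name"): a.text for a in poly.findall("attribute")}
            shapes.append(("polygon", poly.get("label"), coords, attrs))
        result[image.get("name")] = shapes
    return result


if __name__ == "__main__":
    for name, shapes in read_cvat_images(SAMPLE).items():
        print(name, shapes)
```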
CVAT for video export
Applicable for all computer vision tasks in 2D except for Classification.
- Supported annotations: Bounding Boxes, Polygons, Polylines, Points, Cuboids, Ellipses, Skeletons, Masks.
- Attributes: Supported.
- Tracks: Supported.
- Shapes are exported as single-frame tracks.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── images/
| ├── frame_000000.png
| └── frame_000001.png
└── annotations.xml
CVAT for video import
Uploaded file: either an .xml file or a
.zip file with the contents described above.
3.2 - Datumaro
The Datumaro format is a universal format, capable of handling arbitrary datasets and annotations. It is the native format of the Datumaro dataset framework. The framework can be used for various dataset operations, such as dataset and annotation transformations, format conversions, computation of statistics, and dataset merging. CVAT uses this framework as its dataset support provider. This effectively means that anything you import into CVAT or export from CVAT can be processed with Datumaro, allowing you to perform custom dataset operations easily.
Datumaro export
- Supported annotations: Tags, Bounding Boxes, Polygons, Polylines, Points, Cuboids, Ellipses, Masks, Skeletons.
- Attributes: Supported.
- Tracks: Supported (via the track_id attribute).
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── annotations/
│ └── default.json
└── images/
└── default/
├── image1.jpg
├── image2.jpg
├── ...
Datumaro import
- Supported annotations: Tags, Bounding Boxes, Polygons, Polylines, Points, Cuboids, Ellipses, Masks, Skeletons.
- Attributes: Supported.
- Tracks: Supported.
Uploaded file: a .json file with annotations or a .zip archive of the following structure:
archive.zip/
└── annotations/
├── subset1.json
└── subset2.json
The .json annotation files in the annotations directory should have a structure similar to the following:
{
"info": {},
"categories": {
"label": {
"labels": [
{
"name": "label_0",
"parent": "",
"attributes": []
},
{
"name": "label_1",
"parent": "",
"attributes": []
}
],
"attributes": []
}
},
"items": [
{
"id": "img1",
"annotations": [
{
"id": 0,
"type": "polygon",
"attributes": {},
"group": 0,
"label_id": 1,
"points": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
"z_order": 0
},
{
"id": 1,
"type": "bbox",
"attributes": {},
"group": 1,
"label_id": 0,
"z_order": 0,
"bbox": [1.0, 2.0, 3.0, 4.0]
},
{
"id": 2,
"type": "mask",
"attributes": {},
"group": 1,
"label_id": 0,
"rle": {
"counts": "d0d0:F\\0",
"size": [10, 10]
},
"z_order": 0
}
]
}
]
}
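Annotations refer to labels by position (`label_id`) in `categories.label.labels`, so a reader has to resolve that index to get the label name. A minimal sketch using only the standard library, assuming the structure shown above (the function name is hypothetical; annotations without a `label_id` are counted under `None`):

```python
import json
from collections import Counter

# Trimmed copy of the example document above.
SAMPLE = json.loads("""{
  "info": {},
  "categories": {"label": {"labels": [
      {"name": "label_0", "parent": "", "attributes": []},
      {"name": "label_1", "parent": "", "attributes": []}],
    "attributes": []}},
  "items": [{"id": "img1", "annotations": [
      {"id": 0, "type": "polygon", "attributes": {}, "group": 0, "label_id": 1,
       "points": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], "z_order": 0},
      {"id": 1, "type": "bbox", "attributes": {}, "group": 1, "label_id": 0,
       "z_order": 0, "bbox": [1.0, 2.0, 3.0, 4.0]}]}]
}""")


def count_annotations(doc: dict) -> Counter:
    """Count annotations per (label name, annotation type) in a Datumaro document."""
    labels = [entry["name"] for entry in doc["categories"]["label"]["labels"]]
    counts = Counter()
    for item in doc["items"]:
        for ann in item["annotations"]:
            # label_id is an index into categories.label.labels
            label_id = ann.get("label_id")
            name = labels[label_id] if label_id is not None else None
            counts[(name, ann["type"])] += 1
    return counts


if __name__ == "__main__":
    print(count_annotations(SAMPLE))
```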
3.3 - LabelMe
The LabelMe format is often used for image segmentation tasks in computer vision. While it may not be specifically tied to any particular models, it’s designed to be versatile and can be easily converted to formats that are compatible with popular frameworks like TensorFlow or PyTorch.
LabelMe export
For export of images:
- Supported annotations: Bounding Boxes, Polygons, Masks, Ellipses (as masks).
- Attributes: Supported for Polygons.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── img1.jpg
└── img1.xml
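Each .xml file describes one image. The sketch below lists object names and polygon vertices, assuming the common LabelMe 3.0 layout: `<object>` elements with a `<name>`, a `<deleted>` flag, and a `<polygon>` made of `<pt><x>…</x><y>…</y></pt>` points. Deleted objects and objects without a polygon are skipped; the sample XML and function name are made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed LabelMe 3.0 layout.
SAMPLE = """<annotation>
  <filename>img1.jpg</filename>
  <object>
    <name>cat</name>
    <deleted>0</deleted>
    <polygon>
      <pt><x>10</x><y>10</y></pt>
      <pt><x>50</x><y>10</y></pt>
      <pt><x>30</x><y>40</y></pt>
    </polygon>
  </object>
</annotation>"""


def read_labelme(xml_text: str) -> list:
    """Return (name, [(x, y), ...]) for every non-deleted polygon object."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        if obj.findtext("deleted", default="0").strip() == "1":
            continue
        poly = obj.find("polygon")
        if poly is None:
            continue  # e.g. mask-only objects are not handled in this sketch
        points = [(float(pt.findtext("x")), float(pt.findtext("y")))
                  for pt in poly.findall("pt")]
        objects.append((obj.findtext("name").strip(), points))
    return objects


if __name__ == "__main__":
    print(read_labelme(SAMPLE))
```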
LabelMe import
- Supported annotations: Rectangles, Polygons, Masks.
Uploaded file: a .zip archive of the following structure:
taskname.zip/
├── Masks/
| ├── img1_mask1.png
| └── img1_mask2.png
├── img1.xml
├── img2.xml
└── img3.xml
3.4 - MOT
The MOT (Multiple Object Tracking) sequence format is widely used for evaluating multi-object tracking algorithms, particularly in the domains of pedestrian tracking, vehicle tracking, and more. The MOT sequence format essentially contains frames of video along with annotations that specify object locations and identities over time.
MOT export
For export of images and videos:
- Supported annotations: Bounding Box tracks.
- Attributes: visibility (number), ignored (checkbox).
- Tracks: Supported.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── img1/
| ├── image1.jpg
| └── image2.jpg
└── gt/
├── labels.txt
└── gt.txt
# labels.txt
cat
dog
person
...
# gt.txt
# frame_id, track_id, x, y, w, h, "not ignored", class_id, visibility, <skipped>
1,1,1363,569,103,241,1,1,0.86014
...
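Each gt.txt row follows the column order given in the comment above. A sketch that parses a row into named fields; the `MotRow` type and function name are hypothetical, and the mapping from `class_id` to a line of labels.txt is intentionally left out.

```python
from typing import NamedTuple


class MotRow(NamedTuple):
    frame_id: int
    track_id: int
    x: float
    y: float
    w: float
    h: float
    ignored: bool      # the file stores "not ignored", so this value is inverted
    class_id: int
    visibility: float


def parse_gt_line(line: str) -> MotRow:
    """Parse 'frame_id,track_id,x,y,w,h,not_ignored,class_id,visibility,...'."""
    fields = line.strip().split(",")
    frame_id, track_id = int(fields[0]), int(fields[1])
    x, y, w, h = (float(v) for v in fields[2:6])
    not_ignored = int(float(fields[6]))
    class_id = int(fields[7])
    visibility = float(fields[8])
    # Any columns after visibility are skipped, as in the format description.
    return MotRow(frame_id, track_id, x, y, w, h, not_ignored == 0, class_id, visibility)


if __name__ == "__main__":
    print(parse_gt_line("1,1,1363,569,103,241,1,1,0.86014"))
```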
MOT import
Uploaded file: a .zip archive of the structure above or:
archive.zip/
└── gt/
    ├── gt.txt
    └── labels.txt  # optional, mandatory for non-official labels
3.5 - MOTS
MOTS (Multi-Object Tracking and Segmentation) is a variant of the MOT sequence format described above that stores per-object segmentation masks in addition to object identities over time.
This version is encoded as .png and supports masks.
MOTS PNG export
For export of images and videos:
- Supported annotations: Masks, Bounding Boxes (as masks), Polygons (as masks), Ellipses (as masks).
- Attributes: visibility (number), ignored (checkbox).
- Tracks: Supported. Only tracks are supported; shapes are ignored.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
└── <any_subset_name>/
    ├── images/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── instances/
        ├── labels.txt
        ├── image1.png
        └── image2.png
# labels.txt
cat
dog
person
...
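The description above does not spell out how instances are encoded inside the instance .png files. Assuming the MOTS Challenge convention, where each pixel value stores `class_id * 1000 + instance_id` and 0 is background, a mask can be split into per-instance regions as in the sketch below. A plain 2D list stands in for a decoded PNG, and the function name is hypothetical; check the convention against your own exported files before relying on it.

```python
from collections import defaultdict


def split_instances(mask: list) -> dict:
    """Group pixel coordinates by (class_id, instance_id).

    Assumes the MOTS Challenge encoding: value = class_id * 1000 + instance_id,
    with 0 meaning background. `mask` is a 2D list of ints (rows of pixels).
    """
    instances = defaultdict(list)
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == 0:
                continue  # background
            class_id, instance_id = divmod(value, 1000)
            instances[(class_id, instance_id)].append((x, y))
    return dict(instances)


if __name__ == "__main__":
    demo = [
        [0, 2001, 2001],
        [0, 0, 2002],
    ]
    print(split_instances(demo))
```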
MOTS PNG import
- Supported annotations: Masks or Polygon tracks.
Uploaded file: a .zip archive of the structure above.
3.6 - COCO
The COCO dataset format is a popular format, designed for tasks involving object detection and instance segmentation. It’s supported by many annotation tools and model training frameworks, making it a safe default choice for typical object detection projects.
COCO export
- Supported annotations: Bounding Boxes, Polygons, Masks, Ellipses (as masks).
- Attributes:
  - is_crowd: This can be either a checkbox or an integer (with values of 0 or 1). It indicates whether the instance (a group of objects) should be represented as an RLE-encoded mask or as a set of polygons in the segmentation field of the annotation file. The largest (by area) shape in the group sets the properties for the entire object group. If the attribute is not specified, the input shape type is used (polygon or mask). If True or 1, all shapes within the group are converted into a single mask. If False or 0, all shapes within the group are converted into polygons.
  - Arbitrary attributes: These are stored within the custom attributes section of the annotation.
- Tracks: Supported (via the track_id custom attribute).
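The is_crowd rule above can be read as a small decision function. This is only a sketch of the documented behavior, not CVAT's code; shapes are modeled as hypothetical `(shape_type, area, is_crowd)` tuples, where `is_crowd` is `None` when the attribute is not set.

```python
from typing import Optional, Sequence, Tuple

# (shape_type, area, is_crowd); shape_type is "polygon" or "mask".
Shape = Tuple[str, float, Optional[object]]


def group_output_type(shapes: Sequence[Shape]) -> str:
    """Return "mask" or "polygons" for a group of shapes, following the is_crowd rule."""
    # The largest shape (by area) sets the properties for the whole group.
    shape_type, _area, is_crowd = max(shapes, key=lambda s: s[1])
    if is_crowd is None:
        # Attribute not specified: keep the input shape type.
        return "mask" if shape_type == "mask" else "polygons"
    # Checkbox (True/False) or integer (1/0).
    return "mask" if int(is_crowd) == 1 else "polygons"


if __name__ == "__main__":
    print(group_output_type([("polygon", 10.0, None), ("mask", 50.0, None)]))
```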
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── images/
│ └── <subset_name>/
│ ├── <image_name1.ext>
│ ├── <image_name2.ext>
│ └── ...
└── annotations/
├── instances_<subset_name>.json
└── ...
When exporting a dataset from a Project, subset names mirror those used within the project itself. Otherwise, a single default subset is created to hold all the dataset information.
COCO import
- Supported annotations: Bounding Boxes (if the segmentation field is empty), Polygons, Masks.
- Attributes: Supported, as described in the export section.
- Tracks: Supported (via the track_id custom attribute).
- Supported tasks: instances, person_keypoints (only segmentations will be imported), panoptic.
Upload format: a .json file with annotations
or a .zip archive with the structure described above or
here
(without images).
Note
Even though the licenses and info fields are required by the format specification,
CVAT does not require them to import annotations.
How to create a task from MS COCO dataset
- Download the MS COCO dataset.
  For example, the val images and instances annotations.
- Create a CVAT task with the following labels:
  person bicycle car motorcycle airplane bus train truck boat "traffic light" "fire hydrant" "stop sign" "parking meter" bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard "sports ball" kite "baseball bat" "baseball glove" skateboard surfboard "tennis racket" bottle "wine glass" cup fork knife spoon bowl banana apple sandwich orange broccoli carrot "hot dog" pizza donut cake chair couch "potted plant" bed "dining table" toilet tv laptop mouse remote keyboard "cell phone" microwave oven toaster sink refrigerator book clock vase scissors "teddy bear" "hair drier" toothbrush
- Select val2017.zip as data (see the Creating an annotation task guide for details).
- Unpack annotations_trainval2017.zip.
- Click the Upload annotation button, choose COCO 1.1, and select the instances_val2017.json annotation file. It can take some time.
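Instead of typing the label list by hand, you can derive it from the categories section of the COCO annotation file. A sketch, assuming the space-separated, quote-multi-word-names style of the list above; the function name is hypothetical. In practice you would pass the dictionary loaded from instances_val2017.json with `json.load`.

```python
def cvat_label_string(coco: dict) -> str:
    """Build a space-separated label list ordered by category id; multi-word names are quoted."""
    names = [c["name"] for c in sorted(coco["categories"], key=lambda c: c["id"])]
    return " ".join(f'"{n}"' if " " in n else n for n in names)


if __name__ == "__main__":
    coco = {"categories": [{"id": 10, "name": "traffic light"}, {"id": 1, "name": "person"}]}
    print(cvat_label_string(coco))
```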
3.7 - COCO Keypoints
The COCO Keypoints format is designed specifically for human pose estimation tasks, where the objective is to identify and localize body joints or keypoints on a human figure within an image. This format is used with a variety of state-of-the-art models focused on pose estimation.
COCO Keypoints export
- Supported annotations: Skeletons
- Attributes: Supported (stored in the custom attributes field of the annotation).
- Tracks: Supported (via the track_id custom attribute).
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── images/
│ └── <subset_name>/
│ ├── <image_name1.ext>
│ ├── <image_name2.ext>
│ └── ...
└── annotations/
├── person_keypoints_<subset_name>.json
└── ...
COCO Keypoints import
- Supported annotations: Skeletons
- Attributes: Supported (via the custom attributes field of the annotation).
- Tracks: Supported (via the track_id custom attribute).
Uploaded file: a single unpacked .json or a .zip archive with the structure described above or
here
(without images).
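In COCO Keypoints files, each skeleton is stored as a flat `keypoints` list of `x, y, v` triples, where `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible), as defined by the COCO format. A small sketch (function names are hypothetical) that regroups the list and recomputes `num_keypoints`, the count of labeled points:

```python
def split_keypoints(flat: list) -> list:
    """Turn [x1, y1, v1, x2, y2, v2, ...] into [(x1, y1, v1), ...]."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat: list) -> int:
    """num_keypoints in COCO counts the points with v > 0."""
    return sum(1 for _x, _y, v in split_keypoints(flat) if v > 0)


if __name__ == "__main__":
    kp = [10, 20, 2, 0, 0, 0, 30, 40, 1]
    print(split_keypoints(kp), count_labeled(kp))
```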
3.8 - Pascal VOC
The Pascal VOC (Visual Object Classes) format is one of the earlier established benchmarks for object classification and detection, which provides a standardized image dataset for object class recognition.
The export data format is XML-based and has been widely adopted in computer vision tasks.
Pascal VOC export
For export of images:
- Supported annotations: Bounding Boxes (detection), Tags (classification), Polygons (segmentation), Masks (segmentation), Ellipses (segmentation, as masks).
- Attributes:
  - occluded, as both a UI option and a separate attribute.
  - truncated and difficult, which must be defined for labels as checkbox.
  - Arbitrary attributes in the attributes section of XML files.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── JPEGImages/
│ ├── <image_name1>.jpg
│ ├── <image_name2>.jpg
│ └── <image_nameN>.jpg
├── Annotations/
│ ├── <image_name1>.xml
│ ├── <image_name2>.xml
│ └── <image_nameN>.xml
├── ImageSets/
│ └── Main/
│ └── default.txt
└── labelmap.txt
# labelmap.txt
# label : color_rgb : 'body' parts : actions
background:::
aeroplane:::
bicycle:::
bird:::
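Each file in Annotations/ is a standard Pascal VOC XML document. A sketch that reads object names, bounding boxes, and the difficult/truncated flags with the standard library; the sample content and function name are made up, and objects without a `<bndbox>` get `None` as their box.

```python
import xml.etree.ElementTree as ET

# Made-up sample annotation file.
SAMPLE = """<annotation>
  <filename>2008_004457.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <truncated>0</truncated>
    <difficult>1</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def read_voc(xml_text: str):
    """Return (filename, [object dicts]) for one Pascal VOC annotation file."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = None
        if box is not None:
            coords = tuple(float(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({
            "label": obj.findtext("name"),
            "bbox": coords,
            # Missing flags are treated as False.
            "difficult": obj.findtext("difficult", default="0").strip() == "1",
            "truncated": obj.findtext("truncated", default="0").strip() == "1",
        })
    return root.findtext("filename"), objects


if __name__ == "__main__":
    print(read_voc(SAMPLE))
```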
Pascal VOC import
- Supported attributes: action attributes (import only, should be defined as
checkboxes).
Uploaded file: a .zip archive of the structure declared above or the following:
taskname.zip/
├── <image_name1>.xml
├── <image_name2>.xml
└── <image_nameN>.xml
CVAT must be able to match the frame name with the file name
from the annotation .xml file (the filename tag, e.g.
<filename>2008_004457.jpg</filename>).
There are two options:
- Full match between the frame name and the file name from the annotation
  .xml file (when the task was created from images or an image archive).
- Match by frame number. The file name should be
  <number>.jpg or frame_000000.jpg. Use this option when the task was created from a video.
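The two matching options can be sketched as follows. This illustrates the rule described above and is not CVAT's implementation; `frame_names` is a hypothetical list of the task's frame names in frame order, and the function name is also hypothetical.

```python
import re
from pathlib import PurePath
from typing import Optional, Sequence

# "<number>.jpg" or "frame_000000.jpg" (checked on the name without extension)
_FRAME_NUMBER = re.compile(r"^(?:frame_)?(\d+)$")


def match_frame(filename: str, frame_names: Sequence[str]) -> Optional[int]:
    """Return the frame index for a <filename> value, or None if it cannot be matched."""
    # Option 1: full match with a frame name (tasks created from images or an archive).
    if filename in frame_names:
        return list(frame_names).index(filename)
    # Option 2: match by frame number (tasks created from video).
    m = _FRAME_NUMBER.match(PurePath(filename).stem)
    if m is not None:
        number = int(m.group(1))
        if 0 <= number < len(frame_names):
            return number
    return None


if __name__ == "__main__":
    print(match_frame("frame_000002.jpg", ["a", "b", "c"]))
```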
How to create a task from Pascal VOC dataset
- Download the Pascal VOC dataset (it can be downloaded from the PASCAL VOC website).
- Create a CVAT task with the following labels:
  aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
  You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.
  Select the image files you are interested in (see the Creating an annotation task guide for details).
- Zip the corresponding annotation files.
- Click the Upload annotation button, choose Pascal VOC ZIP 1.1, and select the zip file with annotations from the previous step. It may take some time.
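The "zip the corresponding annotation files" step can be scripted with Python's zipfile module. The sketch assumes the VOC directory layout shown earlier (one Annotations/<image_name>.xml per image) and writes the flat `<image_name>.xml` archive layout that the import accepts. The paths and the function name are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable


def zip_voc_annotations(voc_root: Path, image_names: Iterable[str], out_zip: Path) -> int:
    """Pack Annotations/<stem>.xml for the selected images into a flat zip.

    Returns the number of files written; images without an annotation are skipped.
    """
    written = 0
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in image_names:
            xml_path = voc_root / "Annotations" / (Path(name).stem + ".xml")
            if xml_path.is_file():
                # arcname places the file at the archive root: <image_name>.xml
                zf.write(xml_path, arcname=xml_path.name)
                written += 1
    return written


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Annotations").mkdir()
        (root / "Annotations" / "2008_000001.xml").write_text("<annotation/>")
        count = zip_voc_annotations(root, ["2008_000001.jpg", "missing.jpg"], root / "ann.zip")
        with zipfile.ZipFile(root / "ann.zip") as zf:
            print(count, zf.namelist())
```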
3.9 - Segmentation Mask
Segmentation Mask format is a simple format for image segmentation tasks like semantic segmentation, instance segmentation, and panoptic segmentation. It is a custom format based on the Pascal VOC segmentation format.
Segmentation Mask export
- Supported annotations: Masks, Bounding Boxes (as masks), Polygons (as masks), Ellipses (as masks).
- Attributes: Not supported.
- Tracks: Not supported (exported as separate shapes).
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── labelmap.txt # optional, required for non-Pascal VOC labels
├── ImageSets/
│ └── Segmentation/
│ └── default.txt # list of image names without extension
├── SegmentationClass/ # merged class masks
│ ├── image1.png
│ └── image2.png
└── SegmentationObject/ # merged instance masks
├── image1.png
└── image2.png
# labelmap.txt
# label : color (RGB) : 'body' parts : actions
background:0,128,0::
aeroplane:10,10,128::
bicycle:10,128,0::
bird:0,108,128::
boat:108,0,100::
bottle:18,0,8::
bus:12,28,0::
A mask is a .png image that can have either 1 or 3 channels.
Each pixel in the image has a color that corresponds to a specific label.
The colors are generated according to the Pascal VOC
algorithm.
By default, the color (0, 0, 0) is used to represent the background.
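The Pascal VOC color generation algorithm can be sketched as below. This is the widely used reference form of the algorithm, written out here for illustration rather than copied from CVAT: the bits of each label index are spread over the R, G, and B channels starting from the most significant bit, so index 0 maps to (0, 0, 0), matching the default background color mentioned above.

```python
def voc_colormap(n: int = 256) -> list:
    """Pascal VOC color map: spread the bits of each index over R, G, B."""
    colors = []
    for index in range(n):
        r = g = b = 0
        c = index
        for shift in range(7, -1, -1):
            # Take the three lowest bits and place them from the MSB downward.
            r |= ((c >> 0) & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        colors.append((r, g, b))
    return colors


if __name__ == "__main__":
    print(voc_colormap(4))
```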
Segmentation Mask import
- Supported annotations: Masks, Polygons (if Convert masks to polygons is enabled).
- Attributes: Not supported.
- Tracks: Not supported.
Uploaded file: a .zip archive of the following structure:
archive.zip/
├── labelmap.txt # optional, required for non-Pascal VOC labels
├── ImageSets/
│ └── Segmentation/
│ └── <any_subset_name>.txt
├── SegmentationClass/
│ ├── image1.png
│ └── image2.png
└── SegmentationObject/
├── image1.png
└── image2.png
The format supports both 3-channel and grayscale (1-channel) PNG masks.
To import 3-channel masks, the labelmap.txt file should declare all the colors used in
the dataset:
# labelmap.txt
# label : color (RGB) : 'body' parts : actions
background:0,128,0::
aeroplane:10,10,128::
bicycle:10,128,0::
bird:0,108,128::
boat:108,0,100::
bottle:18,0,8::
bus:12,28,0::
To import 1-channel masks, the labelmap.txt file should declare all the indices used in
the dataset, with no gaps. Because indices start at 0, the number of label lines must be
one more than the maximum color index in the images. The lines must be in order,
so that the line index equals the color index. Lines can have arbitrary
but distinct colors. If there are gaps in the color indices used in the annotations,
they must be filled with arbitrary dummy labels.
# labelmap.txt
# label : color (RGB) : 'body' parts : actions
q:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
...
the last label:12,28,0:: # color index 200
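Writing filler lines by hand is error-prone for large index ranges. A hypothetical helper that produces a gap-free labelmap.txt body from a mapping of used color indices to label names. It follows the `_dummy<i>` naming of the example above and gives each line the color (i, i, i), which keeps all colors distinct; this color choice is an assumption of the sketch, since any distinct colors are allowed.

```python
from typing import Dict, List


def build_labelmap(used: Dict[int, str]) -> List[str]:
    """Return labelmap.txt lines where label line i describes color index i."""
    if not used:
        return []
    if min(used) < 0 or max(used) > 255:
        raise ValueError("1-channel color indices must be in the range 0..255")
    lines = ["# label : color (RGB) : 'body' parts : actions"]
    for index in range(max(used) + 1):
        name = used.get(index, f"_dummy{index}")  # fill gaps with dummy labels
        lines.append(f"{name}:{index},{index},{index}::")
    return lines


if __name__ == "__main__":
    print("\n".join(build_labelmap({0: "background", 1: "aeroplane", 4: "boat"})))
```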
3.10 - Ultralytics YOLO
Ultralytics YOLO is a format family which consists of four formats: Detection, Oriented Bounding Boxes, Segmentation, and Pose.
Ultralytics YOLO export
For export of images:
- Supported annotations
- Detection: Bounding Boxes
- Oriented bounding box: Oriented Bounding Boxes
- Segmentation: Polygons, Masks
- Pose: Skeletons
- Attributes: Not supported.
- Tracks: Supported.
The downloaded file is a .zip archive with the following structure:
archive.zip/
├── data.yaml # configuration file
├── train.txt # list of train subset image paths
│
├── images/
│ ├── train/ # directory with images for train subset
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ └── ...
├── labels/
│ ├── train/ # directory with annotations for train subset
│ │ ├── image1.txt
│ │ ├── image2.txt
│ │ ├── image3.txt
│ │ └── ...
# train.txt:
images/<subset>/image1.jpg
images/<subset>/image2.jpg
...
# data.yaml:
path: ./ # dataset root dir
train: train.txt # train images (relative to 'path')
# Ultralytics YOLO Pose specific field
# First number is the number of points in a skeleton.
# If there are several skeletons with different number of points, it is the greatest number of points
# Second number defines the format of point info in annotation txt files
kpt_shape: [17, 3]
# Classes
names:
0: person
1: bicycle
2: car
# ...
# <image_name>.txt:
# content depends on format
# Ultralytics YOLO Detection:
# label_id - id from names field of data.yaml
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1
# Ultralytics YOLO Oriented Bounding Boxes:
# xn, yn - relative coordinates of the n-th point
# label_id x1 y1 x2 y2 x3 y3 x4 y4
1 0.3 0.8 0.1 0.3 0.4 0.5 0.7 0.5
2 0.7 0.2 0.3 0.1 0.4 0.5 0.5 0.6
# Ultralytics YOLO Segmentation:
# xn, yn - relative coordinates of the n-th point
# label_id x1 y1 x2 y2 x3 y3 ...
1 0.3 0.8 0.1 0.3 0.4 0.5
2 0.7 0.2 0.3 0.1 0.4 0.5 0.5 0.6 0.7 0.5
# Ultralytics YOLO Pose:
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# xn, yn - relative coordinates of the n-th point
# vn - visibility of n-th point. 2 - visible, 1 - partially visible, 0 - not visible
# if second value in kpt_shape is 3:
# label_id cx cy rw rh x1 y1 v1 x2 y2 v2 x3 y3 v3 ...
1 0.3 0.8 0.1 0.3 0.3 0.8 2 0.1 0.3 2 0.4 0.5 2 0.0 0.0 0 0.0 0.0 0
2 0.3 0.8 0.1 0.3 0.7 0.2 2 0.3 0.1 1 0.4 0.5 0 0.5 0.6 2 0.7 0.5 2
# if second value in kpt_shape is 2:
# label_id cx cy rw rh x1 y1 x2 y2 x3 y3 ...
1 0.3 0.8 0.1 0.3 0.3 0.8 0.1 0.3 0.4 0.5 0.0 0.0 0.0 0.0
2 0.3 0.8 0.1 0.3 0.7 0.2 0.3 0.1 0.4 0.5 0.5 0.6 0.7 0.5
# Note, that if there are several skeletons with different number of points,
# smaller skeletons are padded with points with coordinates 0.0 0.0 and visibility = 0
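The Pose line layout and the padding rule above can be sketched as a small formatter. The function name is hypothetical, and coordinates are assumed to be already normalized; for the example values it reproduces the first line of each Pose sample above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, int]  # (x, y, visibility), coordinates normalized


def pose_line(label_id: int, bbox: Sequence[float], points: Sequence[Point],
              kpt_shape: Tuple[int, int]) -> str:
    """Format one Ultralytics YOLO Pose annotation line.

    bbox is (cx, cy, rw, rh). Skeletons with fewer points than kpt_shape[0]
    are padded with (0.0, 0.0, 0).
    """
    num_points, dims = kpt_shape
    if len(points) > num_points:
        raise ValueError("skeleton has more points than kpt_shape allows")
    padded = list(points) + [(0.0, 0.0, 0)] * (num_points - len(points))
    values: List[str] = [str(label_id)] + [str(v) for v in bbox]
    for x, y, v in padded:
        values += [str(x), str(y)]
        if dims == 3:
            values.append(str(v))  # 2 visible, 1 partially visible, 0 not visible
    return " ".join(values)


if __name__ == "__main__":
    pts = [(0.3, 0.8, 2), (0.1, 0.3, 2), (0.4, 0.5, 2)]
    print(pose_line(1, (0.3, 0.8, 0.1, 0.3), pts, (5, 3)))
```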
All coordinates must be normalized. This can be achieved by dividing x coordinates and widths by the image width, and y coordinates and heights by the image height.
Note
In CVAT you can place an object or some parts of it outside the image, which will cause the coordinates to be outside the [0, 1] range. The YOLOv8 framework ignores labels with such coordinates.
Each annotation file, with the .txt extension,
is named to correspond with its associated image file.
For example, frame_000001.txt serves as the annotation for the
frame_000001.jpg image.
Track support
When exporting Detection annotations, tracks can be saved by using the Ultralytics YOLO Detection Track format. It appends the track id to the end of each corresponding annotation line:
# label_id cx cy rw rh <optional track_id>
1 0.3 0.8 0.1 0.3 1
2 0.7 0.2 0.3 0.1
Ultralytics YOLO Import
Uploaded file: a .zip archive of the same structure as above.
For compatibility with other tools that export in the Ultralytics YOLO format
(e.g. Roboflow),
CVAT supports datasets with the inverted directory order of the subset and the “images” or “labels” directories,
i.e. both train/images/ and images/train/ are valid inputs.
archive.zip/
├── train/
│ ├── images/ # directory with images for train subset
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ └── ...
│ ├── labels/ # directory with annotations for train subset
│ │ ├── image1.txt
│ │ ├── image2.txt
│ │ └── ...
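A reader that accepts both directory orders might look up each subset directory as in this minimal Python sketch. It is not CVAT's implementation; find_subset_dir is a hypothetical helper, and preferring train/images/ when both layouts exist is an arbitrary choice of the sketch.

```python
from pathlib import Path
from typing import Optional


def find_subset_dir(root: Path, subset: str, kind: str) -> Optional[Path]:
    """Return <root>/<subset>/<kind> or <root>/<kind>/<subset>, whichever exists.

    kind is "images" or "labels". The subset-first layout is tried first
    (an assumption of this sketch); None means neither layout is present.
    """
    for candidate in (root / subset / kind, root / kind / subset):
        if candidate.is_dir():
            return candidate
    return None
```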
Track support
Import in each of the Ultralytics YOLO formats supports tracking. An integer track id can be appended to the end of any annotation, e.g. with the Detection format:
# label_id cx cy rw rh <optional track_id>
1 0.3 0.8 0.1 0.3 1
2 0.7 0.2 0.3 0.1
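The optional trailing track id can be handled by counting fields, as in this minimal Python sketch for the Detection format. It is an illustration only; Detection and parse_detection_line are hypothetical names, not CVAT's importer.

```python
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    label_id: int
    cx: float
    cy: float
    rw: float
    rh: float
    track_id: Optional[int]  # None when the line has no trailing track id


def parse_detection_line(line: str) -> Detection:
    """Parse 'label_id cx cy rw rh [track_id]' (hypothetical helper)."""
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}: {line!r}")
    label_id = int(parts[0])
    cx, cy, rw, rh = (float(v) for v in parts[1:5])
    track_id = int(parts[5]) if len(parts) == 6 else None
    return Detection(label_id, cx, cy, rw, rh, track_id)
```

Applied to the two sample lines, the first yields track_id 1 and the second yields None.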
3.11 - Ultralytics-YOLO-Classification
For more information, see:
Ultralytics YOLO Classification export
For export of images:
- Supported annotations: Tags.
- Attributes: Not supported.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
archive.zip/
├── train
│ ├── labels.json # CVAT extension. Contains original ids and labels
│ │ # is not needed when using dataset with YOLOv8 framework
│ │ # but is useful when importing it back to CVAT
│ ├── label_0
│ │ ├── <image_name_0>.jpg
│ │ ├── <image_name_1>.jpg
│ │ ├── <image_name_2>.jpg
│ │ ├── ...
│ ├── label_1
│ │ ├── <image_name_0>.jpg
│ │ ├── <image_name_1>.jpg
│ │ ├── <image_name_2>.jpg
│ │ ├── ...
├── ...
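Because each image sits in a directory named after its label, the tags of a subset can be recovered by walking the tree. A minimal Python sketch, assuming the layout shown; collect_tags and the set of accepted image extensions are hypothetical, and non-directory entries such as labels.json are skipped.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumption: common image extensions


def collect_tags(subset_dir: Path) -> Dict[str, List[str]]:
    """Map each label directory name to the sorted image file names it contains."""
    tags: Dict[str, List[str]] = {}
    for label_dir in sorted(p for p in subset_dir.iterdir() if p.is_dir()):
        tags[label_dir.name] = sorted(
            f.name
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return tags
```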
3.12 - YOLO
YOLO, which stands for “You Only Look Once”, is a widely used framework for real-time object detection. Its efficiency and speed make it a common choice for many applications. While YOLO has its own data format, this format can be adapted for other object detection models as well.
For more information, see:
YOLO export
For export of images:
- Supported annotations: Bounding Boxes.
- Attributes: Not supported.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
archive.zip/
├── obj.data
├── obj.names
├── obj_<subset>_data
│ ├── image1.txt
│ └── image2.txt
└── train.txt # list of subset image paths
# the only valid subsets are: train, valid
# train.txt and valid.txt:
obj_<subset>_data/image1.jpg
obj_<subset>_data/image2.jpg
# obj.data:
classes = 3 # optional
names = obj.names
train = train.txt
valid = valid.txt # optional
backup = backup/ # optional
# obj.names:
cat
dog
airplane
# image_name.txt:
# label_id - id from obj.names
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1
Each annotation file, with the .txt extension,
is named to correspond with its associated image file.
For example, frame_000001.txt serves as the annotation for the
frame_000001.jpg image.
The structure of the .txt file is as follows:
each line describes a label and a bounding box
in the format label_id cx cy rw rh.
The file obj.names contains an ordered list of label names.
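Resolving label ids against obj.names can be sketched in Python as follows. This assumes zero-based ids, where line 0 of obj.names is label_id 0 (the Darknet convention); read_names and decode_labels are hypothetical helpers, not part of CVAT.

```python
from typing import List, Tuple


def read_names(text: str) -> List[str]:
    """obj.names: one label name per line; the line index is the label_id."""
    return [line.strip() for line in text.strip().splitlines()]


def decode_labels(label_text: str, names: List[str]) -> List[Tuple[str, float, float, float, float]]:
    """Turn 'label_id cx cy rw rh' lines into (name, cx, cy, rw, rh) tuples."""
    boxes = []
    for line in label_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        label_id, cx, cy, rw, rh = line.split()
        boxes.append((names[int(label_id)], float(cx), float(cy), float(rw), float(rh)))
    return boxes
```

With the obj.names and image_name.txt samples shown, label 1 resolves to dog and label 2 to airplane.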
YOLO import
Uploaded file: a .zip archive of the same structure as above
It must be possible to match the CVAT frame (image name)
and the annotation file name. There are 2 options:
- full match between the image name and the name of the annotation *.txt file (in cases when a task was created from images or an archive of images).
- match by frame number (if CVAT cannot match by name). The file name should be in the following format: <number>.jpg. It should be used when the task was created from a video.
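The two matching options can be illustrated with a minimal Python sketch: try a full name match first, then fall back to reading the name as a frame number. This is not CVAT's matching code; match_annotation is hypothetical, and zero-based frame numbers are an assumption of the sketch.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def match_annotation(ann_name: str, frame_names: List[str]) -> Optional[int]:
    """Return the frame index for an annotation file, or None if it cannot be matched.

    1. Full match: the annotation stem equals an image stem.
    2. Fallback: the stem is a plain number, read as a zero-based frame number
       (an assumption of this sketch).
    """
    stem = PurePosixPath(ann_name).stem
    by_stem: Dict[str, int] = {PurePosixPath(n).stem: i for i, n in enumerate(frame_names)}
    if stem in by_stem:
        return by_stem[stem]
    if re.fullmatch(r"\d+", stem):
        frame = int(stem)
        if 0 <= frame < len(frame_names):
            return frame
    return None
```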
How to create a task from YOLO formatted dataset (from VOC for example)
- Follow the official guide (see Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
- Zip train images:
zip images.zip -j -@ < train.txt
- Create a CVAT task with the following labels:
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor
Select images.zip as data. Most likely you should use the share functionality because the size of images.zip is more than 500Mb. See Creating an annotation task guide for details.
- Create obj.names with the following content:
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
- Zip all label files together (add only the label files that correspond to the train subset):
cat train.txt | while read -r p; do f=${p##*/}; echo "${p%/*/*}/labels/${f%%.*}.txt"; done | zip labels.zip -j -@ obj.names
- Click the Upload annotation button, choose YOLO 1.1 and select the zip file with labels from the previous step.
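The label-path step can also be sketched in Python for readers who prefer it over the shell pipeline. This assumes train.txt holds one image path per line in the Darknet VOC layout (for example .../VOCdevkit/VOC2007/JPEGImages/000005.jpg, with labels in a sibling labels/ directory); label_paths and zip_flat are hypothetical helpers.

```python
import os
import zipfile
from pathlib import PurePosixPath
from typing import Iterable, List


def label_paths(train_txt_lines: Iterable[str]) -> List[str]:
    """Map .../<dir>/<subdir>/<name>.<ext> to .../<dir>/labels/<name>.txt.

    Same transformation as the shell loop: drop the last two path components,
    then append labels/<basename up to the first dot>.txt.
    """
    result = []
    for raw in train_txt_lines:
        line = raw.strip()
        if not line:
            continue
        p = PurePosixPath(line)
        stem = p.name.split(".", 1)[0]  # like ${f%%.*}: cut at the first dot
        result.append(str(p.parent.parent / "labels" / f"{stem}.txt"))
    return result


def zip_flat(paths: Iterable[str], archive: str) -> None:
    """Like `zip -j`: store each file under its base name only."""
    with zipfile.ZipFile(archive, "w") as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))
```

A call such as `zip_flat(label_paths(open("train.txt")) + ["obj.names"], "labels.zip")` would then mirror the pipeline.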
3.13 - ImageNet
ImageNet is typically used for a variety of computer vision tasks, including but not limited to image classification, object detection, and segmentation.
It is widely recognized and used in the training and benchmarking of various machine learning models.
For more information, see:
ImageNet export
For export of images:
- Supported annotations: Tags.
- Attributes: Not supported.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
# if we save images:
taskname.zip/
├── label1/
| ├── label1_image1.jpg
| └── label1_image2.jpg
└── label2/
├── label2_image1.jpg
├── label2_image3.jpg
└── label2_image4.jpg
# if we keep only annotation:
taskname.zip/
├── <any_subset_name>.txt
└── synsets.txt
ImageNet import
- Supported annotations: Tags.
Uploaded file: a .zip archive of the structure above
3.14 - Wider Face
The WIDER Face dataset is widely used for face detection tasks. Many popular models for object detection and face detection specifically are trained on this dataset for benchmarking and deployment.
For more information, see:
WIDER Face export
For export of images:
- Supported annotations: Bounding Boxes (with attributes), Tags.
- Attributes:
blur, expression, illumination, pose, invalid, occluded (both the annotation property and an attribute).
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── labels.txt # optional
├── wider_face_split/
│ └── wider_face_<any_subset_name>_bbx_gt.txt
└── WIDER_<any_subset_name>/
└── images/
├── 0--label0/
│ └── 0_label0_image1.jpg
└── 1--label1/
└── 1_label1_image2.jpg
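In the tree above, each image directory is named <index>--<label>. A minimal Python sketch of splitting such a name; parse_event_dir is a hypothetical helper, not part of CVAT, and it assumes the index is a plain non-negative integer.

```python
from typing import Tuple


def parse_event_dir(name: str) -> Tuple[int, str]:
    """Split a directory name such as '0--label0' into (0, 'label0')."""
    index, sep, label = name.partition("--")  # split at the first '--'
    if not sep or not index.isdigit() or not label:
        raise ValueError(f"unexpected directory name: {name!r}")
    return int(index), label
```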
WIDER Face import
- Supported annotations: Rectangles (with attributes), Labels
- Supported attributes:
blur, expression, illumination, occluded, pose, invalid
Uploaded file: a .zip archive of the structure above
3.15 - CamVid
The CamVid (Cambridge-driving Labeled Video Database) format is most commonly used in the realm of semantic segmentation tasks. It is particularly useful for training and evaluating models for autonomous driving and other vision-based robotics applications.
For more information, see:
CamVid export
- Supported annotations: Masks, Bounding Boxes (as masks), Polygons (as masks), Ellipses (as masks).
- Attributes: Not supported.
- Tracks: Not supported (exported as separate shapes).
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── label_colors.txt # optional, required for non-CamVid labels
├── <any_subset_name>/
| ├── image1.png
| └── image2.png
├── <any_subset_name>annot/
| ├── image1.png
| └── image2.png
└── <any_subset_name>.txt
# label_colors.txt (with color value type)
# if you want to manually set the color for labels, configure label_colors.txt as follows:
# color (RGB) label
0 0 0 Void
64 128 64 Animal
192 0 128 Archway
0 128 192 Bicyclist
0 128 64 Bridge
# label_colors.txt (without color value type)
# if you do not manually set the color for labels, it will be set automatically:
# label
Void
Animal
Archway
Bicyclist
Bridge
A mask in the CamVid dataset is typically a .png
image with either one or three channels.
In this image, each pixel is assigned a specific color that corresponds to a particular label.
By default, black (0, 0, 0) is used
to represent the background.
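Both label_colors.txt variants can be read with one parser, and the colored variant inverted into a pixel-color lookup for masks. A minimal Python sketch under the assumption that the file contains only data lines (the # lines in the listing are explanatory); parse_label_colors and color_to_label are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple

RGB = Tuple[int, int, int]


def parse_label_colors(text: str) -> Dict[str, Optional[RGB]]:
    """Parse lines of the form 'r g b label' or just 'label'.

    A label without a color maps to None (the color is then set automatically).
    """
    labels: Dict[str, Optional[RGB]] = {}
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if len(parts) >= 4 and all(p.isdigit() for p in parts[:3]):
            labels[" ".join(parts[3:])] = (int(parts[0]), int(parts[1]), int(parts[2]))
        else:
            labels[" ".join(parts)] = None
    return labels


def color_to_label(labels: Dict[str, Optional[RGB]]) -> Dict[RGB, str]:
    """Invert the colored entries so a mask pixel color can be looked up."""
    return {rgb: name for name, rgb in labels.items() if rgb is not None}
```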
CamVid import
- Supported annotations: Masks, Polygons (if Convert masks to polygons is enabled).
- Attributes: Not supported.
- Tracks: Not supported.
Uploaded file: a .zip archive of the structure above
3.16 - VGGFace2
VGGFace2 is primarily designed for face recognition tasks and is most commonly used with deep learning models for face recognition, verification, and similar tasks.
For more information, see:
VGGFace2 export
For export of images:
- Supported annotations: Bounding Boxes, Points (landmarks - groups of 5 points).
- Attributes: Not supported.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── labels.txt # optional
├── <any_subset_name>/
| ├── label0/
| | └── image1.jpg
| └── label1/
| └── image2.jpg
└── bb_landmark/
├── loose_bb_<any_subset_name>.csv
└── loose_landmark_<any_subset_name>.csv
# labels.txt
# n000001 car
label0 <class0>
label1 <class1>
VGGFace2 import
- Supported annotations: Rectangles, Points (landmarks - groups of 5 points)
Uploaded file: a .zip archive of the structure above
3.17 - Market-1501
The Market-1501 dataset is widely used for person re-identification tasks. It is a challenging dataset that has gained significant attention in the computer vision community.
For more information, see:
Market-1501 export
For export of images:
- Supported annotations: Bounding Boxes
- Attributes: query (checkbox), person_id (number), camera_id (number).
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── bounding_box_<any_subset_name>/
│ └── image_name_1.jpg
└── query
├── image_name_2.jpg
└── image_name_3.jpg
# if we keep only annotation:
taskname.zip/
└── images_<any_subset_name>.txt
# images_<any_subset_name>.txt
bounding_box_<any_subset_name>/image_name_1.jpg
query/image_name_2.jpg
query/image_name_3.jpg
# image_name = 0001_c1s1_000015_00.jpg
0001 - person id
c1 - camera id (there are 6 cameras in total)
s1 - sequence
000015 - frame number in sequence
00 - means that this bounding box is the first one among several in the frame
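The naming scheme can be decoded with a regular expression. A minimal Python sketch covering only the layout shown (four-digit person id, single-digit camera and sequence); MarketName and parse_market_name are hypothetical names, and other naming variants found in the wild are not handled.

```python
import re
from typing import NamedTuple


class MarketName(NamedTuple):
    person_id: int
    camera_id: int
    sequence: int
    frame: int
    bbox_index: int


# pppp_cCsS_ffffff_bb.jpg, following the example 0001_c1s1_000015_00.jpg
_PATTERN = re.compile(r"(\d{4})_c(\d)s(\d)_(\d{6})_(\d{2})\.jpg")


def parse_market_name(name: str) -> MarketName:
    m = _PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Market-1501 image name: {name!r}")
    return MarketName(*(int(g) for g in m.groups()))
```

For the example name, this yields person 1, camera 1, sequence 1, frame 15, box 0.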
Market-1501 import
- Supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
Uploaded file: a .zip archive of the structure above
3.18 - ICDAR13/15
ICDAR 13/15 formats are typically used for text detection and recognition tasks and OCR (Optical Character Recognition).
These formats are usually paired with specialized text detection and recognition models.
For more information, see:
ICDAR13/15 export
- ICDAR Recognition 1.0 (Text recognition):
  - Supported annotations: Tags with the icdar label
  - Attributes: caption.
- ICDAR Detection 1.0 (Text detection):
  - Supported annotations: Bounding Boxes, Polygons with the icdar label
  - Attributes: text.
- ICDAR Segmentation 1.0 (Text segmentation):
  - Supported annotations: Masks, Bounding Boxes, Polygons, or Ellipses with the icdar label
  - Attributes: index, text, color, center
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
# text recognition task
taskname.zip/
└── word_recognition/
└── <any_subset_name>/
├── images
| ├── word1.png
| └── word2.png
└── gt.txt
# text localization task
taskname.zip/
└── text_localization/
└── <any_subset_name>/
├── images
| ├── img_1.png
| └── img_2.png
├── gt_img_1.txt
└── gt_img_2.txt
# text segmentation task
taskname.zip/
└── text_segmentation/
└── <any_subset_name>/
├── images
| ├── 1.png
| └── 2.png
├── 1_GT.bmp
├── 1_GT.txt
├── 2_GT.bmp
└── 2_GT.txt
ICDAR13/15 import
Word recognition task:
- Supported annotations: Tags with the icdar label and caption attribute
Text localization task:
- Supported annotations: Rectangles and Polygons with the icdar label and text attribute
Text segmentation task:
- Supported annotations: Masks or Polygons with the icdar label and index, text, color, center attributes
Uploaded file: a .zip archive of the structure above
3.19 - Open Images
The Open Images format is based on a large-scale, diverse dataset that contains object detection, object segmentation, visual relationship, and localized narratives annotations.
Its export data format is compatible with many object detection and segmentation models.
For more information, see:
Open Images export
For export of images:
- Supported annotations: Bounding Boxes (detection), Tags (classification), Polygons (segmentation), Masks (segmentation), Ellipses (segmentation, as masks).
- Supported attributes:
  - Tags:
    - score must be defined for labels as text or number. The confidence level from 0 to 1.
  - Bounding boxes:
    - score must be defined for labels as text or number. The confidence level from 0 to 1.
    - occluded as both UI option and a separate attribute. Whether the object is occluded by another object.
    - truncated must be defined for labels as checkbox. Whether the object extends beyond the boundary of the image.
    - is_group_of must be defined for labels as checkbox. Whether the object represents a group of objects of the same class.
    - is_depiction must be defined for labels as checkbox. Whether the object is a depiction (such as a drawing) rather than a real object.
    - is_inside must be defined for labels as checkbox. Whether the object is seen from the inside.
  - Masks:
    - box_id must be defined for labels as text. An identifier for the bounding box associated with the mask.
    - predicted_iou must be defined for labels as text or number. Predicted IoU value with respect to the ground truth.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
└─ taskname.zip/
├── annotations/
│ ├── bbox_labels_600_hierarchy.json
│ ├── class-descriptions.csv
| ├── images.meta # additional file with information about image sizes
│ ├── <subset_name>-image_ids_and_rotation.csv
│ ├── <subset_name>-annotations-bbox.csv
│ ├── <subset_name>-annotations-human-imagelabels.csv
│ └── <subset_name>-annotations-object-segmentation.csv
├── images/
│ ├── subset1/
│ │ ├── <image_name101.jpg>
│ │ ├── <image_name102.jpg>
│ │ └── ...
│ ├── subset2/
│ │ ├── <image_name201.jpg>
│ │ ├── <image_name202.jpg>
│ │ └── ...
| ├── ...
└── masks/
├── subset1/
│ ├── <mask_name101.png>
│ ├── <mask_name102.png>
│ └── ...
├── subset2/
│ ├── <mask_name201.png>
│ ├── <mask_name202.png>
│ └── ...
├── ...
Open Images import
Uploaded file: a .zip archive of the following structure:
└─ upload.zip/
├── annotations/
│ ├── bbox_labels_600_hierarchy.json
│ ├── class-descriptions.csv
| ├── images.meta # optional, file with information about image sizes
│ ├── <subset_name>-image_ids_and_rotation.csv
│ ├── <subset_name>-annotations-bbox.csv
│ ├── <subset_name>-annotations-human-imagelabels.csv
│ └── <subset_name>-annotations-object-segmentation.csv
└── masks/
├── subset1/
│ ├── <mask_name101.png>
│ ├── <mask_name102.png>
│ └── ...
├── subset2/
│ ├── <mask_name201.png>
│ ├── <mask_name202.png>
│ └── ...
├── ...
Image ids in the <subset_name>-image_ids_and_rotation.csv should match
the image names in the task.
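Before uploading, it can help to list image ids that have no counterpart among the task images. A minimal Python sketch, assuming the CSV has an ImageID column and that an image id matches an image name without its extension; unmatched_image_ids is a hypothetical helper.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Iterable, List


def unmatched_image_ids(csv_text: str, task_image_names: Iterable[str]) -> List[str]:
    """Return sorted ImageID values that match no task image name (extension ignored)."""
    ids = {row["ImageID"] for row in csv.DictReader(io.StringIO(csv_text))}
    stems = {PurePosixPath(name).stem for name in task_image_names}
    return sorted(ids - stems)
```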
3.20 - Cityscapes
The Cityscapes format is a widely-used standard in the field of computer vision, particularly for tasks involving semantic and instance segmentation in urban scenes. This dataset format typically comprises high-resolution images of cityscapes along with detailed pixel-level annotations.
Each pixel is labeled with a category such as “road,” “pedestrian,” or “vehicle,” making it a valuable resource for training and validating machine learning models aimed at understanding urban environments. It’s a go-to choice for researchers and professionals working on autonomous vehicles, robotics, and smart cities.
For more information, see:
Cityscapes export
- Supported annotations: Masks, Polygons (as masks), Bounding Boxes (as masks), Ellipses (as masks).
- Attributes:
is_crowd boolean, should be defined for labels as checkbox. Specifies whether the annotation covers several instances of the label that cannot be told apart. If False, the exported annotation will include the instance id value.
- Tracks: Not supported (exported as separate shapes).
The downloaded file is a .zip archive with the following structure:
taskname.zip/
├── label_color.txt
├── gtFine
│ ├── <subset_name>
│ │ └── <city_name>
│ │ ├── image_0_gtFine_instanceIds.png
│ │ ├── image_0_gtFine_color.png
│ │ ├── image_0_gtFine_labelIds.png
│ │ ├── image_1_gtFine_instanceIds.png
│ │ ├── image_1_gtFine_color.png
│ │ ├── image_1_gtFine_labelIds.png
│ │ ├── ...
└── imgsFine # if saving images was requested
└── leftImg8bit
├── <subset_name>
│ └── <city_name>
│ ├── image_0_leftImg8bit.png
│ ├── image_1_leftImg8bit.png
│ ├── ...
label_color.txt: a file that describes the color for each label
# label_color.txt example
# r g b label_name
0 0 0 background
0 255 0 tree
...
*_gtFine_color.png: class labels encoded by their color.
*_gtFine_labelIds.png: class labels encoded by their index.
*_gtFine_instanceIds.png: class and instance labels encoded by an instance ID. The pixel values encode the class and the individual instance: the integer part of dividing each ID by 1000 gives the class ID, and the remainder is the instance ID. If a certain annotation describes multiple instances, then the pixels have the regular ID of that class.
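The instance ID arithmetic can be written out as a minimal Python sketch. It assumes class IDs below 1000 are the "regular ID of that class" used for multi-instance annotations; decode_instance_id and encode_instance_id are hypothetical helpers, not part of CVAT.

```python
from typing import Optional, Tuple


def decode_instance_id(value: int) -> Tuple[int, Optional[int]]:
    """Split a *_gtFine_instanceIds.png pixel value into (class_id, instance_id).

    Values below 1000 carry only a class id: the pixel belongs to an annotation
    that covers several instances, so there is no instance id (None).
    """
    if value < 1000:
        return value, None
    return value // 1000, value % 1000


def encode_instance_id(class_id: int, instance_id: int) -> int:
    """Inverse of decode_instance_id for a single, distinguishable instance."""
    return class_id * 1000 + instance_id
```

For example, a pixel value of 26002 decodes to class 26, instance 2.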
Cityscapes import
- Supported annotations: Masks, Polygons (if Convert masks to polygons is enabled).
- Attributes:
is_crowd boolean, should be defined for labels as checkbox.
- Tracks: Not supported.
Uploaded file: a .zip archive with the following structure:
archive.zip/
├── label_color.txt # optional
└── gtFine
└── <city_name>
├── image_0_gtFine_instanceIds.png
├── image_1_gtFine_instanceIds.png
├── ...
Creating task for Cityscapes dataset
Create a task with the labels you need, or use the labels and colors of the original dataset. To work with the Cityscapes format, you must have a black color label for the background.
Original Cityscapes color map:
[
{"name": "unlabeled", "color": "#000000", "attributes": []},
{"name": "egovehicle", "color": "#000000", "attributes": []},
{"name": "rectificationborder", "color": "#000000", "attributes": []},
{"name": "outofroi", "color": "#000000", "attributes": []},
{"name": "static", "color": "#000000", "attributes": []},
{"name": "dynamic", "color": "#6f4a00", "attributes": []},
{"name": "ground", "color": "#510051", "attributes": []},
{"name": "road", "color": "#804080", "attributes": []},
{"name": "sidewalk", "color": "#f423e8", "attributes": []},
{"name": "parking", "color": "#faaaa0", "attributes": []},
{"name": "railtrack", "color": "#e6968c", "attributes": []},
{"name": "building", "color": "#464646", "attributes": []},
{"name": "wall", "color": "#66669c", "attributes": []},
{"name": "fence", "color": "#be9999", "attributes": []},
{"name": "guardrail", "color": "#b4a5b4", "attributes": []},
{"name": "bridge", "color": "#966464", "attributes": []},
{"name": "tunnel", "color": "#96785a", "attributes": []},
{"name": "pole", "color": "#999999", "attributes": []},
{"name": "polegroup", "color": "#999999", "attributes": []},
{"name": "trafficlight", "color": "#faaa1e", "attributes": []},
{"name": "trafficsign", "color": "#dcdc00", "attributes": []},
{"name": "vegetation", "color": "#6b8e23", "attributes": []},
{"name": "terrain", "color": "#98fb98", "attributes": []},
{"name": "sky", "color": "#4682b4", "attributes": []},
{"name": "person", "color": "#dc143c", "attributes": []},
{"name": "rider", "color": "#ff0000", "attributes": []},
{"name": "car", "color": "#00008e", "attributes": []},
{"name": "truck", "color": "#000046", "attributes": []},
{"name": "bus", "color": "#003c64", "attributes": []},
{"name": "caravan", "color": "#00005a", "attributes": []},
{"name": "trailer", "color": "#00006e", "attributes": []},
{"name": "train", "color": "#005064", "attributes": []},
{"name": "motorcycle", "color": "#0000e6", "attributes": []},
{"name": "bicycle", "color": "#770b20", "attributes": []},
{"name": "licenseplate", "color": "#00000e", "attributes": []}
]
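The label_color.txt file uses one "r g b label_name" line per label, while the color map above uses hex strings. A minimal Python sketch that converts one into the other, assuming the color map is saved as JSON text exactly as shown; hex_to_rgb and label_color_lines are hypothetical helpers.

```python
import json
from typing import List, Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    value = color.lstrip("#")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def label_color_lines(color_map_json: str) -> List[str]:
    """Render 'r g b label_name' lines, one per entry of the color map."""
    lines = []
    for entry in json.loads(color_map_json):
        r, g, b = hex_to_rgb(entry["color"])
        lines.append(f"{r} {g} {b} {entry['name']}")
    return lines
```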
Upload images when creating a task:
images.zip/
├── image_0.jpg
├── image_1.jpg
├── ...
After creating the task, upload the Cityscapes annotations as described in the previous section.
3.21 - KITTI
The KITTI format is widely used for a range of computer vision tasks related to autonomous driving, including but not limited to 3D object detection, multi-object tracking, and scene flow estimation. Given its special focus on automotive scenes, the KITTI format is generally used with models that are designed or adapted for these types of tasks.
For more information, see:
- KITTI site
- Format specification for KITTI detection
- Format specification for KITTI segmentation
- Dataset examples
KITTI export
For export of images:
- Supported annotations: Bounding Boxes (detection), Polygons (segmentation), Masks (segmentation), Ellipses (segmentation, as masks).
- Supported attributes:
  - occluded (available both as a UI option and a separate attribute): denotes that a major portion of the object within the bounding box is obstructed by another object.
  - truncated (only applicable to bounding boxes): must be represented as checkboxes for labels. Suggests that the bounding box does not encompass the entire object; some part is cut off.
  - is_crowd (only valid for polygons): should be indicated using checkboxes for labels. Signifies that the annotation encapsulates multiple instances of the same object class.
- Tracks: Not supported (exported as separate shapes).
The downloaded file is a .zip archive with the following structure:
└─ annotations.zip/
├── label_colors.txt # list of pairs r g b label_name
├── labels.txt # list of labels
└── default/
├── label_2/ # left color camera label files
│ ├── <image_name_1>.txt
│ ├── <image_name_2>.txt
│ └── ...
├── instance/ # instance segmentation masks
│ ├── <image_name_1>.png
│ ├── <image_name_2>.png
│ └── ...
├── semantic/ # semantic segmentation masks (labels are encoded by their id)
│ ├── <image_name_1>.png
│ ├── <image_name_2>.png
│ └── ...
└── semantic_rgb/ # semantic segmentation masks (labels are encoded by their color)
├── <image_name_1>.png
├── <image_name_2>.png
└── ...
KITTI import
You can upload KITTI annotations in two ways: rectangles for the detection task and masks for the segmentation task.
For detection tasks, the uploaded archive should have the following structure:
└─ annotations.zip/
├── labels.txt # optional, labels list for non-original detection labels
└── <subset_name>/
├── label_2/ # left color camera label files
│ ├── <image_name_1>.txt
│ ├── <image_name_2>.txt
│ └── ...
For segmentation tasks, the uploaded archive should have the following structure:
└─ annotations.zip/
├── label_colors.txt # optional, color map for non-original segmentation labels
└── <subset_name>/
├── instance/ # instance segmentation masks
│ ├── <image_name_1>.png
│ ├── <image_name_2>.png
│ └── ...
├── semantic/ # optional, semantic segmentation masks (labels are encoded by their id)
│ ├── <image_name_1>.png
│ ├── <image_name_2>.png
│ └── ...
└── semantic_rgb/ # optional, semantic segmentation masks (labels are encoded by their color)
├── <image_name_1>.png
├── <image_name_2>.png
└── ...
All annotation files and masks should have structures that are described in the original format specification.
3.22 - LFW
The Labeled Faces in the Wild (LFW) format is primarily used for face verification and face recognition tasks. The LFW format is designed to be straightforward and is compatible with a variety of machine learning and deep learning frameworks.
For more information, see:
LFW export
For export of images:
- Supported annotations: Tags, Skeletons.
- Attributes:
  - negative_pairs (should be defined for labels as text): list of image names with mismatched persons.
  - positive_pairs (should be defined for labels as text): list of image names with matched persons.
- Tracks: Not supported.
The downloaded file is a .zip archive with the following structure:
<archive_name>.zip/
└── images/ # if the option save images was selected
│ ├── name1/
│ │ ├── name1_0001.jpg
│ │ ├── name1_0002.jpg
│ │ ├── ...
│ ├── name2/
│ │ ├── name2_0001.jpg
│ │ ├── name2_0002.jpg
│ │ ├── ...
│ ├── ...
├── landmarks.txt
├── pairs.txt
└── people.txt
LFW import
The uploaded annotations file should be a zip file with the following structure:
<archive_name>.zip/
└── annotations/
├── landmarks.txt # list with landmark points for each image
├── pairs.txt # list of matched and mismatched pairs of persons
└── people.txt # optional file with a list of person names
Full information about the content of annotation files is available here
Example: create task with images and upload LFW annotations into it
This is one of the possible ways to create a task and add LFW annotations for it.
- On the task creation page:
  - Add labels that correspond to the names of the persons.
  - For each label, define text attributes with names positive_pairs and negative_pairs.
  - Add images using a zip archive from the local repository:
images.zip/
├── name1_0001.jpg
├── name1_0002.jpg
├── ...
├── name1_<N>.jpg
├── name2_0001.jpg
├── ...
- On the annotation page: Upload annotation -> LFW 1.0 -> choose the archive with the structure described in the import section.