Advanced
- 1: Projects
- 2: Search
- 3: Shape mode (advanced)
- 4: Track mode (advanced)
- 5: 3D Object annotation (advanced)
- 6: Attribute annotation mode (advanced)
- 7: Annotation with rectangle by 4 points
- 8: Annotation with points
- 9: Annotation with polylines
- 10: Annotation with polygons
- 10.1: Manual drawing
- 10.2: Drawing using automatic borders
- 10.3: Edit polygon
- 10.4: Track mode with polygons
- 10.5: Creating masks
- 11: Annotation with Tags
- 12: Annotation with cuboids
- 12.1: Creating the cuboid
- 12.2: Editing the cuboid
- 13: Models
- 14: AI Tools
- 15: OpenCV tools
- 16: Automatic annotation
- 17: Export/import a task
- 18: Downloading annotations
- 19: Task synchronization with a repository
- 20: Formats
- 21: XML annotation format
- 22: Shortcuts
- 23: Filter
- 24: Review
- 25: Context images for 2d task
- 26: Shape grouping
- 27: Analytics Monitoring
- 28: Command line interface (CLI)
- 29: Simple command line to prepare dataset manifest file
- 30: Data preparation on the fly
- 31: Serverless tutorial
1 - Projects
Create project
In CVAT, you can create a project containing tasks of the same type. All tasks related to the project will inherit a list of labels.
To create a project, go to the projects section by clicking on the Projects item in the top menu.
On the projects page, you can see a list of projects, use a search, or create a new project by clicking Create New Project.
You can change: the name of the project, the list of labels (which will be used for tasks created as parts of this project) and a link to the issue.
Once created, the project will appear on the projects page. To open a project, just click on it.
Here you can do the following:
- Change the project’s title.
- Open the Actions menu.
- Change issue tracker or open issue tracker if it is specified.
- Change labels. You can add new labels or add attributes for the existing labels in the Raw mode or the Constructor mode. You can also change the color for different labels. By clicking Copy you can copy the labels to the clipboard.
- Assigned to: used to assign a project to a person. Start typing an assignee’s name and/or choose the right person out of the dropdown list.
- Tasks: a list of all tasks for a particular project.
It is possible to choose a subset for tasks in the project. You can use the available options (Train, Test, Validation) or set your own.
You can remove the project and all related tasks through the Actions menu.
Export project
It is possible to download an entire project instead of exporting individual tasks. In this case,
annotations for all tasks in a project will be available in a single archive.
To export a project, do the following on the Project page:
- Open the Actions menu.
- Press the Export project dataset button.
Additional information about exporting tasks can be found in the Downloading annotations section.
2 - Search
There are several ways to use the search.
- Search within all fields (owner, assignee, task name, task status, task mode). To run it, enter a search string in the search field.
- Search for specific fields. How to perform:
  - owner: admin - all tasks created by the user who has the substring “admin” in their name
  - assignee: employee - all tasks which are assigned to a user who has the substring “employee” in their name
  - name: training - all tasks with the substring “training” in their names
  - mode: annotation or mode: interpolation - all tasks with images or videos.
  - status: annotation or status: validation or status: completed - search by status
  - id: 5 - task with id = 5.
- Multiple filters. Filters can be combined (except for the identifier) using the keyword AND:
  - mode: interpolation AND owner: admin
  - mode: annotation and status: annotation
The search is case insensitive.
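The query syntax above can be illustrated with a small parser. This is only a sketch of how such a string could be split into filters, not CVAT’s actual implementation; the function name parse_search_query and the "search" key used for a query without a field name are hypothetical.

```python
import re


def parse_search_query(query):
    """Split a query such as 'mode: interpolation AND owner: admin' into filters."""
    filters = {}
    # The AND keyword is matched case-insensitively, mirroring the
    # case-insensitive search described above.
    for part in re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE):
        if ":" in part:
            field, value = part.split(":", 1)
            filters[field.strip().lower()] = value.strip().lower()
        else:
            # No field name given: treat the string as a search within all fields.
            # The "search" key is a hypothetical name chosen for this sketch.
            filters["search"] = part.strip().lower()
    return filters


print(parse_search_query("mode: interpolation AND owner: admin"))
```

Values are lowercased here only to make comparisons case insensitive; a value that itself contains the word “and” surrounded by spaces would be split, which a real parser would need to handle.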
3 - Shape mode (advanced)
Basic operations in the mode were described in section shape mode (basics).
Occluded
Occlusion is an attribute used if an object is occluded by another object or isn’t fully visible on the frame. Use the Q shortcut to set the property quickly.
Example: the three cars on the figure below should be labeled as occluded.
If a frame contains too many objects and it is difficult to annotate them due to many shapes placed mostly in the same place, it makes sense to lock them. Shapes for locked objects are transparent, and it is easy to annotate new objects. Besides, you can’t change previously annotated objects by accident. Shortcut: L.
4 - Track mode (advanced)
Basic operations in the mode were described in section track mode (basics).
Shapes that were created in the track mode have extra navigation buttons.
- These buttons help to jump to the previous/next keyframe.
- The button helps to jump to the initial frame and to the last keyframe.
You can use the Split function to split one track into two tracks.
5 - 3D Object annotation (advanced)
As well as 2D-task objects, 3D-task objects support the ability to change appearance, attributes, properties and have an action menu. Read more in objects sidebar section.
Moving an object
If you hover the cursor over a cuboid and press Shift+N, the cuboid will be cut, so you can paste it in another place (double-click to paste the cuboid).
Copying
As in 2D tasks, you can copy and paste objects with Ctrl+C and Ctrl+V, but unlike 2D tasks you have to place a copied object in a 3D space (double-click to paste).
Image of the projection window
You can copy or save the projection-window image by left-clicking on it and selecting a “save image as” or “copy image”.
6 - Attribute annotation mode (advanced)
Basic operations in the mode were described in section attribute annotation mode (basics).
It is possible to handle lots of objects on the same frame in the mode.
It is more convenient to annotate objects of the same type. In this case you can apply the appropriate filter. For example, the following filter will hide all objects except person: label=="Person".
To navigate between objects (person in this case), use the following buttons on the special panel to switch between objects in the frame:
or shortcuts:
- Tab: go to the next object
- Shift+Tab: go to the previous object.
In order to change the zoom level, go to settings (press F3), open the workspace tab and set the value of Attribute annotation mode (AAM) zoom margin in px.
7 - Annotation with rectangle by 4 points
It is an efficient method of bounding box annotation, proposed here. Before starting, you need to make sure that the drawing method by 4 points is selected.
Press Shape or Track to enter drawing mode. Click on four extreme points: the top, bottom, left- and right-most physical points on the object. Drawing will be automatically completed right after clicking the fourth point. Press Esc to cancel editing.
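The geometry behind this method is simple: the bounding box is the smallest axis-aligned rectangle that contains the four extreme points. A minimal sketch (the function name box_from_extreme_points is hypothetical and not part of CVAT):

```python
def box_from_extreme_points(points):
    """Return (x_min, y_min, x_max, y_max) for four clicked extreme points.

    points is a list of four (x, y) tuples in any order: the top-most,
    bottom-most, left-most and right-most points of the object.
    """
    if len(points) != 4:
        raise ValueError("exactly four extreme points are expected")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Top, bottom, left and right points of an object, in image coordinates.
print(box_from_extreme_points([(50, 10), (55, 90), (20, 40), (80, 45)]))
```

Because only the minimum and maximum coordinates matter, the order in which the four points are clicked does not affect the resulting rectangle.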
8 - Annotation with points
8.1 - Points in shape mode
It is used for face and landmark annotation, etc.
Before you start you need to select the Points. If necessary you can set a fixed number of points in the Number of points field, then drawing will be stopped automatically.
Click Shape to enter the drawing mode. Now you can start annotation of the necessary area. Points are automatically grouped: all points will be considered linked between each start and finish. Press N again or click the Done button on the top panel to finish marking the area.
You can delete a point by clicking with pressed Ctrl or right-clicking on a point and selecting Delete point. Clicking with pressed Shift will open the points shape editor.
There you can add new points into an existing shape. You can zoom in/out (when scrolling the mouse wheel)
and move (when clicking the mouse wheel and moving the mouse) while drawing. You can drag an object after
it has been drawn and change the position of individual points after finishing an object.
8.2 - Linear interpolation with one point
You can use linear interpolation for points to annotate a moving object:
- Before you start, select the Points.
- Linear interpolation works only with one point, so you need to set Number of points to 1.
- After that select the Track.
- Click Track to enter the drawing mode, then left-click to create a point; after that the shape will be automatically completed.
- Move forward a few frames and move the point to the desired position. This way you will create a keyframe, and intermediate frames will be drawn automatically. You can work with this object as with an interpolated track: you can hide it using the Outside, move around keyframes, etc.
- This way you’ll get linear interpolation using the Points.
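To make the idea of keyframes and intermediate frames concrete, here is a minimal sketch of linear interpolation of a single point between keyframes. It illustrates the general technique only; the function name and the choice to hold the nearest keyframe position outside the keyframe range are assumptions, not a description of CVAT internals.

```python
def interpolate_point(keyframes, frame):
    """Linearly interpolate a point position for the given frame.

    keyframes maps a frame number to an (x, y) position set by the annotator.
    """
    frames = sorted(keyframes)
    # Assumption of this sketch: outside the keyframe range the nearest
    # keyframe position is reused.
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    for left, right in zip(frames, frames[1:]):
        if left <= frame <= right:
            t = (frame - left) / (right - left)
            (x0, y0), (x1, y1) = keyframes[left], keyframes[right]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# Keyframes at frames 0 and 10; frame 5 lies halfway between them.
print(interpolate_point({0: (0.0, 0.0), 10: (100.0, 50.0)}, 5))
```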
9 - Annotation with polylines
It is used for road markup annotation etc.
Before starting, you need to select the Polyline. You can set a fixed number of points in the Number of points field, then drawing will be stopped automatically.
Click Shape to enter drawing mode. There are two ways to draw a polyline: you either create points by clicking or by dragging the mouse on the screen while holding Shift. When Shift isn’t pressed, you can zoom in/out (when scrolling the mouse wheel) and move (when clicking the mouse wheel and moving the mouse). You can also delete the previous point by right-clicking on it.
Press N again or click the Done button on the top panel to complete the shape.
You can delete a point by clicking on it with pressed Ctrl or right-clicking on a point and selecting Delete point. Clicking with pressed Shift will open a polyline editor. There you can create new points (by clicking or dragging) or delete part of the polyline by closing the red line on another point. Press Esc to cancel editing.
10 - Annotation with polygons
10.1 - Manual drawing
It is used for semantic / instance segmentation.
Before starting, you need to select Polygon on the controls sidebar and choose the correct Label.
- Click Shape to enter drawing mode. There are two ways to draw a polygon: either create points by clicking or by dragging the mouse on the screen while holding Shift.
Clicking points | Holding Shift+Dragging |
---|---|
- When Shift isn’t pressed, you can zoom in/out (when scrolling the mouse wheel) and move (when clicking the mouse wheel and moving the mouse). You can also delete the previous point by right-clicking on it.
- You can use the Selected opacity slider in the Objects sidebar to change the opacity of the polygon. You can read more in the Objects sidebar section.
- Press N again or click the Done button on the top panel to complete the shape.
- After creating the polygon, you can move the points or delete them by right-clicking and selecting Delete point in the context menu, or by clicking with the Alt key pressed.
10.2 - Drawing using automatic borders
You can use auto borders when drawing a polygon. Using automatic borders allows you to automatically trace the outline of polygons existing in the annotation.
- To do this, go to settings -> workspace tab and enable Automatic Bordering or press Ctrl while drawing a polygon.
- Start drawing / editing a polygon.
- Points of other shapes will be highlighted, which means that the polygon can be attached to them.
- Define the part of the polygon path that you want to repeat.
- Click on the first point of the contour part.
- Then click on any point located on part of the path. The selected point will be highlighted in purple.
- Click on the last point and the outline to this point will be built automatically.
Besides, you can set a fixed number of points in the Number of points field, then drawing will be stopped automatically. To enable dragging you should right-click inside the polygon and choose Switch pinned property.
Below you can see results with opacity and black stroke:
If you need to annotate small objects, increase Image Quality to 95 in Create task dialog for your convenience.
10.3 - Edit polygon
To edit a polygon, click on it while holding Shift; this will open the polygon editor.
- In the editor you can create new points or delete part of a polygon by closing the line on another point.
- When the Intelligent polygon cropping option is activated in the settings, CVAT considers two criteria to decide which part of a polygon should be cut off during automatic editing.
  - The first criterion is the number of cut points.
  - The second criterion is the length of the cut curve.
  If both criteria recommend cutting the same part, the algorithm works automatically; if not, the user has to make the decision. If you want to choose manually which part of a polygon should be cut off, disable Intelligent polygon cropping in the settings. In this case, after closing the polygon, you can select the part of the polygon you want to leave.
- You can press Esc to cancel editing.
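The two-criteria decision described above can be sketched as follows. This is an illustration only: the text does not say which part each criterion prefers, so the sketch assumes that the part with fewer points and the part with the shorter curve are the candidates for cutting. All function names are hypothetical.

```python
import math


def path_length(points):
    """Total length of the open path through the given points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def pick(a_value, b_value):
    """Return the part one criterion would cut ('a' or 'b'), or None on a tie."""
    if a_value == b_value:
        return None
    # Assumption of this sketch: the smaller part is the one to cut.
    return "a" if a_value < b_value else "b"


def choose_part_to_cut(part_a, part_b):
    """Return 'a' or 'b' when both criteria agree, otherwise None (user decides)."""
    by_count = pick(len(part_a), len(part_b))
    by_length = pick(path_length(part_a), path_length(part_b))
    if by_count is not None and by_count == by_length:
        return by_count
    return None


short_part = [(0, 0), (1, 0)]
long_part = [(0, 0), (0, 5), (5, 5), (1, 0)]
print(choose_part_to_cut(short_part, long_part))
```

When the criteria disagree (for example, a part with few points but a long curve), the function returns None, which corresponds to the case where the user has to make the decision.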
10.4 - Track mode with polygons
Polygons in the track mode allow you to mark moving objects more accurately than using a rectangle (Tracking mode (basic); Tracking mode (advanced)).
- To create a polygon in the track mode, click the Track button.
- Create a polygon the same way as in the case of Annotation with polygons. Press N or click the Done button on the top panel to complete the polygon.
- Pay attention to the fact that the created polygon has a starting point and a direction; these elements are important for annotation of the following frames.
- After going a few frames forward press Shift+N, the old polygon will disappear and you can create a new polygon. The new starting point should match the starting point of the previously created polygon (in this example, the top of the left mirror). The direction must also match (in this example, clockwise). After creating the polygon, press N and the intermediate frames will be interpolated automatically.
- If you need to change the starting point, right-click on the desired point and select Set starting point. To change the direction, right-click on the desired point and select switch orientation.
There is no need to redraw the polygon every time using Shift+N; instead you can simply move the points or edit a part of the polygon by pressing Shift+Click.
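The starting point and direction matter because interpolation pairs up the points of two polygons in order. The sketch below shows how the direction of a polygon can be checked with the shoelace formula, and what changing the orientation or the starting point means for the list of points. It is a generic illustration with hypothetical function names, not CVAT code.

```python
def signed_area(points):
    """Shoelace formula: signed area of a polygon given as (x, y) tuples."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise(points):
    """In image coordinates (y grows downward) a positive area means clockwise on screen."""
    return signed_area(points) > 0


def switch_orientation(points):
    """Reverse the direction while keeping the same starting point."""
    return [points[0]] + points[:0:-1]


def set_starting_point(points, index):
    """Rotate the list so that points[index] becomes the starting point."""
    return points[index:] + points[:index]


# Right, down, left: a clockwise square as seen on the screen.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(is_clockwise(square), is_clockwise(switch_orientation(square)))
```

If two keyframe polygons have the same number of points, matching starting points and matching directions mean that point i of the first polygon corresponds to point i of the second one.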
10.5 - Creating masks
Cutting holes in polygons
Currently, CVAT does not support cutting transparent holes in polygons. However, it is possible to generate holes in exported instance and class masks. To do this, define a background class in the task and draw holes with it as additional shapes above the shapes that need holes:
The editor window:
Remember to use z-axis ordering for shapes by [-] and [+, =] keys.
Exported masks:
Notice that it is currently impossible to have a single instance number for internal shapes (they will be merged into the largest one and then covered by “holes”).
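The effect of z-ordering on an exported class mask can be sketched by painting shapes from the lowest to the highest layer, so that a background shape drawn above another shape leaves a hole in it. The example uses axis-aligned boxes instead of polygons to stay short; the function name and the shape dictionaries are hypothetical and do not reflect CVAT’s export code.

```python
def render_class_mask(width, height, shapes, background="background"):
    """Paint shapes into a label grid, lowest z_order first.

    Each shape is a dict with 'label', 'z_order' and 'box' (x0, y0, x1, y1),
    where x1 and y1 are exclusive.
    """
    mask = [[background] * width for _ in range(height)]
    for shape in sorted(shapes, key=lambda s: s["z_order"]):
        x0, y0, x1, y1 = shape["box"]
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                # Shapes on higher layers overwrite the ones below them.
                mask[y][x] = shape["label"]
    return mask


shapes = [
    {"label": "car", "z_order": 0, "box": (0, 0, 4, 4)},
    # A background shape placed above the car cuts a hole in its mask.
    {"label": "background", "z_order": 1, "box": (1, 1, 3, 3)},
]
for row in render_class_mask(4, 4, shapes):
    print(row)
```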
Creating masks
There are several formats in CVAT that can be used to export masks:
- Segmentation Mask (PASCAL VOC masks)
- CamVid
- MOTS
- ICDAR
- COCO (RLE-encoded instance masks, guide)
- TFRecord (over Datumaro, guide)
- Datumaro
An example of exported masks (in the Segmentation Mask format):
Important notices:
- Both boxes and polygons are converted into masks
- Grouped objects are considered as a single instance and exported as a single mask (label and attributes are taken from the largest object in the group)
Class colors
All the labels have associated colors, which are used in the generated masks. These colors can be changed in the task label properties:
Label colors are also displayed in the annotation window on the right panel, where you can show or hide specific labels (only the presented labels are displayed):
A background class can be:
- A default class, which is implicitly-added, of black color (RGB 0, 0, 0)
- background class with any color (has a priority, name is case-insensitive)
- Any class of black color (RGB 0, 0, 0)
To change background color in generated masks (default is black), change background class color to the desired one.
11 - Annotation with Tags
It is used to annotate frames; tags are not displayed in the workspace.
Before you start, open the drop-down list in the top panel and select Tag annotation.
The objects sidebar will be replaced with a special panel for working with tags.
Here you can select a label for a tag and add it by clicking on the Add tag button.
You can also customize hotkeys for each label.
If you need to use only one label for one frame, enable the Automatically go to the next frame checkbox; then, after you add the tag, the frame will automatically switch to the next.
12 - Annotation with cuboids
It is used to annotate 3-dimensional objects such as cars, boxes, etc. Currently the feature supports one point perspective and has the constraint where the vertical edges are exactly parallel to the sides.
12.1 - Creating the cuboid
Before you start, you have to make sure that Cuboid is selected and choose a drawing method “from rectangle” or “by 4 points”.
Drawing cuboid by 4 points
Choose a drawing method “by 4 points” and click Shape to enter the drawing mode. There are many ways to draw a cuboid. You can draw the cuboid by placing 4 points, after that the drawing will be completed automatically. The first 3 points determine the plane of the cuboid while the last point determines the depth of that plane. For the first 3 points, it is recommended to only draw the 2 closest side faces, as well as the top and bottom face.
A few examples:
Drawing cuboid from rectangle
Choose a drawing method “from rectangle” and click Shape to enter the drawing mode. When you draw using the rectangle method, you must select the frontal plane of the object using the bounding box. The depth and perspective of the resulting cuboid can be edited.
Example:
12.2 - Editing the cuboid
The cuboid can be edited in multiple ways: by dragging points, by dragging certain faces or by dragging planes. First notice that there is a face that is painted with gray lines only, let us call it the front face.
You can move the cuboid by simply dragging the shape behind the front face. The cuboid can be extended by dragging on the point in the middle of the edges. The cuboid can also be extended up and down by dragging the point at the vertices.
To draw with perspective effects it should be assumed that the front face is the closest to the camera.
To begin, simply drag the points on the vertices that are not on the gray/front face while holding Shift.
The cuboid can then be edited as usual.
If you wish to reset perspective effects, you may right click on the cuboid, and select Reset perspective to return to a regular cuboid.
The location of the gray face can be swapped with the adjacent visible side face.
You can do it by right clicking on the cuboid and selecting Switch perspective orientation.
Note that this will also reset the perspective effects.
Certain faces of the cuboid can also be edited: the left, right and dorsal faces, relative to the gray face. Simply drag the faces to move them independently from the rest of the cuboid.
You can also use cuboids in track mode, similar to rectangles in track mode (basics and advanced) or Track mode with polygons.
13 - Models
To deploy the models, you will need to install the necessary components using the Semi-automatic and Automatic Annotation guide. To learn how to deploy the model, read the Serverless tutorial.
The Models page contains a list of deep learning (DL) models deployed for semi-automatic and automatic annotation. To open the Models page, click the Models button on the navigation bar. The list of models is presented in the form of a table. The parameters indicated for each model are the following:
- Framework the model is based on
- model Name
- model Type:
  - detector - used for automatic annotation (available in detectors and automatic annotation)
  - interactor - used for semi-automatic shape annotation (available in interactors)
  - tracker - used for semi-automatic track annotation (available in trackers)
  - reid - used to combine individual objects into a track (available in automatic annotation)
- Description - brief description of the model
- Labels - list of the supported labels (only for the models of the detectors type)
14 - AI Tools
The tool is designed for semi-automatic and automatic annotation using DL models. The tool is available only if there is a corresponding model. For more details about DL models read the Models section.
Interactors
Interactors are used to create a polygon semi-automatically. Supported DL models are not bound to the label and can be used for any objects. To create a polygon usually you need to use regular or positive points. For some kinds of segmentation negative points are available. Positive points are the points related to the object. Negative points should be placed outside the boundary of the object. In most cases specifying positive points alone is enough to build a polygon. A list of available out-of-the-box interactors is placed below.
- Before you start, select the magic wand on the controls sidebar and go to the Interactors tab. Then select a label for the polygon and a required DL model. To view help about each of the models, you can click the Question mark icon.
- Click Interact to enter the interaction mode. Depending on the selected model, the method of markup will also differ. Now you can place positive and/or negative points. The IOG model also uses a rectangle. Left click creates a positive point and right click creates a negative point. After placing the required number of points (the number is different depending on the model), the request will be sent to the server and when the process is complete a polygon will be created. If you are not satisfied with the result, you can set additional points or remove points. To delete a point, hover over the point you want to delete; if the point can be deleted, it will enlarge and the cursor will turn into a cross, then left-click on the point. If you want to postpone the request and create a few more points, hold down Ctrl and continue (the Block button on the top panel will turn blue); the request will be sent after the key is released.
- In the process of drawing, you can select the number of points in the polygon using the switch.
- You can use the Selected opacity slider in the Objects sidebar to change the opacity of the polygon. You can read more in the Objects sidebar section.
- To finish interaction, click on the Done button on the top panel or press N on your keyboard.
- When the object is finished, you can edit it like a polygon. You can read about editing polygons in the Annotation with polygons section.
Deep extreme cut (DEXTR)
This is an optimized version of the original model, introduced at the end of 2017. It uses the information about extreme points of an object to get its mask. The mask is then converted to a polygon. For now this is the fastest interactor on CPU.
Feature backpropagating refinement scheme (f-BRS)
The model allows you to get a mask for an object using positive points (should be left-clicked on the foreground), and negative points (should be right-clicked on the background, if necessary). It is recommended to run the model on GPU, if possible.
High Resolution Net (HRNet)
The model allows you to get a mask for an object using positive points (should be left-clicked on the foreground), and negative points (should be right-clicked on the background, if necessary). It is recommended to run the model on GPU, if possible.
Inside-Outside-Guidance
The model uses a bounding box and inside/outside points to create a mask. First of all, you need to create a bounding box, wrapping the object. Then you need to use positive and negative points to tell the model where the foreground is and where the background is. Negative points are optional.
Detectors
Detectors are used to automatically annotate one frame. Supported DL models are suitable only for certain labels.
- Before you start, click the magic wand on the controls sidebar and select the Detectors tab. You need to match the labels of the DL model (left column) with the labels in your task (right column). Then click Annotate.
- This action will automatically annotate one frame. In the Automatic annotation section you can read how to make automatic annotation of all frames.
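Matching model labels to task labels amounts to a simple mapping applied to every detection. A minimal sketch, assuming detections are plain dictionaries with a label and a box (this data layout and the function name are hypothetical, not the CVAT data model):

```python
def remap_detections(detections, label_map):
    """Rename model labels to task labels and drop unmatched detections.

    label_map maps a model label (left column) to a task label (right column).
    """
    remapped = []
    for detection in detections:
        task_label = label_map.get(detection["label"])
        if task_label is None:
            # The model label was not matched to any task label.
            continue
        remapped.append({**detection, "label": task_label})
    return remapped


detections = [
    {"label": "vehicle", "box": (10, 10, 50, 40)},
    {"label": "person", "box": (60, 20, 80, 90)},
    {"label": "dog", "box": (5, 70, 25, 95)},
]
label_map = {"vehicle": "car", "person": "person"}
print(remap_detections(detections, label_map))
```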
Mask RCNN
The model generates polygons for each instance of an object in the image.
Faster RCNN
The model generates bounding boxes for each instance of an object in the image. In this model, RPN and Fast R-CNN are combined into a single network.
Trackers
Trackers are used to automatically annotate an object using a bounding box. Supported DL models are not bound to the label and can be used for any objects.
- Before you start, select the magic wand on the controls sidebar and go to the Trackers tab. Then select a Label and Tracker for the object and click Track. Then annotate the desired objects with the bounding box in the first frame.
- All annotated objects will be automatically tracked when you move to the next frame. For tracking, use the Next button on the top panel or the F key to move on to the next frame.
- You can enable/disable tracking using the tracker switcher on the sidebar.
- Trackable objects are marked on the canvas with an indication of the model.
- You can monitor the process by the messages appearing at the top. If you change one or more objects before moving to the next frame, you will see a message that the object states are being initialized. The objects that you do not change are already on the server and therefore do not require initialization. After the objects are initialized, tracking will occur.
SiamMask
Fast online Object Tracking and Segmentation. The tracker is able to track different objects in one server request. A trackable object will be tracked automatically if the previous frame was the latest keyframe for the object. Tracked objects have a tracker indication on the canvas. The SiamMask tracker supports CUDA.
15 - OpenCV tools
The tool is based on the OpenCV computer vision library, an open-source product that includes many CV algorithms. Some of these algorithms can be used to simplify the annotation process.
The first step to work with OpenCV is to load it into CVAT. Click on the toolbar icon, then click Load OpenCV.
Once it is loaded, the tool’s functionality will be available.
Intelligent scissors
Intelligent scissors is a CV method of creating a polygon by placing points with automatic drawing of a line between them. The distance between the adjacent points is limited by the threshold of action, displayed as a red square which is tied to the cursor.
- First, select the label and then click on the intelligent scissors button.
- Create the first point on the boundary of the allocated object. You will see a line repeating the outline of the object.
- Place the second point, so that the previous point is within the restrictive threshold. After that a line repeating the object boundary will be automatically created between the points. To increase or lower the action threshold, hold Ctrl and scroll the mouse wheel. Increasing the action threshold will affect the performance. During the drawing process you can remove the last point by clicking on it with the left mouse button.
- You can also create a boundary manually (like when creating a polygon) by temporarily disabling the automatic line creation. To do that, switch blocking on by pressing Ctrl.
- In the process of drawing, you can select the number of points in the polygon using the switch.
- You can use the Selected opacity slider in the Objects sidebar to change the opacity of the polygon. You can read more in the Objects sidebar section.
- Once all the points are placed, you can complete the creation of the object by clicking on the Done button on the top panel or pressing N on your keyboard. As a result, a polygon will be created (read more about the polygons in the annotation with polygons).
Histogram Equalization
Histogram equalization is a CV method that improves contrast in an image in order to stretch out the intensity range. This method usually increases the global contrast of images when its usable data is represented by close contrast values. It is useful in images with backgrounds and foregrounds that are both bright or both dark.
- First, select the image tab and then click on the histogram equalization button.
- Then the contrast of the current frame will be improved. If you change the frame, it will be equalized too. You can disable equalization by clicking the histogram equalization button again.
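For readers curious about what histogram equalization does to pixel values, here is a minimal pure-Python sketch of the textbook technique for a grayscale image: intensities are remapped through the cumulative histogram so that they spread over the full range. It illustrates the general method only and is not the OpenCV routine used by the tool; the function name is hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer intensities in the range [0, levels)."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution: number of pixels with intensity <= each level.
    cdf = []
    running_total = 0
    for count in histogram:
        running_total += count
        cdf.append(running_total)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # All pixels share one intensity, so there is nothing to stretch.
        return list(pixels)

    # Lookup table that maps the darkest used level to 0 and the brightest to levels - 1.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lookup[value] for value in pixels]


# A low-contrast image: all intensities are packed between 100 and 102.
low_contrast = [100, 100, 101, 102]
print(equalize_histogram(low_contrast))
```

After equalization the darkest pixels of the example map to 0 and the brightest to 255, which is the stretching of the intensity range described above.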
16 - Automatic annotation
Automatic Annotation is used for creating preliminary annotations.
To use Automatic Annotation you need a DL model. You can use primary models or models uploaded by a user.
You can find the list of available models in the Models section.
- To launch automatic annotation, you should open the dashboard and find a task which you want to annotate. Then click the Actions button and choose the Automatic Annotation option from the dropdown menu.
- In the dialog window select a model you need. DL models are created for specific labels, e.g. the Crossroad model was taught using footage from cameras located above the highway and it is best to use this model for the tasks with similar camera angles. If necessary, select the Clean old annotations checkbox. Adjust the labels so that the task labels will correspond to the labels of the DL model. For example, let’s consider a task where you have to annotate labels “car” and “person”. You should connect the “person” label from the model to the “person” label in the task. As for the “car” label, you should choose the most fitting label available in the model - the “vehicle” label. The task requires annotating cars only, and choosing the “vehicle” label implies annotation of all vehicles; in this case using auto annotation will help you complete the task faster. Click Submit to begin the automatic annotation process.
- At runtime, you can see the percentage of completion. You can cancel the automatic annotation by clicking on the Cancel button.
- The end result of an automatic annotation is an annotation with separate rectangles (or other shapes).
- You can combine separate bounding boxes into tracks using the Person reidentification model. To do this, click on the automatic annotation item in the action menu again and select the model of the ReID type (in this case the Person reidentification model). You can set the following parameters:
  - Model Threshold is a maximum cosine distance between objects’ embeddings.
  - Maximum distance defines a maximum radius that an object can diverge between adjacent frames.
- You can remove false positives and edit tracks using Split and Merge functions.
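The two ReID parameters can be read as two conditions that a pair of detections on adjacent frames must satisfy to be linked into one track. The sketch below shows one plausible way to express them; the data layout (an embedding vector and a box center per object) and the function names are assumptions made for illustration, not the model’s actual code.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def can_link(previous, current, threshold, max_distance):
    """Check whether two detections on adjacent frames may belong to one track.

    threshold    - maximum cosine distance between the embeddings
    max_distance - maximum distance in pixels between the box centers
    """
    close_enough = math.dist(previous["center"], current["center"]) <= max_distance
    similar_enough = cosine_distance(previous["embedding"], current["embedding"]) <= threshold
    return close_enough and similar_enough


frame_1 = {"embedding": [1.0, 0.0], "center": (0, 0)}
frame_2 = {"embedding": [1.0, 0.0], "center": (3, 4)}
print(can_link(frame_1, frame_2, threshold=0.5, max_distance=5))
```

A lower threshold makes the matching stricter about appearance, while a lower maximum distance makes it stricter about how far an object may move between adjacent frames.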
17 - Export/import a task
In CVAT you can export and import tasks. This can be used to backup the task on your PC or to transfer the task to another server.
Export task
To export a task, open the action menu and select Export Task.
As a result, you’ll get a zip archive containing data, task specification and annotations with the following structure:
.
├── data
│ ├── {user uploaded data}
│ ├── manifest.jsonl
├── task.json
└── annotations.json
Export task API:
- endpoint: /api/v1/tasks/{id}?action=export
- method: GET
- responses: 202, 201 with zip archive payload
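A client for this endpoint would typically repeat the GET request until the archive is ready. The sketch below assumes that 202 means the archive is still being prepared and 201 means the response carries the zip payload; this reading of the two status codes, the function names, and the fetch callable (which stands in for an authenticated HTTP client) are assumptions, not documented behavior.

```python
import time


def export_task_url(base_url, task_id):
    """Build the export URL for a task from the endpoint pattern above."""
    return f"{base_url.rstrip('/')}/api/v1/tasks/{task_id}?action=export"


def wait_for_export(fetch, url, poll_interval=1.0, max_attempts=60):
    """Poll the export endpoint until the archive is returned.

    fetch is a hypothetical callable: fetch(url) -> (status_code, body_bytes).
    """
    for _ in range(max_attempts):
        status, body = fetch(url)
        if status == 201:
            # Assumed meaning: the archive is ready and is in the payload.
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        # Assumed meaning of 202: the export is still in progress.
        time.sleep(poll_interval)
    raise TimeoutError("export did not finish in time")


print(export_task_url("http://localhost:8080", 5))
```

Passing the HTTP call in as a function keeps the polling logic independent of the HTTP library and of the authentication scheme used by a particular server.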
Import task
To import a task from an archive, go to the tasks page, click the Import Task button and select the archive you need.
As a result, you’ll get a task containing data, parameters, and annotations of the previously exported task.
Import task API:
- endpoint: /api/v1/tasks?action=import
- method: POST
- Content-Type: multipart/form-data
- responses: 202, 201 with json payload
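Since the import endpoint expects multipart/form-data, the archive has to be wrapped in a multipart body. The sketch below builds such a body with the standard library only. The form field name task_file used in the example is an assumption (the text above does not name the field), and build_multipart is a hypothetical helper.

```python
import uuid


def build_multipart(field_name, filename, payload, boundary=None):
    """Return (content_type, body) for a single-file multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/zip\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + payload + tail


# "task_file" is an assumed field name; check the server's API schema.
content_type, body = build_multipart("task_file", "task.zip", b"PK\x03\x04")
print(content_type)
print(len(body))
```

The returned content type goes into the Content-Type header of the POST request and the body is sent as the request payload.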
18 - Downloading annotations
-
To download the latest annotations, you have to save all changes first. Сlick the
Save
button. There is aCtrl+S
shortcut to save annotations quickly. -
After that, сlick the
Menu
button. -
Press the
Export task dataset
button. -
Choose the format for exporting the dataset. Exporting is available in several formats:
- CVAT for video choose if the task is created in interpolation mode.
- CVAT for images choose if a task is created in annotation mode.
- PASCAL VOC
- (VOC) Segmentation mask — archive contains class and instance masks for each frame in the png format and a text file with the value of each color.
- YOLO
- COCO
- TFRecord
- MOT
- LabelMe 3.0
- Datumaro
- ImageNet
- CamVid
- WIDER Face
- VGGFace2
- Market-1501
- ICDAR13/15
For 3D tasks, the following formats are available:
- Kitti Raw Format 1.0
- Sly Point Cloud Format 1.0 - Supervisely Point Cloud dataset
- To download images with the dataset, tick the Save images box.
- (Optional) To name the resulting archive, use the Custom name field.
19 - Task synchronization with a repository
- At the end of the annotation process, a task is synchronized by clicking Synchronize on the task page. Notice: this feature works only if a git repository was specified when the task was created.
- After synchronization, the Sync button is highlighted in green. The annotation is now in the repository in a temporary branch.
- The next step is to go to the repository and manually create a pull request to the main branch.
- After the PR is confirmed and the annotation is saved in the main branch, the color of the task changes to blue.
20 - Formats
CVAT supports the following formats:
20.1 - CVAT
This is the native CVAT annotation format. It supports all CVAT annotation features, so it can be used to make data backups.
- supported annotations, CVAT for Images: Rectangles, Polygons, Polylines, Points, Cuboids, Tags, Tracks
- supported annotations, CVAT for Videos: Rectangles, Polygons, Polylines, Points, Cuboids, Tracks
- attributes are supported
CVAT for images export
Downloaded file: a ZIP file of the following structure:
- tracks are split by frames
CVAT for videos export
Downloaded file: a ZIP file of the following structure:
- shapes are exported as single-frame tracks
CVAT loader
Uploaded file: an XML file or a ZIP file of the structures above
20.2 - Datumaro format
Datumaro is a tool that helps with complex dataset and annotation transformations, format conversions, dataset statistics, merging, custom formats, etc. It is used as a provider of dataset support in CVAT, so everything possible in CVAT is possible in Datumaro too, and Datumaro additionally offers dataset operations.
- supported annotations: any 2D shapes, labels
- supported attributes: any
Import annotations in Datumaro format
Uploaded file: a zip archive of the following structure:
JSON annotation files in the annotations directory should have a similar structure:
Export annotations in Datumaro format
Downloaded file: a zip archive of the following structure:
20.3 - LabelMe
LabelMe export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangles, Polygons (with attributes)
LabelMe import
Uploaded file: a zip archive of the following structure:
- supported annotations: Rectangles, Polygons, Masks (as polygons)
20.4 - MOT sequence
MOT export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangle shapes and tracks
- supported attributes: visibility (number), ignored (checkbox)
MOT import
Uploaded file: a zip archive of the structure above or:
- supported annotations: Rectangle tracks
20.5 - MOTS PNG
MOTS PNG export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangle and Polygon tracks
MOTS PNG import
Uploaded file: a zip archive of the structure above
- supported annotations: Polygon tracks
20.6 - MS COCO Object Detection
COCO export
Downloaded file: a zip archive with the structure described here
- supported annotations: Polygons, Rectangles
- supported attributes:
  - is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, and the largest one defines all the object properties
  - score (number) - the annotation score field
  - arbitrary attributes - will be stored in the attributes annotation section
Support for COCO tasks via Datumaro is described here. For example, support for COCO keypoints over Datumaro:
- Install Datumaro: pip install datumaro
- Export the task in the Datumaro format and unzip it
- Export the Datumaro project in the coco / coco_person_keypoints formats: datum export -f coco -p path/to/project [-- --save-images]
This way, one can export CVAT points as single keypoints or keypoint lists (without the visibility COCO flag).
COCO import
Uploaded file: a single unpacked *.json or a zip archive with the structure described here (without images).
- supported annotations: Polygons, Rectangles (if the segmentation field is empty)
How to create a task from MS COCO dataset
- Download the MS COCO dataset. For example, val images and instances annotations.
- Create a CVAT task with the following labels:
- Select val2017.zip as data (see the Creating an annotation task guide for details).
- Unpack annotations_trainval2017.zip.
- Click the Upload annotation button, choose COCO 1.1 and select the instances_val2017.json annotation file. It can take some time.
20.7 - Pascal VOC
- supported annotations:
  - Rectangles (detection and layout tasks)
  - Tags (action and classification tasks)
  - Polygons (segmentation task)
- supported attributes:
  - occluded (both UI option and a separate attribute)
  - truncated and difficult (should be defined for labels as checkboxes)
  - action attributes (import only, should be defined as checkboxes)
  - arbitrary attributes (in the attributes section of XML files)
Pascal VOC export
Downloaded file: a zip archive of the following structure:
Pascal VOC import
Uploaded file: a zip archive of the structure declared above or the following:
It must be possible for CVAT to match the frame name and the file name from the annotation .xml file (the filename tag, e.g. <filename>2008_004457.jpg</filename>).
There are 2 options:
- full match between the frame name and the file name from the annotation .xml (in cases when the task was created from images or an image archive).
- match by frame number. The file name should be <number>.jpg or frame_000000.jpg. It should be used when the task was created from a video.
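The two matching options can be sketched as a small lookup. This is an illustration only, not CVAT's actual importer: the helper names are hypothetical, and the fallback assumes that the numeric part of `<number>.jpg` or `frame_000000.jpg` is a zero-based frame index.

```python
import re
from pathlib import PurePosixPath

# Accepts "<number>" or "frame_<number>" as the file stem.
_FRAME_STEM = re.compile(r"^(?:frame_)?(\d+)$")


def frame_number(filename):
    """Return the frame number encoded in the file name, or None."""
    match = _FRAME_STEM.match(PurePosixPath(filename).stem)
    return int(match.group(1)) if match else None


def match_frame(filename, frame_names):
    """Resolve an annotation file name to a frame index.

    Option 1: full match against the task's frame names.
    Option 2: fall back to the frame number (assumed zero-based).
    """
    if filename in frame_names:
        return frame_names.index(filename)
    number = frame_number(filename)
    if number is not None and 0 <= number < len(frame_names):
        return number
    return None
```

A name such as `2008_004457.jpg` does not parse as a frame number, so it can only be matched by its full name.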
Segmentation mask export
Downloaded file: a zip archive of the following structure:
Mask is a png image with 1 or 3 channels where each pixel has its own color which corresponds to a label. Colors are generated following the Pascal VOC algorithm. (0, 0, 0) is used for background by default.
- supported shapes: Rectangles, Polygons
Segmentation mask import
Uploaded file: a zip archive of the following structure:
It is also possible to import grayscale (1-channel) PNG masks. For grayscale masks provide a list of labels with the number of lines equal to the maximum color index on images. The lines must be in the right order so that line index is equal to the color index. Lines can have arbitrary, but different, colors. If there are gaps in the used color indices in the annotations, they must be filled with arbitrary dummy labels. Example:
q:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
...
the last label:12,28,0:: # color index 200
- supported shapes: Polygons
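Writing filler lines by hand is error-prone, so a label list for grayscale masks can be generated instead. The following is a minimal sketch under stated assumptions: `build_label_map` is a hypothetical helper, the `name:r,g,b::` line layout is copied from the example above, and fillers are named `_dummyN` with color `(N, N, N)` purely for illustration.

```python
def build_label_map(labels):
    """Build label map lines ordered by color index, filling gaps.

    `labels` maps a color index to (name, (r, g, b)).
    Unused indices get a `_dummyN` filler so that the line index
    always equals the color index.
    """
    lines = []
    seen_colors = set()
    for index in range(max(labels) + 1):
        if index in labels:
            name, color = labels[index]
        else:
            name, color = f"_dummy{index}", (index, index, index)
        if color in seen_colors:
            # Every line must use a different color.
            raise ValueError(f"duplicate color {color} at index {index}")
        seen_colors.add(color)
        r, g, b = color
        lines.append(f"{name}:{r},{g},{b}::")
    return lines
```

For example, labels at indices 0, 1 and 4 produce five lines, with fillers at indices 2 and 3.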
How to create a task from Pascal VOC dataset
- Download the Pascal VOC dataset (can be downloaded from the PASCAL VOC website).
- Create a CVAT task with the following labels:
You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.
Select interesting image files (see the Creating an annotation task guide for details).
- Zip the corresponding annotation files.
- Click the Upload annotation button, choose Pascal VOC ZIP 1.1 and select the zip file with annotations from the previous step. It may take some time.
20.8 - YOLO
- Format specification
- supported annotations: Rectangles
YOLO export
Downloaded file: a zip archive with the following structure:
Each annotation *.txt file has a name that corresponds to the name of the image file (e.g. frame_000001.txt is the annotation for the frame_000001.jpg image). The *.txt file structure: each line describes a label and a bounding box in the following format: label_id cx cy w h. obj.names contains the ordered list of label names.
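To make the `label_id cx cy w h` layout concrete, here is a hedged sketch that converts one annotation line into a pixel-space box. It assumes the usual Darknet YOLO convention that `cx`, `cy`, `w` and `h` are normalized to the image size (values in the 0..1 range); the `Box` type and `parse_yolo_line` are hypothetical names, not part of CVAT.

```python
from typing import NamedTuple


class Box(NamedTuple):
    label: str
    xtl: float  # top-left x in pixels
    ytl: float  # top-left y in pixels
    xbr: float  # bottom-right x in pixels
    ybr: float  # bottom-right y in pixels


def parse_yolo_line(line, names, image_width, image_height):
    """Convert a 'label_id cx cy w h' line into a pixel-space box.

    `names` is the ordered label list from obj.names.
    Assumption: cx, cy, w, h are normalized to the image size.
    """
    label_id, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(value) for value in (cx, cy, w, h))
    return Box(
        label=names[int(label_id)],
        xtl=(cx - w / 2) * image_width,
        ytl=(cy - h / 2) * image_height,
        xbr=(cx + w / 2) * image_width,
        ybr=(cy + h / 2) * image_height,
    )
```

With `names = ["cat", "dog"]`, the line `1 0.5 0.5 0.5 0.5` on a 200x100 image describes a `dog` box from (50, 25) to (150, 75).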
YOLO import
Uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and the annotation file name. There are 2 options:
- full match between the image name and the name of the annotation *.txt file (in cases when a task was created from images or an archive of images).
- match by frame number (if CVAT cannot match by name). The file name should be in the following format: <number>.jpg. It should be used when the task was created from a video.
How to create a task from YOLO formatted dataset (from VOC for example)
- Follow the official guide (see the Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
- Zip the train images.
- Create a CVAT task with the following labels:
Select images.zip as data. Most likely you should use the share functionality because the size of images.zip is more than 500Mb. See the Creating an annotation task guide for details.
- Create obj.names with the following content:
- Zip all label files together (we need to add only label files that correspond to the train subset).
- Click the Upload annotation button, choose YOLO 1.1 and select the zip file with labels from the previous step.
20.9 - TFRecord
TFRecord is a very flexible format, but we try to match the format used in TF object detection with minimal modifications.
Used feature description:
TFRecord export
Downloaded file: a zip archive with the following structure:
- supported annotations: Rectangles, Polygons (as masks, manually over Datumaro)
How to export masks:
- Export annotations in the Datumaro format
- Apply the polygons_to_masks and boxes_to_masks transforms
- Export in the TF Detection API format
TFRecord import
Uploaded file: a zip archive of the following structure:
- supported annotations: Rectangles
How to create a task from TFRecord dataset (from VOC2007 for example)
- Create a label_map.pbtxt file with the following content:
to convert VOC2007 dataset to TFRecord format. As example:
- Zip the train images.
- Create a CVAT task with the following labels:
Select images.zip as data. See the Creating an annotation task guide for details.
- Zip the pascal.tfrecord and label_map.pbtxt files together.
- Click the Upload annotation button, choose TFRecord 1.0 and select the zip file with labels from the previous step. It may take some time.
20.10 - ImageNet
ImageNet export
Downloaded file: a zip archive of the following structure:
- supported annotations: Labels
ImageNet import
Uploaded file: a zip archive of the structure above
- supported annotations: Labels
20.11 - WIDER Face
WIDER Face export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangles (with attributes), Labels
- supported attributes:
  - blur, expression, illumination, pose, invalid
  - occluded (both the annotation property & an attribute)
WIDER Face import
Uploaded file: a zip archive of the structure above
- supported annotations: Rectangles (with attributes), Labels
- supported attributes: blur, expression, illumination, occluded, pose, invalid
20.12 - CamVid
CamVid export
Downloaded file: a zip archive of the following structure:
Mask is a png image with 1 or 3 channels where each pixel has its own color which corresponds to a label. (0, 0, 0) is used for background by default.
- supported annotations: Rectangles, Polygons
CamVid import
Uploaded file: a zip archive of the structure above
- supported annotations: Polygons
20.13 - VGGFace2
VGGFace2 export
Downloaded file: a zip archive of the following structure:
- supported annotations: Rectangles, Points (landmarks - groups of 5 points)
VGGFace2 import
Uploaded file: a zip archive of the structure above
- supported annotations: Rectangles, Points (landmarks - groups of 5 points)
20.14 - Market-1501
Market-1501 export
Downloaded file: a zip archive of the following structure:
- supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
Market-1501 import
Uploaded file: a zip archive of the structure above
- supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
20.15 - ICDAR13/15
ICDAR13/15 export
Downloaded file: a zip archive of the following structure:
Word recognition task:
- supported annotations: Label icdar with attribute caption
Text localization task:
- supported annotations: Rectangles and Polygons with label icdar and attribute text
Text segmentation task:
- supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center
ICDAR13/15 import
Uploaded file: a zip archive of the structure above
Word recognition task:
- supported annotations: Label icdar with attribute caption
Text localization task:
- supported annotations: Rectangles and Polygons with label icdar and attribute text
Text segmentation task:
- supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center
21 - XML annotation format
When you want to download annotations from Computer Vision Annotation Tool (CVAT), you can choose one of several data formats. This document describes the XML annotation format. Each format has an X.Y version (e.g. 1.0). In general, the major version (X) is incremented when the data format has incompatible changes, and the minor version (Y) is incremented when the data format is slightly modified (e.g. it has one or several extra fields inside meta information). The document describes all changes for all versions of the XML annotation format.
Version 1.1
There are two different formats for images and video tasks at the moment. Both formats have a common part which is described below. Compared to the previous version, the flipped tag was added. Also, the original_size tag was added for interpolation mode to specify the frame size. In annotation mode each image tag has width and height attributes for the same purpose.
Annotation
Below you can find a description of the data format for image tasks. On each image it is possible to have many different objects. Each object can have multiple attributes. If an annotation task is created with the z_order flag, then each object will have a z_order attribute which is used to draw objects properly when they intersect (the bigger z_order is, the closer the object is to the camera). In previous versions of the format only the box shape was available. In later releases polygon, polyline, and points were added. Please see below for more details:
Example:
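As a hedged illustration of reading such a file (not the reference example from the CVAT documentation), the sketch below parses a tiny hand-written annotation and lists the shapes of each image in drawing order. The `SAMPLE` document is hypothetical, and the attribute names `label`, `xtl`, `ytl`, `xbr`, `ybr`, `points` and `occluded` are assumptions about the format; only the `image` tag with `width`/`height`, the shape tag names and `z_order` are taken from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; attribute names are assumptions.
SAMPLE = """<annotations>
  <image id="0" name="frame_000000.jpg" width="640" height="480">
    <box label="car" xtl="10.0" ytl="20.0" xbr="110.0" ybr="80.0"
         occluded="0" z_order="1"/>
    <polygon label="road" points="0.0,400.0;640.0,400.0;640.0,480.0"
             occluded="0" z_order="0"/>
  </image>
</annotations>"""

SHAPE_TAGS = {"box", "polygon", "polyline", "points"}


def shapes_by_z_order(xml_text):
    """Map each image name to its shapes sorted from far to near."""
    root = ET.fromstring(xml_text)
    result = {}
    for image in root.iter("image"):
        shapes = [
            (int(shape.get("z_order", 0)), shape.tag, shape.get("label"))
            for shape in image
            if shape.tag in SHAPE_TAGS
        ]
        # A bigger z_order means the object is closer to the camera.
        result[image.get("name")] = sorted(shapes)
    return result
```

Sorting by `z_order` gives the order in which overlapping shapes would be drawn, with the last entry on top.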
Interpolation
Below you can find a description of the data format for video tasks. The annotation contains tracks. Each track corresponds to an object which can be present on multiple frames. The same object cannot be present on the same frame in multiple locations. Each location of the object can have multiple attributes. Even if an attribute is immutable for the object, it will be cloned for each location (a known redundancy).
Example:
Version 1
There are two different formats for images and video tasks at the moment. Both formats have a common part which is described below:
Annotation
Below you can find a description of the data format for image tasks.
On each image it is possible to have many different objects. Each object can have multiple attributes.
Example:
Interpolation
Below you can find a description of the data format for video tasks. The annotation contains tracks. Each track corresponds to an object which can be present on multiple frames. The same object cannot be present on the same frame in multiple locations. Each location of the object can have multiple attributes. Even if an attribute is immutable for the object, it will be cloned for each location (a known redundancy).
Example:
22 - Shortcuts
Many UI elements have shortcut hints. Hover your pointer over an element to see its hint.
Shortcut | Common |
---|---|
Main functions | |
F1 | Open/hide the list of available shortcuts |
F2 | Go to the settings page or go back |
Ctrl+S | Save the annotations |
Ctrl+Z | Cancel the latest action related to objects |
Ctrl+Shift+Z or Ctrl+Y | Cancel the undo action (redo) |
Hold Mouse Wheel | Move the image frame (for example, while drawing) |
Player | |
F | Go to the next frame |
D | Go to the previous frame |
V | Go forward with a step |
C | Go backward with a step |
Right | Search for the next frame that satisfies the filters or the next frame that contains any objects |
Left | Search for the previous frame that satisfies the filters or the previous frame that contains any objects |
Space | Start/stop automatic frame changing |
` or ~ | Focus on the element to change the current frame |
Modes | |
N | Repeat the latest drawing procedure with the same parameters |
M | Activate or deactivate the mode for merging shapes |
Alt+M | Activate or deactivate the mode for splitting shapes |
G | Activate or deactivate the mode for grouping shapes |
Shift+G | Reset group for selected shapes (in group mode) |
Esc | Cancel any active canvas mode |
Image operations | |
Ctrl+R | Change image angle (add 90 degrees) |
Ctrl+Shift+R | Change image angle (subtract 90 degrees) |
Shift+B+= | Increase brightness level for the image |
Shift+B+- | Decrease brightness level for the image |
Shift+C+= | Increase contrast level for the image |
Shift+C+- | Decrease contrast level for the image |
Shift+S+= | Increase saturation level for the image |
Shift+S+- | Decrease saturation level for the image |
Shift+G+= | Make the grid more visible |
Shift+G+- | Make the grid less visible |
Shift+G+Enter | Set another color for the image grid |
Operations with objects | |
Ctrl | Switch automatic bordering for polygons and polylines during drawing/editing |
Hold Ctrl | Fix the shape while it is active |
Alt+Click on point | Delete a point (used when hovering over a point of a polygon, polyline or points) |
Shift+Click on point | Edit a shape (used when hovering over a point of a polygon, polyline or points) |
Right-Click on shape | Display the object's element from the objects sidebar |
T+L | Change locked state for all objects in the sidebar |
L | Change locked state for an active object |
T+H | Change hidden state for objects in the sidebar |
H | Change hidden state for an active object |
Q or / | Change occluded property for an active object |
Del or Shift+Del | Delete an active object. Use Shift to force deletion of locked objects |
- or _ | Put an active object “farther” from the user (decrease z axis value) |
+ or = | Put an active object “closer” to the user (increase z axis value) |
Ctrl+C | Copy a shape to the CVAT internal clipboard |
Ctrl+V | Paste a shape from the CVAT internal clipboard |
Hold Ctrl while pasting | Paste the shape from the buffer multiple times |
Ctrl+B | Make a copy of the object on the following frames |
Ctrl+(0..9) | Change the label for an activated object, or for the next drawn object if no objects are activated |
Operations available only for tracks | |
K | Change keyframe property for an active track |
O | Change outside property for an active track |
R | Go to the next keyframe of an active track |
E | Go to the previous keyframe of an active track |
Attribute annotation mode | |
Up Arrow | Go to the next attribute (up) |
Down Arrow | Go to the next attribute (down) |
Tab | Go to the next annotated object in the current frame |
Shift+Tab | Go to the previous annotated object in the current frame |
<number> | Assign a corresponding value to the current attribute |
Standard 3d mode | |
Shift+arrowup | Increases camera roll angle |
Shift+arrowdown | Decreases camera roll angle |
Shift+arrowleft | Decreases camera pitch angle |
Shift+arrowright | Increases camera pitch angle |
Alt+O | Move the camera up |
Alt+U | Move the camera down |
Alt+J | Move the camera left |
Alt+L | Move the camera right |
Alt+I | Performs zoom in |
Alt+K | Performs zoom out |
23 - Filter
There are some reasons to use the feature:
- When you use a filter, objects that don’t match the filter will be hidden.
- Fast navigation between frames that contain an object of interest. Use the Left Arrow / Right Arrow keys for this purpose, or customize the UI buttons by right-clicking and selecting switching by filter. If there are no objects that correspond to the filter, you will go to the previous / next frame that contains any annotated objects.
To apply filters, click the button on the top panel.
It will open a window for filter input. Here you will find two buttons: Add rule and Add group.
Rules
The “Add rule” button adds a rule for object display. A rule may use the following properties:
Supported properties:
Properties | Supported values | Description |
---|---|---|
Label | all the label names that are in the task | label name |
Type | shape, track or tag | type of object |
Shape | all shape types | type of shape |
Occluded | true or false | occluded (read more) |
Width | number of px or field | shape width |
Height | number of px or field | shape height |
ServerID | number or field | ID of the object on the server (You can find out by forming a link to the object through the Action menu) |
ObjectID | number or field | ID of the object in your client (indicated on the objects sidebar) |
Attributes | some other fields including attributes with a similar type or a specific attribute value | any fields specified by a label |
Supported operators for properties:
- == - Equal; != - Not equal; > - More; >= - More or equal; < - Less; <= - Less or equal;
- Any in; Not in - these operators allow you to set multiple values in one rule;
- Is empty; is not empty - these operators don’t require a value;
- Between; Not between - these operators allow you to choose a range between two values.
Some properties support two types of values that you can choose:
You can add multiple rules: click the add rule button and set another rule. Once you’ve set a new rule, you’ll be able to choose which operator connects the rules: And or Or.
All subsequent rules will be joined by the chosen operator.
Click Submit to apply the filter. If you want multiple rules to be connected by different operators, use groups.
Groups
To add a group, click the “add group” button. Inside the group you can create rules or groups.
If there is more than one rule in the group, they can be connected by And or Or operators. A rule group works the same way as a separate rule outside the group and is joined to the other rules by the operator outside the group.
You can create groups within other groups: click the add group button within the group.
You can move rules and groups. To move a rule or group, drag it by the button.
To remove a rule or group, click the Delete button.
If you activate the Not button, objects that don’t match the group will be filtered out.
Click Submit to apply the filter.
The “Cancel” button undoes the filter. The Clear filter button removes the filter.
Once applied, a filter automatically appears in the Recent used list. The maximum length of the list is 10.
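The way rules and nested groups combine can be sketched as a small recursive evaluator. This is only an illustration of the logic, not CVAT's filter implementation: the dictionary layout (`join`, `rules`, `not`, `property`, `op`, `value`) is hypothetical, only the comparison operators are covered, and the sketch assumes that Not simply inverts the result of its group.

```python
import operator

# Comparison operators from the list of supported operators.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def matches(node, obj):
    """Return True if `obj` (a dict of properties) satisfies `node`.

    A node is either a rule {"property", "op", "value"} or a group
    {"join": "And" | "Or", "rules": [...], "not": bool} (hypothetical
    layout). Groups may contain rules and other groups.
    """
    if "rules" in node:
        results = [matches(child, obj) for child in node["rules"]]
        if node.get("join", "And") == "And":
            combined = all(results)
        else:
            combined = any(results)
        # Assumption: Not inverts the result of the whole group.
        return not combined if node.get("not") else combined
    return OPS[node["op"]](obj[node["property"]], node["value"])
```

For example, a top-level And group holding the rule `label == "car"` and a nested Or group (`width > 100` or `occluded == True`) matches wide cars and occluded cars, but not a narrow, fully visible car.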
24 - Review
A special mode to check the annotation allows you to point to an object or area in the frame containing an error. Review mode is not available in 3D tasks.
To go into review mode, select Request a review in the menu and assign a user to run the check.
After that, the job status will be changed to validation and the reviewer will be able to open the task in review mode.
Review mode is a UI mode with a special “issue” tool which you can use to identify objects or areas in the frame and describe the problem.
- To do this, first click the open an issue icon on the controls sidebar:
- Then click on an object in the frame to highlight the object, or highlight an area by holding the left mouse button, and describe the problem. The object or area will be shaded in red.
- The created issue will appear in the workspace and in the issues tab on the objects sidebar.
- After you save the annotation, other users will be able to see the problem, comment on each issue and change the status of the problem to resolved.
- You can use the arrows on the issues tab to navigate the frames that contain problems.
- Once all the problems are marked, save the annotation, open the menu and select “submit the review”. After that you’ll see a form containing the verification statistics. Here you can give an assessment of the job and choose further actions:
  - Accept - changes the status of the job to completed.
  - Review next - passes the job to another user for re-review.
  - Reject - changes the status of the job to annotation.
25 - Context images for 2d task
When you create a task, you can provide the images with additional contextual images. To do this, create a folder named related_images and place a folder with a contextual image in it. The folder must be named after the image it is tied to, with the dot before the extension replaced by an underscore (for example, image_1_to_be_annotated_jpg for image_1_to_be_annotated.jpg). An example of the structure:
- root_directory
  - image_1_to_be_annotated.jpg
  - image_2_to_be_annotated.jpg
  - related_images/
    - image_1_to_be_annotated_jpg/
      - context_image_for_image_1.jpg
    - image_2_to_be_annotated_jpg/
      - context_image_for_image_2.jpg
  - subdirectory_example/
    - image_3_to_be_annotated.jpg
    - related_images/
      - image_3_to_be_annotated_jpg/
        - context_image_for_image_3.jpg
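The naming rule in the structure above can be expressed as a one-line path computation. This is a sketch under an assumption inferred from the example only: the context folder name is the image file name with dots replaced by underscores, placed in a related_images folder next to the image. `context_dir_for` is a hypothetical helper, not a CVAT function.

```python
from pathlib import PurePosixPath


def context_dir_for(image_path):
    """Return the folder that should hold context images for an image.

    Assumption (from the example layout): the folder is named like the
    image file with '.' replaced by '_', inside a sibling
    related_images directory.
    """
    image = PurePosixPath(image_path)
    folder_name = image.name.replace(".", "_")
    return str(image.parent / "related_images" / folder_name)
```

For `root_directory/image_1_to_be_annotated.jpg` this yields `root_directory/related_images/image_1_to_be_annotated_jpg`, which matches the layout shown above.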
The contextual image is displayed in the upper right corner of the workspace. You can hide it by clicking on the corresponding button or maximize the image by clicking on it.
When the image is maximized, you can rotate it clockwise/counterclockwise and zoom in/out.
You can also move the image by moving the mouse while holding down the LMB and zoom in/out by scrolling the mouse wheel.
To close the image, just click the X.
26 - Shape grouping
This feature allows you to group several shapes.
You may use the Group Shapes button or shortcuts:
- G - start selection / end selection in group mode
- Esc - close group mode
- Shift+G - reset group for selected shapes
You may select shapes by clicking on them or by selecting an area.
Grouped shapes will have a group_id field in the dumped annotation.
You can also switch color distribution from an instance (default) to a group. To do so, toggle the Color By Group checkbox.
Shapes that don’t have group_id will be highlighted in white.
27 - Analytics Monitoring
If your CVAT instance was created with analytics support, you can press the Analytics button in the dashboard, and analytics and journals will be opened in a new tab.
The analytics allows you to see how much time every user spends on each task and how much work they did over any time range.
It also has an activity graph, which can be adjusted by the number of users shown and the timeframe.
28 - Command line interface (CLI)
Description
A simple command line interface for working with CVAT tasks. At the moment it implements a basic feature set but may serve as the starting point for a more comprehensive CVAT administration tool in the future.
Overview of functionality:
- Create a new task (supports name, bug tracker, project, labels JSON, local/share/remote files)
- Delete tasks (supports deleting a list of task IDs)
- List all tasks (supports basic CSV or JSON output)
- Download JPEG frames (supports a list of frame IDs)
- Dump annotations (supports all formats via format string)
- Upload annotations for a task in the specified format (e.g. ‘YOLO ZIP 1.0’)
- Export and download a whole task
- Import a task
Usage
Examples
- Create a task
cli.py create "new task" --labels labels.json local file1.jpg file2.jpg
- Delete some tasks
cli.py delete 100 101 102
- List all tasks
cli.py ls
- Dump annotations
cli.py dump --format "CVAT for images 1.1" 103 output.xml
29 - Simple command line to prepare dataset manifest file
Steps before use
When used separately from Computer Vision Annotation Tool (CVAT), the required dependencies must be installed.
Ubuntu:20.04
Install dependencies:
Create an environment and install the necessary python modules:
Using
Alternative way to use with openvino/cvat_server
Examples of using
Create a dataset manifest in the current directory with video which contains enough keyframes:
Create a dataset manifest with video which does not contain enough keyframes:
Create a dataset manifest with images:
Create a dataset manifest with a pattern (*, ? and [] may be used):
Create a dataset manifest with openvino/cvat_server:
Examples of generated manifest.jsonl files
A manifest file contains some intuitive information and some specific fields such as:
- pts - the time at which the frame should be shown to the user
- checksum - md5 hash sum for the specific image/frame
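For orientation, an md5 hash sum can be computed with Python's standard library as shown below. This is a generic sketch, and `md5_of_file` is a hypothetical helper: it hashes the raw bytes of a file, whereas the manifest tool may hash decoded image or frame data instead, so the result is not guaranteed to equal the checksum stored in a manifest.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file's raw bytes.

    Assumption: hashing raw file bytes is enough for illustration;
    a manifest checksum may be computed over decoded pixel data.
    """
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # Read in chunks so large videos do not need to fit in memory.
        for block in iter(lambda: stream.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```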
For a video
For a dataset with images
30 - Data preparation on the fly
Description
Data on the fly processing is a way of working with data whose main idea is as follows: when a task is created, only the minimum necessary meta information is collected. This meta information later allows the necessary chunks to be created when a request is received from a client.
Generated chunks are stored in a cache of the limited size with a policy of evicting less popular items.
When a request is received from a client, the required chunk is searched for in the cache. If the chunk does not exist yet, it is created using prepared meta information and then put into the cache.
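The lookup-or-create behavior described above can be sketched as a tiny cache class. This is not CVAT's cache implementation: `ChunkCache` and `make_chunk` are hypothetical names, and "less popular" is approximated here by the lowest access count, which is only one possible eviction policy.

```python
from collections import Counter


class ChunkCache:
    """Size-limited cache that builds chunks on demand.

    Assumption: popularity is measured by access count, and the
    least accessed chunk is evicted when the cache is full.
    """

    def __init__(self, capacity, make_chunk):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.make_chunk = make_chunk  # builds a chunk from meta information
        self.chunks = {}
        self.hits = Counter()

    def get(self, chunk_id):
        if chunk_id not in self.chunks:
            if len(self.chunks) >= self.capacity:
                # Evict the chunk with the fewest accesses so far.
                least_popular = min(self.chunks, key=lambda k: self.hits[k])
                del self.chunks[least_popular]
                del self.hits[least_popular]
            self.chunks[chunk_id] = self.make_chunk(chunk_id)
        self.hits[chunk_id] += 1
        return self.chunks[chunk_id]
```

A chunk that is requested often stays in the cache, while a rarely requested one is rebuilt from the prepared meta information the next time a client asks for it.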
This method of working with data allows you to:
- reduce the task creation time.
- store data in a cache of limited size with a policy of evicting less popular items.
Unfortunately, this method will not work for all videos with a valid manifest file. If there are not enough keyframes in the video for smooth video decoding, the task will be created in another way. Namely, all chunks will be prepared during task creation, which may take some time.
Uploading a manifest with data
When creating a task, you can upload a manifest.jsonl file along with the video or dataset with images. You can see how to prepare it here.
31 - Serverless tutorial
Introduction
Computers have now become our partners. They help us to solve routine problems, fix mistakes, find information, etc. It is a natural idea to use their compute power to annotate datasets. There are multiple DL models for classification, object detection, semantic segmentation which can do data annotation for us. And it is relatively simple to integrate your own ML/DL solution into CVAT.
But the world is not perfect and we don’t have a silver bullet which can solve all our problems. Usually, available DL models are trained on public datasets which cannot cover all specific cases. Very often you want to detect objects which cannot be recognized by these models. Our annotation requirements can be so strict that automatically annotated objects cannot be accepted as is, and it is easier to annotate them from scratch. You always need to keep in mind all these mentioned limitations. Even if you have a DL solution which can perfectly annotate 50% of your data, it means that manual work will only be reduced in half.
When we know that DL models can help us to annotate data faster, the next question is how to use them? In CVAT all such DL models are implemented as serverless functions for the Nuclio serverless platform. And there are multiple implemented functions which can be found in the serverless directory such as Mask RCNN, Faster RCNN, SiamMask, Inside Outside Guidance, Deep Extreme Cut, etc. Follow the installation guide to build and deploy these serverless functions. See the user guide to understand how to use these functions in the UI to automatically annotate data.
What is a serverless function and why is it used for automatic annotation in CVAT? Let’s assume that you have a DL model and want to use it for AI-assisted annotation. The naive approach is to implement a Python script which uses the DL model to prepare a file with annotations in a public format like MS COCO or Pascal VOC. After that you can upload the annotation file into CVAT. It works but it is not user-friendly. How to make CVAT run the script for you?
You can pack the script with your DL model into a container which provides a standard interface for interacting with it. One way to do that is to use the function as a service approach. Your script becomes a function inside cloud infrastructure which can be called over HTTP. The Nuclio serverless platform helps us to implement and manage such functions.
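To give a feel for what "a function which can be called over HTTP" looks like, here is a minimal sketch of a Python handler in the style Nuclio uses, where the platform calls `handler(context, event)` for each request. Everything specific in it is an assumption for illustration: the request body layout (a JSON object with a base64-encoded `image` field), the shape of the returned annotations, and the `run_model` placeholder, which stands in for a real DL model.

```python
import base64
import json


def run_model(image_bytes):
    """Hypothetical placeholder for real DL model inference."""
    return [{"label": "object", "points": [0, 0, 10, 10], "type": "rectangle"}]


def handler(context, event):
    """Entry point called by the serverless platform per HTTP request.

    Assumption: the body is JSON with a base64-encoded "image" field.
    """
    body = event.body
    if isinstance(body, (bytes, str)):
        body = json.loads(body)
    image_bytes = base64.b64decode(body["image"])
    results = run_model(image_bytes)
    return context.Response(
        body=json.dumps(results),
        headers={},
        content_type="application/json",
        status_code=200,
    )
```

The script from the naive approach becomes the body of `run_model`, and the caller no longer needs to know how the model is packaged: it only sends an image and receives annotations.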
CVAT supports Nuclio out of the box if it is built properly. See the installation guide for instructions. Once you deploy a serverless function, the CVAT server can see it and call it with appropriate arguments. There are some tricks to creating serverless functions for CVAT, and we will discuss them in the next sections of the tutorial.
Using builtin DL models in practice
This tutorial assumes that you have already cloned the CVAT GitHub repo. To build CVAT with serverless support, you need to run the docker-compose command with specific configuration files. In this case it is docker-compose.serverless.yml. It contains the instructions needed to build and deploy the Nuclio platform as a docker container and to enable the corresponding support in CVAT.
The next step is to deploy the builtin serverless functions using the Nuclio command line tool (aka nuctl). It is assumed that you followed the installation guide and nuctl is already installed on your operating system. Run the following command to check that it works. In the beginning you should not have any deployed serverless functions.
Let’s look at examples of how to use DL models for annotation in different computer vision tasks.
Tracking using SiamMask
In this use case a user needs to annotate all individual objects on a video as tracks. Basically for every object we need to know its location on every frame.
The first step is to deploy SiamMask. The deployment process can depend on your operating system. On Linux you can use the serverless/deploy_cpu.sh auxiliary script, but below we use nuctl directly.
Let’s see how it works in the UI. Go to the models tab and check that you can see SiamMask in the list. If you cannot, it means that there are some problems. Go to one of our public channels and ask for help.
After that, go to the new task page and create a task with this video file. You can choose any task name, any labels, and even another video file if you like. In this case, the Remote sources option was used to specify the video file. Press the submit button at the end to finish the process.
Open the task and use AI tools to start tracking an object. Draw a bounding box around an object, then step through the frames and correct the bounding box if necessary.
Finally you will get bounding boxes.
The SiamMask model is better optimized for Nvidia GPUs. For more information about deploying the model for the GPU, read on.
Object detection using YOLO-v3
First of all, let’s deploy the DL model. The deployment process is similar for all serverless functions: you need to run the nuctl deploy command with appropriate arguments. To simplify the process, you can use the serverless/deploy_cpu.sh script. Inference of the serverless function is optimized for CPU using the Intel OpenVINO framework.
Again, go to the models tab and check that you can see YOLO v3 in the list. If you cannot, it means that there are some problems. Go to one of our public channels and ask for help.
Let us reuse the task which you created for testing the SiamMask serverless function above. Choose the magic wand tool, go to the Detectors tab, and select the YOLO v3 model. Press the Annotate button and after a couple of seconds you should see detection results. Do not forget to save annotations.
It is also possible to run a detector for the whole annotation task. CVAT will then run the serverless function on every frame of the task and submit results directly into the database. For more details please read the guide.
Objects segmentation using Mask-RCNN
If you have a detector which returns polygons, you can segment objects. One such detector is Mask-RCNN. There are several implementations of the detector available out of the box:
- serverless/openvino/omz/public/mask_rcnn_inception_resnet_v2_atrous_coco is optimized using the Intel OpenVINO framework and works well if it is run on an Intel CPU.
- serverless/tensorflow/matterport/mask_rcnn/ is optimized for GPU.
The deployment process for a serverless function optimized for GPU is similar. You just need to run the serverless/deploy_gpu.sh script. It runs mostly the same commands but internally uses the function-gpu.yaml configuration file instead of function.yaml. See the next sections if you want to understand the difference.
Note: Please do not run several GPU functions at the same time. In many cases it will not work out of the box. For now you should manually schedule different functions on different GPUs and it requires source code modification. Nuclio autoscaler does not support the local platform (docker).
Now you should be able to annotate objects using segmentation masks.
Adding your own DL models
Choose a DL model
For the tutorial I will choose a popular AI library with a lot of models inside. In your case it can be your own model. If it is based on detectron2 it will be easy to integrate. Just follow the tutorial.
Detectron2 is Facebook AI Research’s next generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications in Facebook.
Clone the repository somewhere. I assume that all other experiments will be run from the cloned detectron2 directory.
Run local experiments
Let’s run a detection model locally. First of all, you need to install the requirements for the library.
In my case I have Ubuntu 20.04 with Python 3.8.5. I installed PyTorch 1.8.1 for Linux with pip, python, and CPU inside a virtual environment. Follow the opencv-python installation guide to get the library for demo and visualization.
Install the detectron2 library from your local clone (you should be inside detectron2 directory).
After the library from Facebook AI Research is installed, we can run a couple of experiments. See the official tutorial for more examples. I decided to experiment with RetinaNet. First step is to download model weights.
To run experiments, let’s download an image with cats from Wikipedia.
Finally let’s run the DL model inference on CPU. If all is fine, you will see a window with cats and bounding boxes around them with scores.
The next step is to minimize the demo/demo.py script and keep only the code which is necessary to load, run, and interpret the output of the model. Let’s hard code parameters and remove argparse. Keep only the code which is responsible for working with an image. There is no general recipe for minimizing code.
Finally you should get something like the code below, which has a fixed config, reads a predefined image, initializes the predictor, and runs inference. As the final step it prints all detected bounding boxes with scores and labels.
DL model as a serverless function
Now that we know how to run the DL model locally, we can prepare a serverless function which CVAT can use to annotate data. Let’s see what function.yaml will look like…
Let’s look at the faster_rcnn_inception_v2_coco serverless function configuration as an example and try adapting it to our case.
First of all, let’s invent a unique name for the new function: pth.facebookresearch.detectron2.retinanet_r101. The annotations section describes our function for the CVAT serverless subsystem:
- annotations.name is a display name.
- annotations.type is the type of the serverless function. It can have several different values. Basically it affects the input and output of the function. In our case it has the detector type, which means that the integrated DL model can generate shapes with labels for an image.
- annotations.framework is used for information only and can have an arbitrary value. Usually it has values like OpenVINO, PyTorch, TensorFlow, etc.
- annotations.spec describes the list of labels which the model supports. In this case the DL model was trained on the MS COCO dataset and the list of labels corresponds to the dataset.
- spec.description is used to provide basic information about the model.
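To illustrate how a label list like the one in annotations.spec could be consumed, the sketch below parses a small JSON list into an id-to-name mapping. It assumes the spec is a JSON array of objects with id and name fields; the constant LABELS_SPEC, the helper parse_labels, and the three sample labels are made up for this example, so check the full function.yaml for the actual content.

```python
import json

# Hypothetical excerpt of a label list, assumed to be a JSON array of
# {"id": ..., "name": ...} objects.
LABELS_SPEC = """
[
  { "id": 1, "name": "person" },
  { "id": 2, "name": "bicycle" },
  { "id": 3, "name": "car" }
]
"""


def parse_labels(spec_text):
    """Map numeric class ids to label names."""
    return {item["id"]: item["name"] for item in json.loads(spec_text)}


labels = parse_labels(LABELS_SPEC)
print(labels)
```

With such a mapping, the numeric class index returned by a model can be translated into the label name shown in the UI.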
All other parameters are described in the Nuclio documentation.
- spec.handler is the entry point to your function.
- spec.runtime is the name of the language runtime.
- spec.eventTimeout is the global event timeout.
The next step is to describe how to build our serverless function:
- spec.build.image is the name of your docker image.
- spec.build.baseImage is the name of a base container image from which to build the function.
- spec.build.directives are commands to build your docker image.
In our case we start from the Ubuntu 20.04 base image, install curl to download weights for our model, git to clone the detectron2 project from GitHub, and python together with pip. Then we repeat the installation steps which we used to set up the DL model locally, with minor modifications.
For the Nuclio platform we have to specify a couple more parameters:
- spec.triggers.myHttpTrigger describes the HTTP trigger to handle incoming HTTP requests.
- spec.platform describes some important parameters to run your functions, like restartPolicy and mountMode. Read the Nuclio documentation for more details.
Full code can be found here: detectron2/retinanet/nuclio/function.yaml
The next step is to adapt the source code which we implemented to run the DL model locally to the requirements of the Nuclio platform. The first step is to load the model into memory using the init_context(context) function. Read more about the function in Best Practices and Common Pitfalls.
After that we need to accept incoming HTTP requests, run inference, and reply with detection results. The entry point which we specified in our function specification, handler(context, event), is responsible for this process. Again, in accordance with the function specification, the entry point should be located inside main.py.
Full code can be found here: detectron2/retinanet/nuclio/main.py
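The shape of these two functions can be sketched without Nuclio or detectron2 installed. In the example below, fake_model, make_fake_context, and the hard-coded detections are stand-ins invented for illustration, and the request and response field names (image, threshold, confidence, label, points, type) are assumptions about the payload; refer to the full main.py for the real implementation.

```python
import base64
import json
from types import SimpleNamespace


def fake_model(image_bytes):
    """Stand-in for a real predictor: returns fixed (box, score, label)."""
    return [([10.0, 20.0, 110.0, 220.0], 0.92, "cat"),
            ([5.0, 5.0, 50.0, 50.0], 0.30, "dog")]


def init_context(context):
    # Load the model once and keep it for all later calls.
    context.logger.info("Init context...")
    context.user_data.model_handler = fake_model


def handler(context, event):
    # Decode the request, run inference, and filter by confidence.
    data = event.body
    image_bytes = base64.b64decode(data["image"])
    threshold = float(data.get("threshold", 0.5))
    results = []
    for box, score, label in context.user_data.model_handler(image_bytes):
        if score >= threshold:
            results.append({"confidence": str(score), "label": label,
                            "points": box, "type": "rectangle"})
    return context.Response(body=json.dumps(results), headers={},
                            content_type="application/json", status_code=200)


def make_fake_context():
    """Minimal imitation of the context object, for local testing only."""
    logger = SimpleNamespace(info=lambda message: None)
    return SimpleNamespace(logger=logger, user_data=SimpleNamespace(),
                           Response=lambda **kwargs: SimpleNamespace(**kwargs))


context = make_fake_context()
init_context(context)
event = SimpleNamespace(body={
    "image": base64.b64encode(b"not a real image").decode(),
    "threshold": 0.5,
})
response = handler(context, event)
detections = json.loads(response.body)
print(detections)
```

Keeping the model in context.user_data means it is loaded once in init_context rather than on every request, which is the main reason the two functions are separate.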
Deploy RetinaNet serverless function
To use the new serverless function you have to deploy it using the nuctl command. The actual deployment process is described in the automatic annotation guide.
Advanced capabilities
Optimize using GPU
To optimize a function for a specific device (e.g. GPU), you just need to modify the instructions above to run the function on the target device. In most cases it will be necessary to modify the installation instructions only.
For RetinaNet R101, which was added above, the modifications will look like:
Note: A GPU has a very limited amount of memory, so for now it is not possible to run multiple serverless functions in parallel using the free open-source Nuclio version on the local platform, because the scale-to-zero feature is absent. Theoretically it is possible to run different functions on different GPUs, but this requires changing the source code of the corresponding serverless functions to choose a free GPU.
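One possible way to choose a free GPU, shown only as a sketch, is to ask nvidia-smi for the memory used on each device and pick the least loaded one. The helper names and the "least memory used" heuristic are assumptions for this example and are not part of CVAT; the demonstration runs on sample text so it does not need a GPU.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines like '0, 1024' (index, used MiB) into (index, used) pairs."""
    usage = []
    for line in csv_text.strip().splitlines():
        index, used = (part.strip() for part in line.split(","))
        usage.append((int(index), int(used)))
    return usage


def pick_least_used_gpu(usage):
    """Return the index of the GPU with the smallest used memory."""
    return min(usage, key=lambda item: item[1])[0]


def query_gpus():
    """Ask nvidia-smi for per-GPU memory usage (requires an Nvidia driver)."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(output)


def select_gpu(usage):
    """Restrict this process to one GPU; call before the framework starts CUDA."""
    gpu = pick_least_used_gpu(usage)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu


# Sample text in the same shape as the nvidia-smi output, for illustration.
sample = "0, 10240\n1, 312\n"
chosen = pick_least_used_gpu(parse_gpu_memory(sample))
print(chosen)
```

In a serverless function, select_gpu(query_gpus()) would have to run at the start of init_context, before the model is loaded.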
Debugging a serverless function
Let’s say you have a problem with your serverless function and want to debug it. Of course you can use context.logger.info or similar methods to print the intermediate state of your function. Another way is to debug using Visual Studio Code. Please see the instructions below to set up your environment step by step.
Let’s modify our function.yaml to include the debugpy package and specify that the maxWorkers count is 1. Otherwise multiple workers will try to use the same port, which will lead to an exception in the Python code.
Change main.py to listen on a port (e.g. 5678). Insert the code below at the beginning of the file with your entry point.
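A minimal sketch of such a listener is shown below. It calls debugpy.listen on port 5678 but, as an extra precaution that is not part of the tutorial, only does so when an environment variable is set; the variable name CVAT_FUNCTION_DEBUG and the helper maybe_start_debugger are hypothetical.

```python
import os


def maybe_start_debugger(port=5678):
    """Start a debugpy listener only when CVAT_FUNCTION_DEBUG=1 is set.

    The environment variable is a hypothetical switch for this sketch.
    """
    if os.environ.get("CVAT_FUNCTION_DEBUG") != "1":
        return False
    # debugpy must be installed in the function image for this import to work.
    import debugpy
    debugpy.listen(port)
    return True


# Runs at import time, before the entry point is first called.
DEBUGGER_STARTED = maybe_start_debugger()
print(DEBUGGER_STARTED)
```

Without the guard, the two lines that matter are the import and the debugpy.listen(5678) call.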
After these changes, deploy the serverless function once again. For serverless/pytorch/facebookresearch/detectron2/retinanet/nuclio/ you should run the command below:
To debug Python code inside a container you have to publish the port (in this tutorial it is 5678). The Nuclio deploy command doesn’t support that, so we have to work around it using SSH port forwarding.
- Install an SSH server on your host machine using sudo apt install openssh-server
- In the /etc/ssh/sshd_config host file set GatewayPorts yes
- Restart the ssh service to apply changes using sudo systemctl restart ssh.service
The next step is to install an ssh client inside the container and run port forwarding. In the snippet below, replace user and ipaddress with the username and IP address of your host (usually the IP address starts with 192.168.). You will need to confirm that you want to connect to your host computer and enter your password. Keep the terminal open after that.
Here is what the last command looks like in my case:
Finally, add the configuration below to your launch.json. Open Visual Studio Code and run the Serverless Debug configuration, set a breakpoint in main.py, and try to call the serverless function from the CVAT UI. The breakpoint should be triggered in Visual Studio Code, and it should be possible to inspect variables and debug the code.
Note: If you change the source code, you need to re-deploy the function and initiate port forwarding again.
Troubleshooting
First of all, check that you are using the recommended version of the Nuclio framework. In my case it is 1.5.16, but you need to check the installation manual.
Check that the Nuclio dashboard is running and its version corresponds to nuctl.
Be sure that the model, which doesn’t work, is healthy. In my case Inside Outside Guidance is not running.
Let’s run it. Go to the root of the CVAT repository and run the deployment command.
In this case the container was built some time ago and port 49154 was assigned by Nuclio. Now the port is used by openvino-dextr, as we can see in the logs. To prove our hypothesis, we just need to run a couple of docker commands:
To solve the problem, let’s just remove the previous container for the function. In this case it is eb0c1ee46630. After that the deployment command works as expected.
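Independently of Docker, a quick way to confirm that a port is already taken on the host is to try binding to it. The sketch below is a rough check only, and the helper is_port_free is hypothetical; it can report a port as free when a container publishes it without a listening process on the host, so the docker commands remain the authoritative source.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


# Demonstration: occupy a port chosen by the OS, then probe it.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
busy_port = holder.getsockname()[1]
busy_result = is_port_free(busy_port)
holder.close()
print(busy_port, busy_result)
```

For the case described above, you would call is_port_free(49154) before redeploying the function.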
When you investigate an issue with a serverless function, it is extremely useful to look at logs. Just run a couple of commands like docker logs <container>.
If you see that the NODE PORT is 0 before model deployment, you need to assign it manually. Add the port: 32001 attribute to the function.yaml file of each model before you deploy the model. Different ports should be assigned to different models.
Installing serverless functions on Windows 10 using the Ubuntu subsystem
If you encounter a problem running serverless functions on Windows 10, you can use the Ubuntu subsystem. To do this, follow these steps:
- Install WSL 2 and Docker Desktop as described in the installation manual.
- Install Ubuntu 18.04 from the Microsoft Store.
- Enable integration for Ubuntu-18.04 in the settings of Docker Desktop in the Resources > WSL integration tab.
- Then you can download and install nuctl on Ubuntu, using the automatic annotation guide.
- Install git and clone the repository on Ubuntu, as described in the installation manual.
- After that, run the commands from this tutorial through Ubuntu.