Advanced

This section contains advanced documents for CVAT users

1: Projects page
2: Organization
3: Search
4: Shape mode (advanced)
5: Track mode (advanced)
6: 3D Object annotation (advanced)
7: Attribute annotation mode (advanced)
8: Annotation with rectangles
9: Annotation with polygons

9.1: Manual drawing
9.2: Drawing using automatic borders
9.3: Edit polygon
9.4: Track mode with polygons
9.5: Creating masks

10: Annotation with polylines
11: Annotation with points

11.1: Points in shape mode
11.2: Linear interpolation with one point

12: Annotation with ellipses
13: Annotation with cuboids

13.1: Creating the cuboid
13.2: Editing the cuboid

14: Annotation with skeletons

14.1: Creating the skeleton
14.2: Editing the skeleton

15: Annotation with brush tool
16: Annotation with tags
17: Models
18: Annotation quality & Honeypot
19: OpenCV and AI Tools
20: Automatic annotation
21: Specification for annotators
22: Backup Task and Project
23: Frame deleting
24: Export/import datasets and upload annotation
25: Formats

26: Task synchronization with a repository
27: XML annotation format
28: Shortcuts
29: Filter
30: Review
31: Contextual images
32: Shape grouping
33: Dataset Manifest
34: Data preparation on the fly
35: Serverless tutorial

1 - Projects page

Creating and exporting projects in CVAT.

Projects page

On this page you can create a new project, create a project from a backup, and also see the created projects.

In the upper left corner there is a search bar that lets you find a project by its name, assignee, and so on. In the upper right corner there are controls for sorting, quick filters, and the filter.

Filter

Applying a filter disables the quick filter.

The filter works similarly to the filters for annotation: you can create rules from properties, operators, and values, and group rules into groups. For more details, see the filter section. Learn more about date and time selection.

To clear all filters, press Clear filters.

Supported properties for projects list

Properties	Supported values	Description
`Assignee`	username	The user who is working on the project, task, or job (specified on the task page)
`Owner`	username	The user who owns the project, task, or job
`Last updated`	last modified date and time (or value range)	The date can be entered in the `dd.MM.yyyy HH:mm` format or by selecting the date in the window that appears when you click on the input field
`ID`	number or range of job ID
`Name`	name	On the tasks page - name of the task, on the project page - name of the project
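
The `dd.MM.yyyy HH:mm` pattern from the table can be checked outside CVAT before you type a value into the filter. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the pattern maps to Python's `%d.%m.%Y %H:%M` directives.

```python
from datetime import datetime

# Assumed Python equivalent of the `dd.MM.yyyy HH:mm` pattern from the table.
FILTER_DATE_FORMAT = "%d.%m.%Y %H:%M"


def parse_filter_date(text: str) -> datetime:
    """Parse a date string written in the `dd.MM.yyyy HH:mm` format."""
    return datetime.strptime(text, FILTER_DATE_FORMAT)


def format_filter_date(moment: datetime) -> str:
    """Render a datetime in the `dd.MM.yyyy HH:mm` format."""
    return moment.strftime(FILTER_DATE_FORMAT)


parsed = parse_filter_date("05.03.2023 14:30")
print(parsed.isoformat())          # 2023-03-05T14:30:00
print(format_filter_date(parsed))  # 05.03.2023 14:30
```

A string that does not match the pattern raises `ValueError`, which makes it a quick way to catch a swapped day and month order.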

Create a project

In CVAT, you can create a project containing tasks of the same type. All tasks related to the project inherit its list of labels.

To create a project, go to the projects section by clicking the Projects item in the top menu. On the projects page, you can see a list of projects, use the search, or create a new project by clicking the + button and selecting Create New Project.

Note that the project will be created in the organization that you selected at the time of creation. Read more about organizations.

You can change the name of the project, the list of labels (which will be used for tasks created as parts of this project), and a skeleton if necessary. In the advanced configuration you can also specify a link to the issue, and the source and target storages. Learn more about creating a label list, creating the skeleton, and attaching cloud storage.

To save and open the project, click the Submit & Open button. You can also click the Submit & Continue button to create several projects in sequence.

Once created, the project will appear on the projects page. To open a project, just click on it.

Here you can do the following:

Change the project’s title.
Open the Actions menu. Each button is responsible for a specific function in the Actions menu:
- Export dataset/Import dataset - download/upload annotations or annotations and images in a specific format. More information is available in the export/import datasets section.
- Backup project - make a backup of the project. Read more in the backup section.
- Delete - remove the project and all related tasks.
Change issue tracker or open issue tracker if it is specified.
Change labels and skeleton. You can add new labels or add attributes for the existing labels in the Raw mode or the Constructor mode. You can also change the color for different labels. By clicking Setup skeleton you can create a skeleton for this project.
Assigned to - assigns the project to a person. Start typing an assignee’s name and/or choose the right person from the dropdown list.
Tasks - a list of all tasks for a particular project, with the ability to search, sort, and filter tasks in the project. Read more about search. Read more about sorting and filter. It is possible to choose a subset for tasks in the project. You can use the available options (Train, Test, Validation) or set your own.

2 - Organization

Using organization in CVAT.

Organization is a feature for teams of several users who work together on projects and share tasks.

Create an Organization, invite your team members, and assign roles to make the team work better on shared tasks.

See:

Personal workspace
Create new organization
- Switching between organizations
Organization page
- Invite members into organization
- Delete organization

Personal workspace

Personal workspace is the account’s default state; it is active when no Organization is selected.

If you do not select an Organization, the system links all new resources directly to your personal account, which prevents sharing them with others.

When Personal workspace is selected, it will be marked with a tick in the menu.

Create new organization

To create an organization, do the following:

Log in to CVAT.
On the top menu, click your Username > Organization > + Create.
Fill in the following fields and click Submit.

Field	Description
Short name	A name of the organization that will be displayed in the CVAT menu.
Full Name	Optional. Full name of the organization.
Description	Optional. Description of organization.
Email	Optional. Your email.
Phone number	Optional. Your phone number.
Location	Optional. Organization address.

The created organization will be available under your Username > Organization.

Switching between organizations

If you have more than one Organization, it is possible to switch between these Organizations at any given time.

Follow these steps:

In the top menu, select your Username > Organization.
From the drop-down menu, under the Personal space section, choose the desired Organization.

Note that if you’ve created more than 10 organizations, a Switch organization line will appear in the drop-down menu.

Click on it to see the Select organization dialog, and select the organization from the drop-down list.

Organization page

The Organization page is where you can edit the Organization information and manage Organization members.

Note that in order to access the organization page, you must first activate the organization (see Switching between organizations). Without activation, the organization page will remain inaccessible.
An organization is considered activated when it’s ticked in the drop-down menu and its name is visible in the top-right corner under the username.

To go to the Organization page, do the following:

On the top menu, click your Username > Organization.
In the drop-down menu, select Organization.
In the drop-down menu, click Settings.

Invite members into organization

To add members to Organization do the following:

Go to the Organization page, and click Invite members.
Fill in the form (see below).
Click Ok.

The Invite Members form has the following fields:

Field	Description
Email	Specifies the email address of the user who is being added to the Organization. Note that the user you’re inviting must already have a CVAT account (on the same instance) registered to the email address you’re sending the invitation to.
Role drop-down list	Defines the role of the user, which sets the level of access within the Organization: Worker: Has access only to the tasks, projects, and jobs assigned to them. Supervisor: Can create and assign jobs, tasks, and projects to the Organization members. Maintainer: Has the same capabilities as the Supervisor, but with additional visibility over all tasks and projects created by other members, complete access to Cloud Storages, and the ability to modify members and their roles. Owner: The role assigned to the creator of the organization by default. Has maximum capabilities and cannot be changed or assigned to another user.
Invite more	Button to add another user to the Organization.

Members of Organization will appear on the Organization page.

A member of an organization can leave it by going to Organization page > Leave organization.

The organization owner can remove members by clicking the Bin icon.

Delete organization

You can remove an organization that you created.

Note: Removing an organization will delete all related resources (annotations, jobs, tasks, projects, cloud storage, and so on).

To remove an organization, do the following:

Go to the Organization page.
In the top-right corner click Actions > Remove organization.
Enter the short name of the organization in the dialog field.
Click Remove.

3 - Search

Overview of available search options.

There are several ways to use the search.

Search within all fields (owner, assignee, task name, task status, task mode). To run it, enter a search string in the search field.
Search for specific fields. Examples:
- owner: admin - all tasks created by a user who has the substring “admin” in their name
- assignee: employee - all tasks assigned to a user who has the substring “employee” in their name
- name: training - all tasks with the substring “training” in their names
- mode: annotation or mode: interpolation - all tasks with images or videos.
- status: annotation or status: validation or status: completed - search by status
- id: 5 - task with id = 5.
Multiple filters. Filters can be combined (except for the identifier) using the keyword AND:
- mode: interpolation AND owner: admin
- mode: annotation and status: annotation

The search is case insensitive.
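
To make the query syntax above concrete, here is a small sketch that splits a search string into field/value pairs. It is not CVAT's own parser: the function name and the `any` key are hypothetical, and the sketch does not enforce the rule that the identifier cannot be combined with other filters.

```python
import re


def parse_search(query: str) -> dict:
    """Split a query such as 'mode: interpolation AND owner: admin' into pairs.

    A part without 'field: value' is treated as a search over all fields and
    stored under the hypothetical key 'any'.
    """
    filters = {}
    # The search is case insensitive, so 'and' and 'AND' both separate filters.
    for part in re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE):
        if ":" in part:
            field, value = part.split(":", 1)
            filters[field.strip().lower()] = value.strip().lower()
        else:
            filters["any"] = part.strip().lower()
    return filters


print(parse_search("mode: interpolation AND owner: admin"))
print(parse_search("Training"))
```

Lowercasing both fields and values mirrors the case-insensitive behavior described above.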

4 - Shape mode (advanced)

Advanced operations available during annotation in shape mode.

Basic operations in this mode are described in the shape mode (basics) section.

Occluded: occlusion is an attribute used if an object is occluded by another object or isn’t fully visible on the frame. Use the Q shortcut to set the property quickly.

Example: the three cars on the figure below should be labeled as occluded.

If a frame contains too many objects and it is difficult to annotate them due to many shapes placed mostly in the same place, it makes sense to lock them. Shapes for locked objects are transparent, and it is easy to annotate new objects. Besides, you can’t change previously annotated objects by accident. Shortcut: L.

5 - Track mode (advanced)

Advanced operations available during annotation in track mode.

Basic operations in this mode are described in the track mode (basics) section.

Shapes that were created in the track mode have extra navigation buttons.

These buttons jump to the previous/next keyframe.
These buttons jump to the initial frame and to the last keyframe.

You can use the Split function to split one track into two tracks:

6 - 3D Object annotation (advanced)

Overview of advanced operations available when annotating 3D objects.

Like 2D-task objects, 3D-task objects let you change appearance, attributes, and properties, and have an action menu. Read more in the objects sidebar section.

Moving an object

If you hover the cursor over a cuboid and press Shift+N, the cuboid will be cut, so you can paste it in another place (double-click to paste the cuboid).

Copying

As in 2D tasks, you can copy and paste objects with Ctrl+C and Ctrl+V, but unlike 2D tasks you have to place the copied object in 3D space (double-click to paste).

Image of the projection window

You can copy or save the projection-window image by left-clicking on it and selecting “save image as” or “copy image”.

7 - Attribute annotation mode (advanced)

Advanced operations available in attribute annotation mode.

Basic operations in this mode are described in the attribute annotation mode (basics) section.

It is possible to handle lots of objects on the same frame in this mode.

It is more convenient to annotate objects of the same type. In this case you can apply the appropriate filter. For example, the following filter will hide all objects except person: label=="Person".
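
The effect of a filter such as `label=="Person"` can be pictured as a simple predicate applied to every object on the frame. The sketch below is only an illustration under assumed data: the dictionaries and the function name are hypothetical and do not reflect how CVAT stores or evaluates filters.

```python
# Hypothetical objects on a frame; only the label matters for this filter.
frame_objects = [
    {"id": 1, "label": "Person"},
    {"id": 2, "label": "Car"},
    {"id": 3, "label": "Person"},
]


def objects_with_label(objects, label):
    """Keep only the objects whose label equals the requested one."""
    return [obj for obj in objects if obj["label"] == label]


for obj in objects_with_label(frame_objects, "Person"):
    print(obj["id"], obj["label"])
```

With the filter applied, only the two Person objects remain, so stepping through objects visits those two and skips the car.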

To navigate between objects (person in this case), use the buttons for switching between objects in the frame on the special panel:

or shortcuts:

Tab — go to the next object
Shift+Tab — go to the previous object.

In order to change the zoom level, go to settings (press F3) in the workspace tab and set the value Attribute annotation mode (AAM) zoom margin in px.

8 - Annotation with rectangles

To learn more about annotation using a rectangle, see the sections:

Rotation rectangle

To rotate the rectangle, drag the rotation point. Rotation is done around the center of the rectangle. To rotate by a fixed angle (a multiple of 15 degrees), hold Shift. While rotating, you can see the angle of rotation.
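
The geometry behind this can be sketched in a few lines: snapping rounds the angle to the nearest multiple of 15 degrees, and rotation moves each corner around the rectangle's center. This is a minimal illustration with hypothetical function names, not CVAT's implementation; the visual direction of a positive angle depends on the axis convention.

```python
import math


def snap_angle(angle_deg: float, step: float = 15.0) -> float:
    """Round an angle to the nearest multiple of `step` degrees."""
    return round(angle_deg / step) * step


def rotate_rectangle(xtl, ytl, xbr, ybr, angle_deg):
    """Return the four corners of the rectangle rotated around its center."""
    cx, cy = (xtl + xbr) / 2, (ytl + ybr) / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(xtl, ytl), (xbr, ytl), (xbr, ybr), (xtl, ybr)]
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in corners
    ]


angle = snap_angle(40.0)  # 45.0
print(angle, rotate_rectangle(0, 0, 4, 2, angle))
```

Because every corner is rotated around the same center, the center of the rectangle stays where it was.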

Annotation with rectangle by 4 points

It is an efficient method of bounding box annotation, proposed here. Before starting, you need to make sure that the drawing method by 4 points is selected.

Press Shape or Track to enter drawing mode. Click on four extreme points: the top, bottom, left- and right-most physical points on the object. Drawing will be automatically completed right after clicking the fourth point. Press Esc to cancel editing.
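
The bounding box implied by four extreme points is simply the smallest axis-aligned rectangle that contains them. A minimal sketch, with a hypothetical function name and made-up coordinates:

```python
def box_from_extreme_points(points):
    """Return (xtl, ytl, xbr, ybr) enclosing four extreme points given as (x, y)."""
    if len(points) != 4:
        raise ValueError("expected exactly four points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Top, bottom, left-most and right-most points of an object, in any order.
clicks = [(50, 10), (55, 90), (5, 40), (120, 45)]
print(box_from_extreme_points(clicks))  # (5, 10, 120, 90)
```

The order of the clicks does not matter, because only the minimum and maximum coordinates are used.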

9 - Annotation with polygons

Guide to creating and editing polygons.

9.1 - Manual drawing

It is used for semantic / instance segmentation.

Before starting, you need to select Polygon on the controls sidebar and choose the correct Label.

Click Shape to enter drawing mode. There are two ways to draw a polygon: either create points by clicking or by dragging the mouse on the screen while holding Shift.

Clicking points	Holding Shift+Dragging

When Shift isn’t pressed, you can zoom in/out (by scrolling the mouse wheel) and move (by clicking the mouse wheel and moving the mouse). You can also delete the previous point by right-clicking on it.
You can use the Selected opacity slider in the Objects sidebar to change the opacity of the polygon. You can read more in the Objects sidebar section.
Press N again or click the Done button on the top panel to complete the shape.
After creating the polygon, you can move the points or delete them by right-clicking and selecting Delete point in the context menu, or by clicking with the Alt key pressed.

9.2 - Drawing using automatic borders

You can use auto borders when drawing a polygon. Using automatic borders allows you to automatically trace the outline of polygons existing in the annotation.

To do this, go to settings -> workspace tab and enable Automatic Bordering or press Ctrl while drawing a polygon.
Start drawing / editing a polygon.
Points of other shapes will be highlighted, which means that the polygon can be attached to them.
Define the part of the polygon path that you want to repeat.
Click on the first point of the contour part.
Then click on any point located on part of the path. The selected point will be highlighted in purple.
Click on the last point and the outline to this point will be built automatically.

Besides, you can set a fixed number of points in the Number of points field, then drawing will be stopped automatically. To enable dragging you should right-click inside the polygon and choose Switch pinned property.

Below you can see results with opacity and black stroke:

If you need to annotate small objects, increase Image Quality to 95 in Create task dialog for your convenience.

9.3 - Edit polygon

To edit a polygon, click on it while holding Shift; this opens the polygon editor.

In the editor you can create new points or delete part of a polygon by closing the line on another point.
When the Intelligent polygon cropping option is activated in the settings, CVAT considers two criteria to decide which part of a polygon should be cut off during automatic editing.
- The first criterion is the number of cut points.
- The second criterion is the length of the cut curve.
If both criteria recommend cutting the same part, the algorithm works automatically; if not, the user has to make the decision. If you want to choose manually which part of a polygon should be cut off, disable Intelligent polygon cropping in the settings. In this case, after closing the polygon, you can select the part of the polygon you want to leave.
You can press Esc to cancel editing.
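
The two-criteria decision described above can be sketched as a vote: each criterion picks a part, and the cut happens automatically only when both pick the same one. The sketch rests on an assumption that is not stated in this guide, namely that each criterion prefers cutting the part with fewer points or the shorter curve. All names are hypothetical.

```python
import math


def path_length(points):
    """Sum of the distances between consecutive points of an open path."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def choose_part_to_cut(part_a, part_b):
    """Return 'a' or 'b' when both criteria agree, or None if the user must decide.

    Assumption (not confirmed by the guide): each criterion votes for cutting
    the part with fewer points or the shorter length.
    """
    by_points = "a" if len(part_a) < len(part_b) else "b"
    by_length = "a" if path_length(part_a) < path_length(part_b) else "b"
    return by_points if by_points == by_length else None


short_part = [(0, 0), (1, 0)]
long_part = [(0, 0), (0, 5), (5, 5), (1, 0)]
print(choose_part_to_cut(short_part, long_part))  # a
```

When the criteria disagree, for example a two-point part that is longer than a three-point part, the function returns `None`, which corresponds to asking the user.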

9.4 - Track mode with polygons

Polygons in the track mode allow you to mark moving objects more accurately than a rectangle does (Tracking mode (basic); Tracking mode (advanced)).

To create a polygon in the track mode, click the Track button.
Create a polygon the same way as in the case of Annotation with polygons. Press N or click the Done button on the top panel to complete the polygon.
Note that the created polygon has a starting point and a direction; these elements are important for annotating the following frames.
After going a few frames forward, press Shift+N: the old polygon will disappear and you can create a new polygon. The new starting point should match the starting point of the previously created polygon (in this example, the top of the left mirror). The direction must also match (in this example, clockwise). After creating the polygon, press N and the intermediate frames will be interpolated automatically.
If you need to change the starting point, right-click on the desired point and select Set starting point. To change the direction, right-click on the desired point and select switch orientation.

There is no need to redraw the polygon every time using Shift+N; instead, you can simply move the points or edit a part of the polygon by pressing Shift+Click.
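
Why the starting point and direction matter can be seen in a simplified model of interpolation, where the i-th point of one keyframe is blended with the i-th point of the next. This sketch assumes both keyframes have the same number of points and uses a hypothetical function name; CVAT's actual interpolation may work differently.

```python
def interpolate_polygon(start_points, end_points, t):
    """Linearly blend two polygons whose points are listed in matching order.

    `t` is 0.0 at the first keyframe and 1.0 at the second one.
    """
    if len(start_points) != len(end_points):
        raise ValueError("this sketch needs the same number of points in both keyframes")
    return [
        (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        for (x0, y0), (x1, y1) in zip(start_points, end_points)
    ]


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
moved = [(10, 0), (12, 0), (12, 2), (10, 2)]  # same start point and direction
print(interpolate_polygon(square, moved, 0.5))

# Same shape, but listed from a different starting point: the blend is distorted.
shifted_start = moved[1:] + moved[:1]
print(interpolate_polygon(square, shifted_start, 0.5))
```

With matching order the halfway polygon is the same square moved halfway; with a shifted starting point the corners are paired incorrectly and the intermediate shape is distorted.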

9.5 - Creating masks

Cutting holes in polygons

Currently, CVAT does not support cutting transparent holes in polygons. However, it is possible to generate holes in exported instance and class masks. To do this, define a background class in the task and draw holes with it as additional shapes above the shapes that need holes:

The editor window:

Remember to use z-axis ordering for shapes with the [-] and [+, =] keys.

Exported masks:

A class mask An instance mask

Notice that it is currently impossible to have a single instance number for internal shapes (they will be merged into the largest one and then covered by “holes”).
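
The hole technique relies on z-order: shapes are painted from the lowest layer to the highest, so a background-class shape drawn above an object overwrites part of it. The sketch below illustrates that idea with axis-aligned boxes on a tiny grid. The field names and the function are hypothetical, and this is not how CVAT renders masks internally.

```python
BACKGROUND = 0  # class index used for the background class in this sketch


def render_class_mask(width, height, shapes):
    """Paint axis-aligned boxes into a class mask, lowest z-order first.

    Each shape is a dict with 'class_id', 'z_order' and 'box' = (xtl, ytl, xbr, ybr),
    where xbr and ybr are exclusive. These field names are hypothetical.
    """
    mask = [[BACKGROUND] * width for _ in range(height)]
    for shape in sorted(shapes, key=lambda s: s["z_order"]):
        xtl, ytl, xbr, ybr = shape["box"]
        for y in range(max(ytl, 0), min(ybr, height)):
            for x in range(max(xtl, 0), min(xbr, width)):
                mask[y][x] = shape["class_id"]
    return mask


shapes = [
    {"class_id": 1, "z_order": 0, "box": (1, 1, 7, 5)},           # the object
    {"class_id": BACKGROUND, "z_order": 1, "box": (3, 2, 5, 4)},  # the "hole"
]
for row in render_class_mask(8, 6, shapes):
    print("".join(str(value) for value in row))
```

If the hole had a lower z-order than the object, the object would be painted last and the hole would disappear, which is why the ordering keys matter.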

Creating masks

There are several formats in CVAT that can be used to export masks:

Segmentation Mask (PASCAL VOC masks)
CamVid
MOTS
ICDAR
COCO (RLE-encoded instance masks, guide)
TFRecord (over Datumaro, guide)
Datumaro

An example of exported masks (in the Segmentation Mask format):

A class mask An instance mask

Important notices:

Both boxes and polygons are converted into masks
Grouped objects are considered as a single instance and exported as a single mask (label and attributes are taken from the largest object in the group)

Class colors

All the labels have associated colors, which are used in the generated masks. These colors can be changed in the task label properties:

Label colors are also displayed in the annotation window on the right panel, where you can show or hide specific labels (only the presented labels are displayed):

A background class can be:

A default, implicitly added class of black color (RGB 0, 0, 0)
A class named background with any color (takes priority; the name is case-insensitive)
Any class of black color (RGB 0, 0, 0)

To change background color in generated masks (default is black), change background class color to the desired one.

10 - Annotation with polylines

Guide to annotating tasks using polylines.

It is used for road markup annotation etc.

Before starting, you need to select the Polyline. You can set a fixed number of points in the Number of points field, then drawing will be stopped automatically.

Click Shape to enter drawing mode. There are two ways to draw a polyline: you either create points by clicking or by dragging the mouse on the screen while holding Shift.
When Shift isn’t pressed, you can zoom in/out (by scrolling the mouse wheel) and move (by clicking the mouse wheel and moving the mouse). You can also delete the previous point by right-clicking on it.
Press N again or click the Done button on the top panel to complete the shape.
You can delete a point by clicking on it with Ctrl pressed or by right-clicking on the point and selecting Delete point.
Clicking with Shift pressed opens the polyline editor. There you can create new points (by clicking or dragging) or delete part of the polyline by closing the red line on another point.
Press Esc to cancel editing.

11 - Annotation with points

Guide to annotating tasks using single points or shapes containing multiple points.

11.1 - Points in shape mode

It is used for face and landmark annotation, etc.

Before you start, you need to select Points. If necessary, you can set a fixed number of points in the Number of points field; drawing will then stop automatically.

Click Shape to enter the drawing mode. Now you can start annotating the necessary area.
Points are automatically grouped: all points between each start and finish are considered linked.
Press N again or click the Done button on the top panel to finish marking the area.
You can delete a point by clicking on it with Ctrl pressed or by right-clicking on the point and selecting Delete point.
Clicking with Shift pressed opens the points shape editor. There you can add new points to an existing shape.
You can zoom in/out (by scrolling the mouse wheel) and move (by clicking the mouse wheel and moving the mouse) while drawing.
You can drag an object after it has been drawn and change the position of individual points after finishing an object.

11.2 - Linear interpolation with one point

You can use linear interpolation for points to annotate a moving object:

Before you start, select the Points.
Linear interpolation works only with one point, so you need to set Number of points to 1.
After that, select Track.
Click Track to enter the drawing mode, then left-click to create a point; after that the shape will be completed automatically.
Move forward a few frames and move the point to the desired position; this creates a keyframe, and the intermediate frames are drawn automatically. You can work with this object as with an interpolated track: you can hide it using Outside, move around keyframes, etc.
This way you’ll get linear interpolation using the Points.
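
The positions on intermediate frames follow from plain linear interpolation between the two nearest keyframes. A minimal sketch, with a hypothetical function name and made-up coordinates; holding the nearest keyframe position outside the keyframe range is an assumption of this sketch, not documented behavior.

```python
def interpolate_point(keyframes, frame):
    """Return the (x, y) position on `frame` from a dict {frame_number: (x, y)}."""
    frames = sorted(keyframes)
    # Assumption: outside the keyframe range the nearest keyframe position is held.
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    for left, right in zip(frames, frames[1:]):
        if left <= frame <= right:
            t = (frame - left) / (right - left)
            (x0, y0), (x1, y1) = keyframes[left], keyframes[right]
            return (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)


track = {0: (10, 20), 10: (30, 60)}
print(interpolate_point(track, 5))  # (20.0, 40.0)
```

Adding a third keyframe simply splits the track into two linear segments, each interpolated independently.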

12 - Annotation with ellipses

Guide to annotating tasks using ellipses.

It is used for road sign annotation etc.

First of all you need to select the ellipse on the controls sidebar.

Choose a Label and click Shape or Track to start drawing. An ellipse can be created the same way as a rectangle: you need to specify two opposite points, and the ellipse will be inscribed in an imaginary rectangle. Press N or click the Done button on the top panel to complete the shape.
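
The ellipse inscribed in the rectangle spanned by two opposite points has its center at the midpoint and radii equal to half the width and half the height. A minimal sketch with a hypothetical function name:

```python
def ellipse_from_corners(p1, p2):
    """Return (cx, cy, rx, ry) of the ellipse inscribed in the rectangle p1-p2."""
    (x1, y1), (x2, y2) = p1, p2
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    rx, ry = abs(x2 - x1) / 2, abs(y2 - y1) / 2
    return cx, cy, rx, ry


print(ellipse_from_corners((10, 20), (50, 40)))  # (30.0, 30.0, 20.0, 10.0)
```

Using absolute differences means the two points can be given in either order.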

You can rotate ellipses using a rotation point in the same way as rectangles.

13 - Annotation with cuboids

Guide to creating and editing cuboids.

It is used to annotate 3-dimensional objects such as cars, boxes, etc. Currently the feature supports one-point perspective and has the constraint that the vertical edges are exactly parallel to the sides.

13.1 - Creating the cuboid

Before you start, you have to make sure that Cuboid is selected and choose a drawing method: “from rectangle” or “by 4 points”.

Drawing cuboid by 4 points

Choose a drawing method “by 4 points” and click Shape to enter the drawing mode. There are many ways to draw a cuboid. You can draw the cuboid by placing 4 points, after that the drawing will be completed automatically. The first 3 points determine the plane of the cuboid while the last point determines the depth of that plane. For the first 3 points, it is recommended to only draw the 2 closest side faces, as well as the top and bottom face.

A few examples:

Drawing cuboid from rectangle

Choose a drawing method “from rectangle” and click Shape to enter the drawing mode. When you draw using the rectangle method, you must select the frontal plane of the object using the bounding box. The depth and perspective of the resulting cuboid can be edited.

Example:

13.2 - Editing the cuboid

The cuboid can be edited in multiple ways: by dragging points, by dragging certain faces or by dragging planes. First notice that there is a face that is painted with gray lines only, let us call it the front face.

You can move the cuboid by simply dragging the shape behind the front face. The cuboid can be extended by dragging on the point in the middle of the edges. The cuboid can also be extended up and down by dragging the point at the vertices.

To draw with perspective effects it should be assumed that the front face is the closest to the camera. To begin simply drag the points on the vertices that are not on the gray/front face while holding Shift. The cuboid can then be edited as usual.

If you wish to reset perspective effects, you may right click on the cuboid, and select Reset perspective to return to a regular cuboid.

The location of the gray face can be swapped with the adjacent visible side face. You can do it by right clicking on the cuboid and selecting Switch perspective orientation. Note that this will also reset the perspective effects.

Certain faces of the cuboid can also be edited: the left, right, and dorsal faces, relative to the gray face. Simply drag the faces to move them independently from the rest of the cuboid.

You can also use cuboids in track mode, similar to rectangles in track mode (basics and advanced) or Track mode with polygons.

14 - Annotation with skeletons

Guide to creating and editing skeletons.

Skeletons should be used as annotation templates when you need to annotate complex objects sharing the same structure (e.g. human pose estimation, facial landmarks, etc.). A skeleton consists of any number of points (also called elements), joined or not joined by edges. Each point is treated as an individual object with its own attributes and properties (like color, occluded, outside, etc.). At the same time, a skeleton point can exist only within the parent skeleton.

Any skeleton element can be hidden (by marking it outside) if necessary (for example, if a part is out of the frame). Currently there are two formats that support exporting skeletons: CVAT and COCO.
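
The structure described above, points with their own properties joined by optional edges, can be modeled with two small classes. This is only a conceptual sketch: the class and field names are hypothetical and do not correspond to CVAT's data model or export formats.

```python
from dataclasses import dataclass, field


@dataclass
class SkeletonElement:
    """A skeleton point with its own properties."""
    name: str
    x: float
    y: float
    occluded: bool = False
    outside: bool = False  # an element marked outside is hidden


@dataclass
class Skeleton:
    """A set of elements that exist only within this parent skeleton."""
    label: str
    elements: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # pairs of element names

    def add_edge(self, first: str, second: str) -> None:
        names = {element.name for element in self.elements}
        if first not in names or second not in names:
            raise ValueError("an edge must join two points of this skeleton")
        self.edges.append((first, second))

    def visible_elements(self):
        return [element for element in self.elements if not element.outside]


star = Skeleton("star", [SkeletonElement("top", 5, 0), SkeletonElement("left", 0, 4)])
star.add_edge("top", "left")
star.elements[1].outside = True  # hide the part that is out of the frame
print([element.name for element in star.visible_elements()])  # ['top']
```

Keeping the elements inside the skeleton object reflects the rule that a point cannot exist on its own.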

14.1 - Creating the skeleton

Initial skeleton setup

Unlike other CVAT objects, skeletons need to be set up before you can start annotating with them. You can do that in the label configurator while creating a task/project, or later in created instances.

So, start by clicking the Setup skeleton option:

Below the regular label form, where you need to add a name and set up attributes if necessary, you will see a drawing area with some buttons on the side:

PUT AN IMAGE AS A BACKGROUND - a helpful feature that makes it easier to draw a skeleton template while seeing an example of the object you need to annotate in the future.
PUT NEW SKELETON POINTS - activated by default. In this mode you can add new skeleton points by clicking the drawing area.
DRAW AN EDGE BETWEEN TWO POINTS - in this mode you can add an edge by clicking any two points that are not joined yet.
REMOVE A DRAWN SKELETON POINTS - in this mode clicking a point removes the point and all attached edges. You can also remove only an edge; it will be highlighted in red on hover.
DOWNLOAD DRAWN TEMPLATE AS AN .SVG - you can download the configuration to use it in the future.
UPLOAD A TEMPLATE FROM AN .SVG FILE - you can upload a previously downloaded configuration.

Let’s draw an example skeleton: a star. After the skeleton is drawn, you can set up each of its points. Just hover over the point, right-click, and click Configure:

Here you can set up the point name, its color, and attributes if necessary, as for a regular CVAT label:

Press the Done button to finish editing the point. Press the Continue button to save the skeleton. Continue creating the task/project in the regular way.

For an existing task/project, you are not allowed to change a skeleton configuration for now. You can copy/insert a skeleton configuration using the Raw tab of the label configurator.

Drawing a skeleton from rectangle

In an opened job, go to the left sidebar and find the Draw new skeleton control, then hover over it:

If the control is absent, make sure you have set up at least one skeleton in the corresponding task/project. In the pop-up dropdown you can select between a skeleton Shape and a skeleton Track, depending on your task. Draw a skeleton as a regular bounding box, clicking two points on the canvas:

Well done, you’ve just created the first skeleton.

14.2 - Editing the skeleton

Editing skeletons on the canvas

A drawn skeleton is wrapped in a bounding box for the user’s convenience. Using this wrapper, the user can edit the skeleton as a regular bounding box, by dragging, resizing, or rotating:

Moreover, each skeleton point can be dragged individually. After dragging, the wrapping bounding box is adjusted automatically; other points are not affected:

You can use Shortcuts on both a skeleton itself and its elements.

Hover the mouse cursor over the bounding box to apply a shortcut to the whole skeleton (like lock, occluded, pinned, keyframe, and outside for skeleton tracks)
Hover the mouse cursor over one of the skeleton points to apply a shortcut to this point (the same list of shortcuts, but outside is also available for skeleton shape elements)

Using the sidebar is another way to set up skeleton properties and attributes. It works in a similar way as for other kinds of objects supported by CVAT, but with some changes:

A user is not allowed to switch a skeleton label
Outside property is always available for skeleton elements (it does not matter if they are tracks or not)
An additional collapse is available for the user to see the list of skeleton parts

15 - Annotation with brush tool

Guide to annotating tasks using brush tools.

With the brush tool, you can create masks for disjoint objects that have multiple parts, such as a house hiding behind trees, a car behind a pedestrian, or a pillar behind a traffic sign. The brush tool has several modes, for example: erase pixels, change brush shapes, and polygon-to-mask mode.

Use the brush tool for Semantic (Panoptic) and Instance Image Segmentation tasks.
For more information about segmentation masks in CVAT, see Creating masks.

See:

Brush tool menu
Annotation with brush
Annotation with polygon-to-mask
Remove underlying pixels
AI Tools
Import and export

The brush tool menu appears on the top of the screen after you click Shape:

It has the following elements:

Element	Description
	Save mask saves the created mask. The saved mask will appear on the object sidebar
	Save mask and continue adds a new mask to the object sidebar and allows you to draw a new one immediately.
	Brush adds a new mask or new regions to the previously added mask.
	Eraser removes part of the mask.
	Polygon selection tool. Selection will become a mask.
	Remove polygon selection subtracts part of the polygon selection.
	Brush size in pixels. Note: Visible only when Brush or Eraser are selected.
	Brush shape with two options: circle and square. Note: Visible only when Brush or Eraser are selected.
	Remove underlying pixels. When you are drawing or editing a mask with this tool, pixels on other masks that are located at the same positions as the pixels of the current mask are deleted.
	Label that will be assigned to the newly created mask.
	Move. Click and hold to move the menu bar to another place on the screen.

Annotation with brush

To annotate with brush, do the following:

From the controls sidebar, select Brush.
In the Draw new mask menu, select label for your mask, and click Shape.
The Brush tool will be selected by default.
With the brush, draw a mask on the object you want to label.
To erase part of the selection, use Eraser.
After you have applied the mask, on the top menu bar click Save mask (or N on the keyboard) to finish the process.
The added object will appear on the objects sidebar.

To add the next object, repeat steps 1 to 5. All added objects will be visible on the image and the objects sidebar.

To save the job with all added objects, on the top menu click Save.

Annotation with polygon-to-mask

To annotate with polygon-to-mask, do the following:

From the controls sidebar, select Brush.
In the Draw new mask menu, select label for your mask, and click Shape.
In the brush tool menu, select Polygon.
With the Polygon tool, draw a mask for the object you want to label.
To correct the selection, use Remove polygon selection.
Use Save mask (or N on the keyboard) to switch between the add and remove polygon tools.
After you have added the polygon selection, on the top menu bar click Save mask (or N on the keyboard) to finish the process.
Click Save mask again (or N on the keyboard).
The added object will appear on the objects sidebar.

To add the next object, repeat steps 1 to 5.

All added objects will be visible on the image and the objects sidebar.

To save the job with all added objects, on the top menu click Save.

Remove underlying pixels

Use the Remove underlying pixels tool when you want to add a mask and simultaneously delete the pixels of other masks that are located at the same positions. This saves you from meticulously drawing the shared edge between two different objects twice.
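
The effect of Remove underlying pixels can be illustrated with a minimal sketch. It assumes masks are stored as sets of (x, y) pixel coordinates keyed by label; the function name and data layout are hypothetical and are not CVAT's internal representation.

```python
def draw_with_removal(masks, new_label, new_pixels):
    """Add new_pixels under new_label and delete them from every other mask.

    masks: dict mapping a label to a set of (x, y) pixels (assumed layout).
    """
    new_pixels = set(new_pixels)
    updated = {}
    for label, pixels in masks.items():
        if label != new_label:
            # Pixels covered by the new mask are removed from the older mask.
            updated[label] = set(pixels) - new_pixels
    updated[new_label] = set(masks.get(new_label, set())) | new_pixels
    return updated


masks = {"road": {(0, 0), (0, 1), (1, 0), (1, 1)}}
result = draw_with_removal(masks, "car", {(1, 1), (2, 1)})
print(sorted(result["road"]))  # the pixel (1, 1) now belongs only to "car"
```

After the call, no pixel belongs to two masks, so the border between the two objects is drawn only once.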

Remove pixel

AI Tools

You can convert AI tool masks to polygons. To do this, use the following AI tool menu:

Save

Go to the Detectors tab.
Switch toggle Masks to polygons to the right.
Add source and destination labels from the drop-down lists.
Click Annotate.

Import and export

For export, see Export dataset.

Import follows the general import dataset procedure, with the additional option of converting masks to polygons.

Note: This option is available for formats that work with masks only.

To use it, when uploading the dataset, switch the Convert masks to polygon toggle to the right:


16 - Annotation with tags

Tags are used to annotate whole frames; they are not displayed in the workspace. Before you start, open the drop-down list in the top panel and select Tag annotation.

The objects sidebar will be replaced with a special panel for working with tags. Here you can select a label for a tag and add it by clicking on the Plus button. You can also customize hotkeys for each label.

If you need only one label per frame, enable the Automatically go to the next frame checkbox. After you add a tag, CVAT will then switch to the next frame automatically.

Tags will be shown in the top left corner of the canvas. You can show/hide them in the settings.

17 - Models

To deploy the models, you will need to install the necessary components using Semi-automatic and Automatic Annotation guide. To learn how to deploy the model, read Serverless tutorial.

The Models page contains a list of deep learning (DL) models deployed for semi-automatic and automatic annotation. To open the Models page, click the Models button on the navigation bar. The list of models is presented in the form of a table. The parameters indicated for each model are the following:

Framework the model is based on
model Name
model Type:
- detector - used for automatic annotation (available in detectors and automatic annotation)
- interactor - used for semi-automatic shape annotation (available in interactors)
- tracker - used for semi-automatic track annotation (available in trackers)
- reid - used to combine individual objects into a track (available in automatic annotation)
Description - brief description of the model
Labels - list of the supported labels (only for the models of the detectors type)

18 - Annotation quality & Honeypot

How to check the quality of annotation in CVAT

In CVAT, it’s possible to evaluate the quality of annotation through the creation of a Ground truth job, referred to as a Honeypot. To estimate the task quality, CVAT compares all other jobs in the task against the established Ground truth job, and calculates annotation quality based on this comparison.

Note that quality estimation only supports 2D tasks. It supports all the annotation types except 2D cuboids.

Note that tracks are considered separate shapes and compared on a per-frame basis with other tracks and shapes.

See:

Ground truth job
Managing Ground Truth jobs: Import, Export, and Deletion
- Import
- Export
- Delete
Assessing data quality with Ground truth jobs
Annotation quality & Honeypot video tutorial

Ground truth job

A Ground truth job is a way to tell CVAT where to store and get the “correct” annotations for task quality estimation.

To estimate task quality, you need to create a Ground truth job in the task, and annotate it. You don’t need to annotate the whole dataset twice: the annotation quality of a small part of the data indicates the quality of annotation for the whole dataset.

For the quality assurance to function correctly, the Ground truth job must have a small portion of the task frames and the frames must be chosen randomly. Depending on the dataset size and task complexity, 5-15% of the data is typically good enough for quality estimation, while keeping extra annotation overhead acceptable.

For example, in a typical task with 2000 frames, selecting just 5%, which is 100 extra frames to annotate, is enough to estimate the annotation quality. If the task contains only 30 frames, it’s advisable to select 8-10 frames, which is about 30%.

This is more than 15%, but smaller datasets need proportionally more samples to estimate quality reliably.
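
The arithmetic in the examples above can be checked with a short sketch. The function name is hypothetical, and rounding up to a whole frame is an assumption; CVAT may round differently.

```python
import math


def ground_truth_frame_count(total_frames, quantity_percent):
    """Number of frames for a Ground truth job, rounded up (assumed rounding)."""
    return math.ceil(total_frames * quantity_percent / 100)


large_task = ground_truth_frame_count(2000, 5)   # 5% of 2000 frames
small_task = ground_truth_frame_count(30, 30)    # about 30% of 30 frames
print(large_task, small_task)  # 100 9
```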

To create a Ground truth job, do the following:

Create a task, and open the task page.
Click +.
In the Add new job window, fill in the following fields:
- Job type: Use the default parameter Ground truth.
- Frame selection method: Use the default parameter Random.
- Quantity %: Set the desired percentage of frames for the Ground truth job.
  Note that when you use Quantity %, the Frame count field will be autofilled.
- Frame count: Set the desired number of frames for the Ground truth job.
  Note that when you use Frame count, the Quantity % field will be autofilled.
- Seed: (Optional) If you need to make the random selection reproducible, specify this number. It can be any integer number, the same value will yield the same random selection (given that the frame number is unchanged).
  Note that if you want to use a custom frame sequence, you can do this using the server API instead, see Jobs API #create.
Click Submit.
Annotate frames, save your work.
Change the status of the job to Completed.
Change Stage to Accepted.

The Ground truth job will appear in the jobs list.
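
The role of the Seed field can be illustrated with a minimal sketch of seeded random selection. It uses Python's standard random module; the function name is hypothetical, and the frames it picks will not match the ones CVAT picks for the same seed, because the server's own selection procedure is not reproduced here.

```python
import random


def select_ground_truth_frames(frame_count, sample_size, seed=None):
    """Pick sample_size distinct frame indices out of frame_count frames."""
    rng = random.Random(seed)  # a fixed seed gives a reproducible selection
    return sorted(rng.sample(range(frame_count), sample_size))


first = select_ground_truth_frames(2000, 100, seed=42)
second = select_ground_truth_frames(2000, 100, seed=42)
print(first == second)  # the same seed and frame count yield the same frames
```

As in CVAT, reproducibility holds only while the number of frames stays unchanged.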

Add new job

Managing Ground Truth jobs: Import, Export, and Deletion

Annotations from Ground truth jobs are not included in the dataset export. They also cannot be imported during task annotations import or with automatic annotation for the task.

Import, export, and delete options are available from the job’s menu.


Import

If you want to import annotations into the Ground truth job, do the following.

Open the task, and find the Ground truth job in the jobs list.
Click on three dots to open the menu.
From the menu, select Import annotations.
Select import format, and select file.
Click OK.

Note that if there are imported annotations for the frames that exist in the task, but are not included in the Ground truth job, they will be ignored. This way, you don’t need to worry about “cleaning up” your Ground truth annotations for the whole dataset before importing them. Importing annotations for the frames that are not known in the task still raises errors.
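
The import rule described above can be sketched as a simple filter. This is an illustration only: the function name and the annotation layout (a dict with a frame key) are hypothetical, not CVAT's import code.

```python
def filter_imported_annotations(annotations, task_frames, gt_frames):
    """Keep annotations for Ground truth frames; reject frames unknown to the task."""
    task_frames = set(task_frames)
    gt_frames = set(gt_frames)
    unknown = sorted({a["frame"] for a in annotations} - task_frames)
    if unknown:
        # Frames that do not exist in the task still raise an error.
        raise ValueError(f"Frames not present in the task: {unknown}")
    # Frames that exist in the task but not in the Ground truth job are ignored.
    return [a for a in annotations if a["frame"] in gt_frames]


imported = [
    {"frame": 0, "label": "car"},
    {"frame": 3, "label": "car"},
    {"frame": 7, "label": "person"},
]
kept = filter_imported_annotations(imported, task_frames=range(10), gt_frames={0, 7})
print([a["frame"] for a in kept])  # frame 3 is dropped silently
```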

Export

To export annotations from the Ground truth job, do the following.

Open the task, and find the Ground truth job in the jobs list.
Click on three dots to open the menu.
From the menu, select Export annotations.

Delete

To delete the Ground truth job, do the following.

Open the task, and find the Ground truth job in the jobs list.
Click on three dots to open the menu.
From the menu, select Delete.

Assessing data quality with Ground truth jobs

Once you’ve established the Ground truth job, proceed to annotate the dataset.

CVAT will begin the quality comparison between the annotated task and the Ground truth job in this task once the Ground truth job is finished (in the acceptance stage and in the completed state).

Note that the process of quality calculation may take up to several hours, depending on the amount of data and labeled objects, and is not updated immediately after task updates.

To view results, go to the Task > Actions > View analytics > Performance tab.


Quality data

The Analytics page has the following fields:

Field	Description
Mean annotation quality	Displays the average quality of annotations, which includes: the count of accurate annotations, total task annotations, ground truth annotations, accuracy rate, precision rate, and recall rate.
GT Conflicts	Conflicts identified during quality assessment, including extra or missing annotations. Mouse over the ? icon for a detailed conflict report on your dataset.
Issues	Number of opened issues. If no issues were reported, will show 0.
Quality report	Quality report in JSON format.
Ground truth job data	Information about the ground truth job, including date, time, and number of issues.
List of jobs	List of all the jobs in the task
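
The rates listed under Mean annotation quality can be related to the counts with a small sketch. Precision and recall follow their common definitions; the accuracy formula (accurate annotations over the union of task and ground truth annotations) is an assumption, and CVAT's exact formulas may differ. The function name is hypothetical.

```python
def quality_rates(accurate, task_total, gt_total):
    """Return (accuracy, precision, recall) from annotation counts."""
    precision = accurate / task_total if task_total else 0.0
    recall = accurate / gt_total if gt_total else 0.0
    # Assumed definition: accurate annotations divided by the union of
    # task annotations and ground truth annotations.
    union = task_total + gt_total - accurate
    accuracy = accurate / union if union else 0.0
    return accuracy, precision, recall


accuracy, precision, recall = quality_rates(accurate=80, task_total=100, gt_total=90)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```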

Annotation quality settings

If you need to tweak some aspects of comparisons, you can do this from the Annotation Quality Settings menu.

You can configure what overlap should be considered low or how annotations must be compared.

The updated settings will take effect on the next quality update.

To open Annotation Quality Settings, find Quality report and on the right side of it, click on three dots.

The following window will open. Hover over the ? marks to understand what each field represents.


Annotation quality settings have the following parameters:

Field	Description
Min overlap threshold	Min overlap threshold(IoU) is used for the distinction between matched / unmatched shapes.
Low overlap threshold	Low overlap threshold is used for the distinction between strong/weak (low overlap) matches.
OKS Sigma	IoU threshold for points. The percent of the box area, used as the radius of the circle around the GT point, where the checked point is expected to be.
Relative thickness (frame side %)	Thickness of polylines, relative to the (image area) ^ 0.5. The distance to the boundary around the GT line inside of which the checked line points should be.
Check orientation	Indicates that polylines have direction.
Min similarity gain (%)	The minimal gain in the GT IoU between the given and reversed line directions to consider the line inverted. Only useful with the Check orientation parameter.
Compare groups	Enables or disables annotation group checks.
Min group match threshold	Minimal IoU for groups to be considered matching, used when the Compare groups are enabled.
Check object visibility	Check for partially-covered annotations. Masks and polygons will be compared to each other.
Min visibility threshold	Minimal visible area percent of the spatial annotations (polygons, masks) for reporting covered annotations. Useful with the Check object visibility option.
Match only visible parts	Use only the visible part of the masks and polygons in comparisons.
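
How the Min overlap threshold and Low overlap threshold interact can be sketched for bounding boxes. The IoU computation is the standard one; the function names, the default threshold values, and the treatment of values exactly on a threshold are assumptions for illustration, not CVAT's defaults.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def classify_match(iou, min_overlap=0.5, low_overlap=0.8):
    """Classify a pair of shapes; threshold values here are hypothetical."""
    if iou < min_overlap:
        return "unmatched"
    if iou < low_overlap:
        return "weak match"  # matched, but reported as low overlap
    return "strong match"


gt_box = (0, 0, 10, 10)
for box in [(5, 0, 15, 10), (2, 0, 12, 10), (1, 0, 11, 10)]:
    print(box, classify_match(box_iou(gt_box, box)))
```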

GT conflicts in the CVAT interface

To see GT Conflicts in the CVAT interface, go to Review > Issues > Show ground truth annotations and conflicts.

GT conflict

The ground truth (GT) annotation is depicted as a dotted-line box with an associated label.

Upon hovering over an issue on the right-side panel with your mouse, the corresponding GT Annotation gets highlighted.

Use arrows in the Issue toolbar to move between GT conflicts.

To create an issue related to the conflict, right-click on the bounding box and from the menu select the type of issue you want to create.

GT conflict

Annotation quality & Honeypot video tutorial

This video demonstrates the process:

19 - OpenCV and AI Tools

Overview of semi-automatic and automatic annotation tools available in CVAT.

Label and annotate your data in semi-automatic and automatic mode with the help of AI and OpenCV tools.

Interpolation works well for videos from stationary cameras, such as security cameras. AI and OpenCV tools work well both for videos where the camera is stable and for videos where it moves together with the object, or where the object moves chaotically.

See:

Interactors
Detectors
Trackers
OpenCV: histogram equalization

Interactors

Interactors are a part of AI and OpenCV tools.

Use interactors to label objects in images by creating a polygon semi-automatically.

When creating a polygon, you can use positive points or negative points (for some models):

Positive points define the area in which the object is located.
Negative points define the area in which the object is not located.

AI tools: annotate with interactors

To annotate with interactors, do the following:

Click Magic wand , and go to the Interactors tab.
From the Label drop-down, select a label for the polygon.
From the Interactor drop-down, select a model (see Interactors models).
Click the Question mark to see information about each model:
(Optional) If the model returns masks, and you need to convert masks to polygons, use the Convert masks to polygons toggle.
Click Interact.
Use the left click to add positive points and the right click to add negative points.
Number of points you can add depends on the model.
On the top menu, click Done (or Shift+N, N).

AI tools: add extra points

Note: More points improve outline accuracy, but make shape editing harder. Fewer points make shape editing easier, but reduce outline accuracy.

Each model has a minimum required number of points for annotation. Once the required number of points is reached, the request is automatically sent to the server. The server processes the request and adds a polygon to the frame.

For a more accurate outline, postpone the request until you have finished adding extra points:

Hold down the Ctrl key.
On the top panel, the Block button will turn blue.
Add points to the image.
Release the Ctrl key, when ready.

If you used Convert masks to polygons, you can edit the finished object like a polygon.

You can change the number of points in the polygon with the slider:

AI tools: delete points

To delete a point, do the following:

With the cursor, hover over the point you want to delete.
If the point can be deleted, it will enlarge and the cursor will turn into a cross.
Left-click on the point.

OpenCV: intelligent scissors

To use Intelligent scissors, do the following:

On the menu toolbar, click OpenCV and wait for the library to load.
Go to the Drawing tab, select the label, and click on the Intelligent scissors button.
Add the first point on the boundary of the allocated object.
You will see a line repeating the outline of the object.
Add the second point, so that the previous point is within the restrictive threshold.
After that a line repeating the object boundary will be automatically created between the points.
To finish placing points, on the top menu click Done (or N on the keyboard).

As a result, a polygon will be created.

You can change the number of points in the polygon with the slider:

To increase or lower the action threshold, hold Ctrl and scroll the mouse wheel.

During the drawing process, you can remove the last point by clicking on it with the left mouse button.

Settings

On how to adjust the polygon, see Objects sidebar.
For more information about polygons in general, see Annotation with polygons.

Interactors models

Model	Tool	Description
Segment Anything Model (SAM)	AI Tools	The Segment Anything Model (SAM) produces high quality object masks, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks. For more information, see: GitHub: Segment Anything Site: Segment Anything Paper: Segment Anything
Deep extreme cut (DEXTR)	AI Tools	This is an optimized version of the original model, introduced at the end of 2017. It uses the information about extreme points of an object to get its mask. The mask is then converted to a polygon. For now this is the fastest interactor on the CPU. For more information, see: GitHub: DEXTR-PyTorch Site: DEXTR-PyTorch Paper: DEXTR-PyTorch
Feature backpropagating refinement scheme (f-BRS)	AI Tools	The model produces a mask for an object from positive points (left-click on the foreground) and, if necessary, negative points (right-click on the background). It is recommended to run the model on GPU, if possible. For more information, see: GitHub: f-BRS Paper: f-BRS
High Resolution Net (HRNet)	AI Tools	The model produces a mask for an object from positive points (left-click on the foreground) and, if necessary, negative points (right-click on the background). It is recommended to run the model on GPU, if possible. For more information, see: GitHub: HRNet Paper: HRNet
Inside-Outside-Guidance (IOG)	AI Tools	The model uses a bounding box and inside/outside points to create a mask. First, create a bounding box that wraps the object. Then use positive and negative points to tell the model where the foreground and the background are. Negative points are optional. For more information, see: GitHub: IOG Paper: IOG
Intelligent scissors	OpenCV	Intelligent scissors is a CV method of creating a polygon by placing points with the automatic drawing of a line between them. The distance between the adjacent points is limited by the threshold of action, displayed as a red square that is tied to the cursor. For more information, see: Site: Intelligent Scissors Specification

Detectors

Detectors are a part of AI tools.

Use detectors to automatically identify and locate objects in images or videos.

Labels matching

Each model is trained on a dataset and supports only the dataset’s labels.

For example:

DL model has the label car.
Your task (or project) has the label vehicle.

To annotate, you need to match these two labels to give the DL model a hint that, in this case, car = vehicle.

If you have a label that is not on the list of DL labels, you will not be able to match them.

For this reason, supported DL models are suitable only for certain labels.
To check the list of labels for each model, see Detectors models.
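
Label matching can be pictured as a mapping from model labels to task labels. The sketch below is a hypothetical illustration (the function name and detection layout are not CVAT's API); it assumes that detections whose model label has no match are simply dropped.

```python
def remap_detections(detections, label_mapping):
    """Rename model labels to task labels; skip detections with no matched label."""
    remapped = []
    for detection in detections:
        task_label = label_mapping.get(detection["label"])
        if task_label is not None:
            remapped.append({**detection, "label": task_label})
    return remapped


# The model knows "car" and "dog"; the task only has the label "vehicle".
label_mapping = {"car": "vehicle"}
detections = [
    {"label": "car", "box": (10, 20, 110, 90)},
    {"label": "dog", "box": (30, 40, 60, 80)},
]
annotations = remap_detections(detections, label_mapping)
print(annotations)  # only the car detection remains, labeled "vehicle"
```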

Annotate with detectors

To annotate with detectors, do the following:

Click Magic wand , and go to the Detectors tab.
From the Model drop-down, select model (see Detectors models).
From the left drop-down select the DL model label, from the right drop-down select the matching label of your task.
(Optional) If the model returns masks, and you need to convert masks to polygons, use the Convert masks to polygons toggle.
Click Annotate.

This action will automatically annotate one frame. For automatic annotation of multiple frames, see Automatic annotation.

Detectors models

Model	Description
Mask RCNN	The model generates polygons for each instance of an object in the image. For more information, see: GitHub: Mask RCNN Paper: Mask RCNN
Faster RCNN	The model generates bounding boxes for each instance of an object in the image. In this model, RPN and Fast R-CNN are combined into a single network. For more information, see: GitHub: Faster RCNN Paper: Faster RCNN
YOLO v3	YOLO v3 is a family of object detection architectures and models pre-trained on the COCO dataset. For more information, see: GitHub: YOLO v3 Site: YOLO v3 Paper: YOLO v3
YOLO v5	YOLO v5 is a family of object detection architectures and models based on the Pytorch framework. For more information, see: GitHub: YOLO v5 Site: YOLO v5
Semantic segmentation for ADAS	This is a segmentation network to classify each pixel into 20 classes. For more information, see: Site: ADAS
Mask RCNN with Tensorflow	Mask RCNN version with Tensorflow. The model generates polygons for each instance of an object in the image. For more information, see: GitHub: Mask RCNN Paper: Mask RCNN
Faster RCNN with Tensorflow	Faster RCNN version with Tensorflow. The model generates bounding boxes for each instance of an object in the image. In this model, RPN and Fast R-CNN are combined into a single network. For more information, see: Site: Faster RCNN with Tensorflow Paper: Faster RCNN
RetinaNet	Pytorch implementation of RetinaNet object detection. For more information, see: Specification: RetinaNet Paper: RetinaNet Documentation: RetinaNet
Face Detection	Face detector based on MobileNetV2 as a backbone for indoor and outdoor scenes shot by a front-facing camera. For more information, see: Site: Face Detection 0205

Trackers

Trackers are part of AI and OpenCV tools.

Use trackers to identify and label objects in a video or image sequence that are moving or changing over time.

AI tools: annotate with trackers

To annotate with trackers, do the following:

Click Magic wand , and go to the Trackers tab.
From the Label drop-down, select the label for the object.
From the Tracker drop-down, select a tracker.
Click Track, and annotate the objects with the bounding box in the first frame.
Go to the top menu and click Next (or F on the keyboard) to move to the next frame.
All annotated objects will be automatically tracked.

OpenCV: annotate with trackers

To annotate with trackers, do the following:

On the menu toolbar, click OpenCV and wait for the library to load.
Go to the Tracker tab, select the label, and click Tracking.
From the Label drop-down, select the label for the object.
From Tracker drop-down, select tracker.
Click Track.
To move to the next frame, on the top menu click the Next button (or F on the keyboard).

All annotated objects will be automatically tracked when you move to the next frame.

When tracking

To enable/disable tracking, use Tracker switcher on the sidebar.
Trackable objects have an indication on canvas with a model name.
You can follow the tracking by the messages appearing at the top.

Trackers models

Model	Tool	Description
TrackerMIL	OpenCV	TrackerMIL model is not bound to labels and can be used for any object. It is a fast client-side model designed to track simple non-overlapping objects. For more information, see: Article: Object Tracking using OpenCV
SiamMask	AI Tools	Fast online Object Tracking and Segmentation. The trackable object will be tracked automatically if the previous frame was the latest keyframe for the object. For more information, see: GitHub: SiamMask Paper: SiamMask
Transformer Tracking (TransT)	AI Tools	Simple and efficient online tool for object tracking and segmentation. If the previous frame was the latest keyframe for the object, the trackable object will be tracked automatically. This is a modified version of the PyTracking Python framework based on PyTorch. For more information, see: GitHub: TransT Paper: TransT

OpenCV: histogram equalization

Histogram equalization improves the contrast by stretching the intensity range.

It increases the global contrast of an image when its usable data is represented by close contrast values.

It is useful in images with backgrounds and foregrounds that are bright or dark.
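
The idea of stretching the intensity range can be shown with a minimal pure-Python sketch of classic histogram equalization on a flat list of 8-bit grayscale values. It is an illustration of the general technique, not the OpenCV code path that CVAT runs; the function name is hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Map intensities through the normalized cumulative histogram."""
    if not pixels:
        return []
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    # Cumulative distribution: number of pixels with intensity <= each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # a constant image cannot be stretched
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lookup[value] for value in pixels]


low_contrast = [50, 50, 51, 52, 52, 53]
equalized = equalize_histogram(low_contrast)
print(equalized)  # [0, 0, 64, 191, 191, 255]
```

Values that occupied the narrow range 50 to 53 are spread over the full range 0 to 255, which is what makes details easier to see.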

To improve the contrast of the image, do the following:

In the OpenCV menu, go to the Image tab.
Click the Histogram equalization button.

Histogram equalization will improve contrast on current and following frames.

Example of the result:

To disable Histogram equalization, click on the button again.

20 - Automatic annotation

Automatic annotation of tasks

Automatic annotation in CVAT is a tool that you can use to automatically pre-annotate your data with pre-trained models.

CVAT can use models from the following sources:

Pre-installed models.
Models integrated from Hugging Face and Roboflow.
Self-hosted models deployed with Nuclio.

The following table describes the available options:

	Self-hosted	Cloud
Price	Free	See Pricing
Models	You have to add models	You can use pre-installed models
Hugging Face & Roboflow integration	Not supported	Supported

See:

Running Automatic annotation
Labels matching
Models
Adding models from Hugging Face and Roboflow

Running Automatic annotation

To start automatic annotation, do the following:

On the top menu, click Tasks.
Find the task you want to annotate and click Actions > Automatic annotation.
In the Automatic annotation dialog, from the drop-down list, select a model.
Match the labels of the model and the task.
(Optional) If you need the model to return masks as polygons, switch on the Return masks as polygons toggle.
(Optional) If you need to remove all previous annotations, switch on the Clean old annotations toggle.
Click Annotate.

CVAT will show the progress of annotation on the progress bar.

Progress bar

You can stop the automatic annotation at any moment by clicking Cancel.

Labels matching

Each model is trained on a dataset and supports only the dataset’s labels.

For example:

DL model has the label car.
Your task (or project) has the label vehicle.

To annotate, you need to match these two labels to give CVAT a hint that, in this case, car = vehicle.

If you have a label that is not on the list of DL labels, you will not be able to match them.

For this reason, supported DL models are suitable only for certain labels.

To check the list of labels for each model, see Models papers and official documentation.

Models

Automatic annotation uses pre-installed and added models.

For self-hosted solutions, you need to install Automatic Annotation first and add models.

List of pre-installed models:

Model	Description
Attributed face detection	Three OpenVINO models work together: Face Detection 0205: face detector based on MobileNetV2 as a backbone with a FCOS head for indoor and outdoor scenes shot by a front-facing camera. Emotions recognition retail 0003: fully convolutional network for recognition of five emotions (‘neutral’, ‘happy’, ‘sad’, ‘surprise’, ‘anger’). Age gender recognition retail 0013: fully convolutional network for simultaneous Age/Gender recognition. The network can recognize the age of people in the [18 - 75] years old range; it is not applicable for children since their faces were not in the training set.
RetinaNet R101	RetinaNet is a one-stage object detection model that utilizes a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. For more information, see: Site: RetinaNET
Text detection	Text detector based on PixelLink architecture with MobileNetV2, depth_multiplier=1.4 as a backbone for indoor/outdoor scenes. For more information, see: Site: OpenVINO Text detection 004
YOLO v3	YOLO v3 is a family of object detection architectures and models pre-trained on the COCO dataset. For more information, see: Site: YOLO v3
YOLO v5	YOLO v5 is a family of object detection architectures and models based on the Pytorch framework. For more information, see: GitHub: YOLO v5 Site: YOLO v5
YOLO v7	YOLOv7 is an advanced object detection model that outperforms other detectors in terms of both speed and accuracy. It can process frames at a rate ranging from 5 to 160 frames per second (FPS) and achieves the highest accuracy with 56.8% average precision (AP) among real-time object detectors running at 30 FPS or higher on the V100 graphics processing unit (GPU). For more information, see: GitHub: YOLO v7 Paper: YOLO v7

Adding models from Hugging Face and Roboflow

In case you did not find the model you need, you can add a model of your choice from Hugging Face or Roboflow.

Note that you cannot add models from Hugging Face and Roboflow to self-hosted CVAT.

For more information, see Streamline annotation by integrating Hugging Face and Roboflow models.

This video demonstrates the process:

21 - Specification for annotators

Learn how to easily create and add specification for annotators using the Guide feature.

The Guide feature provides a built-in markdown editor that allows you to create specification for annotators.

Once you create and submit the specification, it will be accessible from the annotation interface (see below).

You can attach the specification to Projects or to Tasks.

The attachment procedure is the same for individual users and organizations.

See:

Adding specification to Project
- Editing rights
Adding specification to Task
- Editing rights
Access to specification for annotators
Markdown editor guide
Specification for annotators' video tutorial

Adding specification to Project

To add specification to the Projects, do the following:

Go to the Projects page and click on the project to which you want to add specification.
Under the Project description, click Edit.

Project specification

Add instruction to the Markdown editor, and click Submit.

Editing rights

For individual users: only the project owner and the project assignee can edit the specification.
For organizations: the specification can additionally be edited by the organization owner and maintainer.

Editor rights

Adding specification to Task

To add specification to the Task, do the following:

Go to the Tasks page and click on the task to which you want to add specification.
Under the Task description, click Edit.
Add instruction to the Markdown editor, and click Submit.

Editing rights

For individual users: only the task owner and task assignee can edit the specification.
For organizations: only the task owner, maintainer, and task assignee can edit the specification.

Editor rights

Access to specification for annotators

To open specification, do the following:

Open the job to see the annotation interface.
In the top right corner, click the Guide button.

Markdown editor guide

The markdown editor for Guide has two panes. Add instructions to the left pane, and the editor will immediately show the formatted result on the right.

Markdown editor

You can write in raw markdown or use the toolbar on the top of the editor.

Markdown editor

Element	Description
1	Text formatting: bold, italic, and strikethrough.
2	Insert a horizontal rule (horizontal line).
3	Add a title, heading, or subheading. It provides a drop-down list to select the title level (from 1 to 6).
4	Add a link. Note: If you left-click on the link, it will open in the same window.
5	Add a quote.
6	Add a single line of code.
7	Add a block of code.
8	Add a comment. The comment is only visible to Guide editors and remains invisible to annotators.
9	Add a picture. To use this option, first, upload the picture to an external resource and then add the link in the editor. Alternatively, you can drag and drop a picture into the editor, which will upload it to the CVAT server and add it to the specification.
10	Add a list: bullet list, numbered list, and checklist.
11	Hide the editor pane: options to hide the right pane, show both panes or hide the left pane.
12	Enable full-screen mode.

Specification for annotators' video tutorial

Video tutorial on how to use the Guide feature.

22 - Backup Task and Project

Overview

In CVAT you can back up tasks and projects. A backup can be stored on your PC or used to transfer a task or project to another server.

Create backup

To back up a task or project, open the action menu and select Backup Task or Backup Project.

You can save the backup locally on your PC or to an attached cloud storage.

(Optional) Specify a name in the Custom name text field. Otherwise, the backup file is named by the mask project_<project_name>_backup_<date>_<time>.zip for projects and task_<task_name>_backup_<date>_<time>.zip for tasks.

To save a backup to a specific attached cloud storage, turn off the Use default settings switch, select Cloud storage in Target storage, and choose the storage from the list of attached cloud storages.

Create backup APIs

endpoints:
- /tasks/{id}/backup
- /projects/{id}/backup
method: GET
responses: 202, 201 with zip archive payload
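
The two response codes above suggest a polling pattern: repeat the request while the server answers 202 and read the archive once it answers 201. The sketch below illustrates that reading only. The meaning of 202 ("still being prepared") is an assumption, and `fetch`, `backup_url`, and `wait_for_backup` are hypothetical names, not part of CVAT. The transport is injected, so authentication and the HTTP client are left to the caller.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: takes a URL, returns (status_code, body_bytes).
Fetch = Callable[[str], Tuple[int, bytes]]


def backup_url(base_url: str, kind: str, object_id: int) -> str:
    """Build the backup endpoint URL for a task or a project."""
    if kind not in ("tasks", "projects"):
        raise ValueError("kind must be 'tasks' or 'projects'")
    return f"{base_url.rstrip('/')}/{kind}/{object_id}/backup"


def wait_for_backup(fetch: Fetch, url: str, attempts: int = 10, delay: float = 1.0) -> bytes:
    """Repeat the GET until the server answers 201, then return the archive bytes."""
    for _ in range(attempts):
        status, body = fetch(url)
        if status == 201:
            return body
        if status != 202:  # assumption: anything else is an error
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(delay)
    raise TimeoutError("backup was not ready in time")
```

With a real HTTP client, `fetch` would wrap an authenticated GET request and return its status code and body.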

Upload backup APIs

endpoints:
- /api/tasks/backup
- /api/projects/backup
method: POST
Content-Type: multipart/form-data
responses: 202, 201 with json payload

Create from backup

To create a task or project from a backup, go to the tasks or projects page, click the Create from backup button and select the archive you need.

As a result, you’ll get a task containing data, parameters, and annotations of the previously exported task.

Backup file structure

The backup is a zip archive that contains the data, the task or project specification, and the annotations. Its structure depends on what was backed up:

Task Backup Structure

    .
    ├── data
    │   └── {user uploaded data}
    ├── task.json
    └── annotations.json

Project Backup Structure

    .
    ├── task_{id}
    │   ├── data
    │   │   └── {user uploaded data}
    │   ├── task.json
    │   └── annotations.json
    └── project.json
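
The two layouts differ in which JSON file sits at the archive root, so a script can tell them apart without unpacking the data. The following is a minimal sketch under that assumption. `backup_kind` and `make_demo_backup` are hypothetical helpers, and the demo archive contains empty placeholder files rather than a real backup.

```python
import io
import zipfile
from typing import List


def backup_kind(archive: zipfile.ZipFile) -> str:
    """Return 'project' or 'task' depending on which manifest sits at the archive root."""
    names = set(archive.namelist())
    if "project.json" in names:
        return "project"
    if "task.json" in names:
        return "task"
    raise ValueError("neither project.json nor task.json found at the archive root")


def make_demo_backup(files: List[str]) -> zipfile.ZipFile:
    """Build an in-memory archive with empty JSON files, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in files:
            archive.writestr(name, "{}")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

For a file on disk, `backup_kind(zipfile.ZipFile("path/to/backup.zip"))` would perform the same check.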

23 - Frame deleting

This section explains how to delete and restore a frame from a task.

Delete frame

You can delete the current frame from a task. This frame will not be presented either in the UI or in the exported annotation. Thus, it is possible to mark corrupted frames that are not subject to annotation.

Go to the Job annotation view and click on the Delete frame button (Alt+Del). After that, you will be asked to confirm the deletion.

Note: When you delete with the shortcut, the frame is deleted immediately without additional confirmation.

Note: All annotations from that frame will be deleted, unsaved annotations will be saved, and the frame will be invisible in the annotation view (until you make it visible in the settings). If the task has overlap between jobs and the deleted frame falls within the overlap, the frame also becomes unavailable in the other job.
When you delete a frame in a job with tracks, you may need to adjust some tracks manually. Common adjustments are:
- Add keyframes at the edges of the deleted interval for the interpolation to look correct;
- Move the start or end keyframe to the correct side of the deleted interval.

If you need to enable showing the deleted frames, you can do it in the settings.

Go to the settings and choose Player settings.
Select the Show deleted frames checkbox and close the settings dialog.
After that, you can navigate through deleted frames, but annotation tools will be unavailable. Deleted frames are marked with a corresponding overlay.
There are a few ways to navigate to deleted frames without enabling this option:
- Go to the frame via direct navigation methods: navigation slider or frame input field,
- Go to the frame via the direct link.
Navigation with step will not count deleted frames.

Restore deleted frame

You can also restore deleted frames in the task.

Turn on deleted frames visibility, as described in the previous section, and go to the deleted frame you want to restore.
Click on the Restore icon. The frame will be restored immediately.

24 - Export/import datasets and upload annotation

This section explains how to download and upload datasets (including annotation, images, and metadata) of projects, tasks, and jobs.

Export dataset

You can export a dataset from a project, task or job.

To download the latest annotations, you have to save all changes first. Click the Save button. There is a Ctrl+S shortcut to save annotations quickly.
After that, click the Menu button. Exporting and importing of task and project datasets takes place through the Action menu.
Press the Export task dataset button.
Choose the format for exporting the dataset. Exporting and importing is available in:
- Standard CVAT formats:
  - CVAT for video: choose this format if the task was created in interpolation mode.
  - CVAT for images: choose this format if the task was created in annotation mode.
- And also in formats from the list of annotation formats supported by CVAT.
- For 3D tasks, the following formats are available:
  - Kitti Raw Format 1.0
  - Sly Point Cloud Format 1.0 - Supervisely Point Cloud dataset
To download images with the dataset, enable the Save images option.
(Optional) To name the resulting archive, use the Custom name field.
You can choose a storage for the dataset export by selecting Local or Cloud storage as the target storage. The default settings are those selected when the project was created (for example, if you specified a local storage when you created the project, then by default you will be prompted to export the dataset to your PC). You can find out the default value by hovering the mouse over the ?. Learn more about attaching cloud storage.

Import dataset

You can import a dataset only into a project. In this case, the data will be split into subsets. To import a dataset, do the following on the Project page:

Open the Actions menu.
Press the Import dataset button.
Select the dataset format (if you did not specify a custom name during export, the format will be in the archive name).
Drag the file to the file upload area or click on the upload area to select the file through the explorer.

You can also import a dataset from an attached cloud storage. To do this, select the annotation format, then select a cloud storage from the list (or use the default settings if the required cloud storage is already specified for the task or project), and enter the name of the zip archive in the File name field.

During the import process, you will be able to track the progress of the import.

Upload annotations

You can upload annotations to a task or a job. To do this, select Upload annotation in the Action menu of the task or in the job Menu on the top panel, choose the format of the annotations, and select the annotation file or archive in the file explorer.

You can also use an attached cloud storage to upload the annotation file.

25 - Formats

List of annotation formats supported by CVAT.

CVAT supports the following formats:

25.1 -

CVAT

This is the native CVAT annotation format. It supports all CVAT annotations features, so it can be used to make data backups.

supported annotations CVAT for Images: Rectangles, Polygons, Polylines, Points, Cuboids, Skeletons, Tags, Tracks
supported annotations CVAT for Videos: Rectangles, Polygons, Polylines, Points, Cuboids, Skeletons, Tracks
attributes are supported

CVAT for images export

Downloaded file: a ZIP file of the following structure:

taskname.zip/
├── images/
|   ├── img1.png
|   └── img2.jpg
└── annotations.xml

tracks are split by frames

CVAT for videos export

Downloaded file: a ZIP file of the following structure:

taskname.zip/
├── images/
|   ├── frame_000000.png
|   └── frame_000001.png
└── annotations.xml

shapes are exported as single-frame tracks

CVAT loader

Uploaded file: an XML file or a ZIP file of the structures above

25.2 -

Datumaro format

Datumaro is a tool that helps with complex dataset and annotation transformations, format conversions, dataset statistics, merging, custom formats, etc. It is used as the provider of dataset support in CVAT, so everything possible in CVAT is also possible in Datumaro; in addition, Datumaro offers dataset operations.

supported annotations: any 2D shapes, labels
supported attributes: any

Import annotations in Datumaro format

Uploaded file: a zip archive of the following structure:

<archive_name>.zip/
└── annotations/
    ├── subset1.json # full description of classes and all dataset items
    └── subset2.json # full description of classes and all dataset items

JSON annotations files in the annotations directory should have similar structure:

{
  "info": {},
  "categories": {
    "label": {
      "labels": [
        {
          "name": "label_0",
          "parent": "",
          "attributes": []
        },
        {
          "name": "label_1",
          "parent": "",
          "attributes": []
        }
      ],
      "attributes": []
    }
  },
  "items": [
    {
      "id": "img1",
      "annotations": [
        {
          "id": 0,
          "type": "polygon",
          "attributes": {},
          "group": 0,
          "label_id": 1,
          "points": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
          "z_order": 0
        },
        {
          "id": 1,
          "type": "bbox",
          "attributes": {},
          "group": 1,
          "label_id": 0,
          "z_order": 0,
          "bbox": [1.0, 2.0, 3.0, 4.0]
        },
        {
          "id": 2,
          "type": "mask",
          "attributes": {},
          "group": 1,
          "label_id": 0,
          "rle": {
            "counts": "d0d0:F\\0",
            "size": [10, 10]
          },
          "z_order": 0
        }
      ]
    }
  ]
}
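
To check such a file before uploading it, it can help to count the annotations per label and type. The sketch below does that with the standard library only. It assumes that `label_id` is a zero-based index into the `categories.label.labels` list, which matches the example above but is not stated there, and `summarize_datumaro` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def summarize_datumaro(text: str) -> Dict[Tuple[str, str], int]:
    """Count annotations per (label name, annotation type) in a Datumaro JSON document."""
    document = json.loads(text)
    labels = [entry["name"] for entry in document["categories"]["label"]["labels"]]
    counts: Counter = Counter()
    for item in document["items"]:
        for annotation in item["annotations"]:
            # Assumption: label_id indexes the categories.label.labels list.
            label_name = labels[annotation["label_id"]]
            counts[(label_name, annotation["type"])] += 1
    return dict(counts)
```

Applied to the example above, the function would report one polygon for label_1 and one bbox and one mask for label_0.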

Export annotations in Datumaro format

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── annotations/
│   └── default.json # full description of classes and all dataset items
└── images/ # if the option `save images` was selected
    └── default
        ├── image1.jpg
        ├── image2.jpg
        ├── ...

25.3 -

LabelMe

Dataset examples

LabelMe export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── img1.jpg
└── img1.xml

supported annotations: Rectangles, Polygons (with attributes)

LabelMe import

Uploaded file: a zip archive of the following structure:

taskname.zip/
├── Masks/
|   ├── img1_mask1.png
|   └── img1_mask2.png
├── img1.xml
├── img2.xml
└── img3.xml

supported annotations: Rectangles, Polygons, Masks (as polygons)

25.4 -

MOT sequence

Dataset examples

MOT export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── img1/
|   ├── image1.jpg
|   └── image2.jpg
└── gt/
    ├── labels.txt
    └── gt.txt

# labels.txt
cat
dog
person
...

# gt.txt
# frame_id, track_id, x, y, w, h, "not ignored", class_id, visibility, <skipped>
1,1,1363,569,103,241,1,1,0.86014
...

supported annotations: Rectangle shapes and tracks
supported attributes: visibility (number), ignored (checkbox)
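
As an illustration of the gt.txt layout described above, the following sketch parses one row into named fields. The field order is taken from the comment line in the example, `MotBox` and `parse_gt_line` are hypothetical names, and any values after the ninth field are ignored.

```python
from typing import NamedTuple


class MotBox(NamedTuple):
    frame_id: int
    track_id: int
    x: float
    y: float
    w: float
    h: float
    not_ignored: bool
    class_id: int
    visibility: float


def parse_gt_line(line: str) -> MotBox:
    """Parse one comma-separated gt.txt row; fields past the ninth are ignored."""
    f = [part.strip() for part in line.split(",")]
    return MotBox(
        frame_id=int(f[0]),
        track_id=int(f[1]),
        x=float(f[2]),
        y=float(f[3]),
        w=float(f[4]),
        h=float(f[5]),
        not_ignored=(f[6] == "1"),
        class_id=int(f[7]),
        visibility=float(f[8]),
    )
```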

MOT import

Uploaded file: a zip archive of the structure above or:

taskname.zip/
├── labels.txt # optional, mandatory for non-official labels
└── gt.txt

supported annotations: Rectangle tracks

25.5 -

MOTS PNG

Dataset examples

MOTS PNG export

Downloaded file: a zip archive of the following structure:

taskname.zip/
└── <any_subset_name>/
    ├── images/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── instances/
        ├── labels.txt
        ├── image1.png
        └── image2.png

# labels.txt
cat
dog
person
...

supported annotations: Rectangle and Polygon tracks

MOTS PNG import

Uploaded file: a zip archive of the structure above

supported annotations: Polygon tracks

25.6 -

MS COCO Object Detection

COCO export

Downloaded file: a zip archive with the structure described here

archive.zip/
├── images/
│   ├── train/
│   │   ├── <image_name1.ext>
│   │   ├── <image_name2.ext>
│   │   └── ...
│   └── val/
│       ├── <image_name1.ext>
│       ├── <image_name2.ext>
│       └── ...
└── annotations/
   ├── <task>_<subset_name>.json
   └── ...

If the dataset is exported from a Project, the subsets are named the same way as they are named in the project. In other cases there will be a single default subset, containing all the data. The <task> part corresponds to one of the COCO tasks: instances, person_keypoints, panoptic, image_info, labels, captions, stuff. There can be several annotation files in the archive.

supported annotations: Polygons, Rectangles
supported attributes:
- is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, the largest one defines all the object properties
- score (number) - the annotation score field
- arbitrary attributes - will be stored in the attributes annotation section

Support for COCO tasks via Datumaro is described here. For example, to get COCO keypoints support through Datumaro:

Install Datumaro: pip install datumaro
Export the task in the Datumaro format and unzip the archive.
Export the Datumaro project in the coco / coco_person_keypoints formats: datum export -f coco -p path/to/project [-- --save-images]

This way, one can export CVAT points as single keypoints or keypoint lists (without the visibility COCO flag).

COCO import

Uploaded file: a single unpacked *.json or a zip archive with the structure described above or here (without images).

supported annotations: Polygons, Rectangles (if the segmentation field is empty)
supported tasks: instances, person_keypoints (only segmentations will be imported), panoptic

MS COCO Keypoint Detection

Format specification

COCO export

Downloaded file: a zip archive with the structure described here

supported annotations: Skeletons
supported attributes:
- is_crowd (checkbox or integer with values 0 and 1) - specifies that the instance (an object group) should have an RLE-encoded mask in the segmentation field. All the grouped shapes are merged into a single mask, the largest one defines all the object properties
- score (number) - the annotation score field
- arbitrary attributes - will be stored in the attributes annotation section

COCO import

Uploaded file: a single unpacked *.json or a zip archive with the structure described here (without images).

supported annotations: Skeletons

How to create a task from MS COCO dataset

Download the MS COCO dataset.

For example val images and instances annotations

Create a CVAT task with the following labels:

person bicycle car motorcycle airplane bus train truck boat "traffic light" "fire hydrant" "stop sign" "parking meter" bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard "sports ball" kite "baseball bat" "baseball glove" skateboard surfboard "tennis racket" bottle "wine glass" cup fork knife spoon bowl banana apple sandwich orange broccoli carrot "hot dog" pizza donut cake chair couch "potted plant" bed "dining table" toilet tv laptop mouse remote keyboard "cell phone" microwave oven toaster sink refrigerator book clock vase scissors "teddy bear" "hair drier" toothbrush

Select val2017.zip as data (See Creating an annotation task guide for details)
Unpack annotations_trainval2017.zip
Click the Upload annotation button, choose COCO 1.1, and select the instances_val2017.json annotation file. It can take some time.

25.7 -

Pascal VOC

Format specification
Dataset examples
supported annotations:
- Rectangles (detection and layout tasks)
- Tags (action- and classification tasks)
- Polygons (segmentation task)
supported attributes:
- occluded (both UI option and a separate attribute)
- truncated and difficult (should be defined for labels as checkboxes)
- action attributes (import only, should be defined as checkboxes)
- arbitrary attributes (in the attributes section of XML files)

Pascal VOC export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── JPEGImages/
│   ├── <image_name1>.jpg
│   ├── <image_name2>.jpg
│   └── <image_nameN>.jpg
├── Annotations/
│   ├── <image_name1>.xml
│   ├── <image_name2>.xml
│   └── <image_nameN>.xml
├── ImageSets/
│   └── Main/
│       └── default.txt
└── labelmap.txt

# labelmap.txt
# label : color_rgb : 'body' parts : actions
background:::
aeroplane:::
bicycle:::
bird:::

Pascal VOC import

Uploaded file: a zip archive of the structure declared above or the following:

taskname.zip/
├── <image_name1>.xml
├── <image_name2>.xml
└── <image_nameN>.xml

It must be possible for CVAT to match the frame name and the file name from the annotation .xml file (the filename tag, e.g. <filename>2008_004457.jpg</filename>).

There are 2 options:

Full match between the frame name and the file name from the annotation .xml file (for tasks created from images or an image archive).
Match by frame number. The file name should be <number>.jpg or frame_000000.jpg. Use this option when the task was created from a video.
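
The second option can be pictured as extracting a number from the file name. The sketch below is an illustration of that naming rule only, not CVAT's actual matching code, and `frame_number` is a hypothetical helper.

```python
import re
from typing import Optional

# Accepts "<number>.jpg" and "frame_<number>.jpg", as listed in the second option.
_FRAME_NAME = re.compile(r"^(?:frame_)?(\d+)\.jpg$")


def frame_number(filename: str) -> Optional[int]:
    """Return the frame number encoded in the file name, or None if it does not fit."""
    match = _FRAME_NAME.match(filename)
    return int(match.group(1)) if match else None
```

A name such as 2008_004457.jpg does not fit either pattern, so it would have to be matched by the full name instead.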

Segmentation mask export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── labelmap.txt # optional, required for non-VOC labels
├── ImageSets/
│   └── Segmentation/
│       └── default.txt # list of image names without extension
├── SegmentationClass/ # merged class masks
│   ├── image1.png
│   └── image2.png
└── SegmentationObject/ # merged instance masks
    ├── image1.png
    └── image2.png

# labelmap.txt
# label : color (RGB) : 'body' parts : actions
background:0,128,0::
aeroplane:10,10,128::
bicycle:10,128,0::
bird:0,108,128::
boat:108,0,100::
bottle:18,0,8::
bus:12,28,0::

A mask is a PNG image with 1 or 3 channels in which each pixel has a color that corresponds to a label. Colors are generated according to the Pascal VOC algorithm. (0, 0, 0) is used for the background by default.

supported shapes: Rectangles, Polygons
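
To relate mask colors back to labels, labelmap.txt can be read into a dictionary. This is a minimal sketch that assumes the four colon-separated fields shown above (label, color, parts, actions) and keeps only the first two. `parse_labelmap` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

Color = Tuple[int, int, int]


def parse_labelmap(text: str) -> Dict[str, Optional[Color]]:
    """Map each label to its RGB color, or None when the color field is empty."""
    result: Dict[str, Optional[Color]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        label, color, _parts, _actions = line.split(":")
        if color:
            r, g, b = (int(value) for value in color.split(","))
            result[label] = (r, g, b)
        else:
            result[label] = None
    return result
```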

Segmentation mask import

Uploaded file: a zip archive of the following structure:

  taskname.zip/
  ├── labelmap.txt # optional, required for non-VOC labels
  ├── ImageSets/
  │   └── Segmentation/
  │       └── <any_subset_name>.txt
  ├── SegmentationClass/
  │   ├── image1.png
  │   └── image2.png
  └── SegmentationObject/
      ├── image1.png
      └── image2.png

It is also possible to import grayscale (1-channel) PNG masks. For grayscale masks provide a list of labels with the number of lines equal to the maximum color index on images. The lines must be in the right order so that line index is equal to the color index. Lines can have arbitrary, but different, colors. If there are gaps in the used color indices in the annotations, they must be filled with arbitrary dummy labels. Example:

q:0,128,0:: # color index 0
aeroplane:10,10,128:: # color index 1
_dummy2:2,2,2:: # filler for color index 2
_dummy3:3,3,3:: # filler for color index 3
boat:108,0,100:: # color index 4
...
_dummy198:198,198,198:: # filler for color index 198
_dummy199:199,199,199:: # filler for color index 199
...
the last label:12,28,0:: # color index 200

supported shapes: Polygons
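
Writing the filler lines by hand is error-prone, so the grayscale label list can be generated from a mapping of color index to label. The sketch below is one way to do it. It reuses the index as the color of every line, which satisfies the "arbitrary, but different" rule, and `grayscale_labelmap` is a hypothetical helper.

```python
from typing import Dict, List


def grayscale_labelmap(labels_by_index: Dict[int, str]) -> List[str]:
    """Emit one labelmap line per color index, inserting dummy labels into the gaps."""
    lines = []
    for index in range(max(labels_by_index) + 1):
        name = labels_by_index.get(index, f"_dummy{index}")
        # The color only has to be unique per line; reuse the index for all channels.
        lines.append(f"{name}:{index},{index},{index}::")
    return lines
```

The position of each line in the result equals its color index, so writing the lines to labelmap.txt in order keeps the two aligned.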

How to create a task from Pascal VOC dataset

Download the Pascal VOC dataset (it can be downloaded from the PASCAL VOC website)
Create a CVAT task with the following labels:
```
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable
dog horse motorbike person pottedplant sheep sofa train tvmonitor
```
You can add ~checkbox=difficult:false ~checkbox=truncated:false attributes for each label if you want to use them.

Select interesting image files (See Creating an annotation task guide for details)
Zip the corresponding annotation files
Click the Upload annotation button, choose Pascal VOC ZIP 1.1, and select the zip file with annotations from the previous step. It may take some time.

25.8 -

YOLO

Format specification
Dataset examples
supported annotations: Rectangles

YOLO export

Downloaded file: a zip archive with the following structure:

archive.zip/
├── obj.data
├── obj.names
├── obj_<subset>_data
│   ├── image1.txt
│   └── image2.txt
└── train.txt # list of subset image paths

# the only valid subsets are: train, valid
# train.txt and valid.txt:
obj_<subset>_data/image1.jpg
obj_<subset>_data/image2.jpg

# obj.data:
classes = 3 # optional
names = obj.names
train = train.txt
valid = valid.txt # optional
backup = backup/ # optional

# obj.names:
cat
dog
airplane

# image_name.txt:
# label_id - id from obj.names
# cx, cy - relative coordinates of the bbox center
# rw, rh - relative size of the bbox
# label_id cx cy rw rh
1 0.3 0.8 0.1 0.3
2 0.7 0.2 0.3 0.1

Each annotation *.txt file has a name that corresponds to the name of the image file (e.g. frame_000001.txt is the annotation for the frame_000001.jpg image). Each line of a *.txt file describes a label and a bounding box in the format label_id cx cy rw rh. obj.names contains the ordered list of label names.
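
Because the coordinates are relative to the image size, a consumer usually converts them back to pixels. A minimal sketch of that conversion follows. The image size must be supplied by the caller, and `yolo_to_pixels` is a hypothetical helper name.

```python
from typing import Tuple


def yolo_to_pixels(line: str, image_width: int, image_height: int) -> Tuple[int, float, float, float, float]:
    """Convert 'label_id cx cy rw rh' into (label_id, x_min, y_min, x_max, y_max) in pixels."""
    label_id, cx, cy, rw, rh = line.split()
    # Scale the relative center and size by the image dimensions.
    center_x, box_w = float(cx) * image_width, float(rw) * image_width
    center_y, box_h = float(cy) * image_height, float(rh) * image_height
    return (
        int(label_id),
        center_x - box_w / 2,
        center_y - box_h / 2,
        center_x + box_w / 2,
        center_y + box_h / 2,
    )
```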

YOLO import

Uploaded file: a zip archive of the same structure as above. It must be possible to match the CVAT frame (image name) and the annotation file name. There are 2 options:

full match between image name and name of annotation *.txt file (in cases when a task was created from images or archive of images).
match by frame number (if CVAT cannot match by name). File name should be in the following format <number>.jpg . It should be used when task was created from a video.

How to create a task from YOLO formatted dataset (from VOC for example)

Follow the official guide (see the Training YOLO on VOC section) and prepare the YOLO formatted annotation files.
Zip train images

zip images.zip -j -@ < train.txt

Create a CVAT task with the following labels:
```
aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog
horse motorbike person pottedplant sheep sofa train tvmonitor
```
Select images.zip as data. Most likely you should use the share functionality because the size of images.zip is more than 500 MB. See Creating an annotation task guide for details.

Create obj.names with the following content:

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor

Zip all label files together (we need to add only label files that correspond to the train subset)

while read -r p; do f=${p##*/}; echo "${p%/*/*}/labels/${f%%.*}.txt"; done < train.txt | zip labels.zip -j -@ obj.names

Click the Upload annotation button, choose YOLO 1.1, and select the zip file with labels from the previous step.

25.9 -

TFRecord

Dataset examples

TFRecord is a very flexible format, but we try to follow the format used in TF object detection with minimal modifications.

Used feature description:

import tensorflow as tf

image_feature_description = {
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/source_id': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    # Object boxes and classes.
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    'image/object/class/text': tf.io.VarLenFeature(tf.string),
}

TFRecord export

Downloaded file: a zip archive with the following structure:

taskname.zip/
├── default.tfrecord
└── label_map.pbtxt

# label_map.pbtxt
item {
	id: 1
	name: 'label_0'
}
item {
	id: 2
	name: 'label_1'
}
...

supported annotations: Rectangles, Polygons (as masks, manually over Datumaro)

How to export masks:

Export annotations in Datumaro format
Apply polygons_to_masks and boxes_to_masks transforms

datum transform -t polygons_to_masks -p path/to/proj -o ptm
datum transform -t boxes_to_masks -p ptm -o btm

Export in the TF Detection API format

datum export -f tf_detection_api -p btm [-- --save-images]

TFRecord import

Uploaded file: a zip archive of the following structure:

taskname.zip/
└── <any name>.tfrecord

supported annotations: Rectangles

How to create a task from TFRecord dataset (from VOC2007 for example)

Create label_map.pbtxt file with the following content:

item {
    id: 1
    name: 'aeroplane'
}
item {
    id: 2
    name: 'bicycle'
}
item {
    id: 3
    name: 'bird'
}
item {
    id: 4
    name: 'boat'
}
item {
    id: 5
    name: 'bottle'
}
item {
    id: 6
    name: 'bus'
}
item {
    id: 7
    name: 'car'
}
item {
    id: 8
    name: 'cat'
}
item {
    id: 9
    name: 'chair'
}
item {
    id: 10
    name: 'cow'
}
item {
    id: 11
    name: 'diningtable'
}
item {
    id: 12
    name: 'dog'
}
item {
    id: 13
    name: 'horse'
}
item {
    id: 14
    name: 'motorbike'
}
item {
    id: 15
    name: 'person'
}
item {
    id: 16
    name: 'pottedplant'
}
item {
    id: 17
    name: 'sheep'
}
item {
    id: 18
    name: 'sofa'
}
item {
    id: 19
    name: 'train'
}
item {
    id: 20
    name: 'tvmonitor'
}

Use create_pascal_tf_record.py to convert the VOC2007 dataset to the TFRecord format. For example:

python create_pascal_tf_record.py --data_dir <path to VOCdevkit> --set train --year VOC2007 --output_path pascal.tfrecord --label_map_path label_map.pbtxt

Zip train images

cat <path to VOCdevkit>/VOC2007/ImageSets/Main/train.txt | while read p; do echo <path to VOCdevkit>/VOC2007/JPEGImages/${p}.jpg  ; done | zip images.zip -j -@

Create a CVAT task with the following labels:

aeroplane bicycle bird boat bottle bus car cat chair cow diningtable dog horse motorbike person pottedplant sheep sofa train tvmonitor

Select images.zip as data. See Creating an annotation task guide for details.

Zip pascal.tfrecord and label_map.pbtxt files together

zip anno.zip -j <path to pascal.tfrecord> <path to label_map.pbtxt>

Click the Upload annotation button, choose TFRecord 1.0, and select the zip file with labels from the previous step. It may take some time.
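
The label_map.pbtxt file from the first step does not have to be typed by hand. The sketch below renders it from a list of label names, with ids starting at 1 as in the listing above. `label_map_pbtxt` is a hypothetical helper, and the output covers only the `id` and `name` fields shown in this section.

```python
from typing import Iterable


def label_map_pbtxt(labels: Iterable[str]) -> str:
    """Render labels as label_map.pbtxt items with ids starting at 1."""
    items = []
    for index, name in enumerate(labels, start=1):
        items.append(f"item {{\n    id: {index}\n    name: '{name}'\n}}")
    return "\n".join(items) + "\n"
```

Passing the 20 VOC label names in the order used for the CVAT task would reproduce the listing from the first step.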

25.10 -

ImageNet

Dataset examples

ImageNet export

Downloaded file: a zip archive of the following structure:

# if we save images:
taskname.zip/
├── label1/
|   ├── label1_image1.jpg
|   └── label1_image2.jpg
└── label2/
    ├── label2_image1.jpg
    ├── label2_image3.jpg
    └── label2_image4.jpg

# if we keep only annotation:
taskname.zip/
├── <any_subset_name>.txt
└── synsets.txt

supported annotations: Labels

ImageNet import

Uploaded file: a zip archive of the structure above

supported annotations: Labels

25.11 -

WIDER Face

Dataset examples

WIDER Face export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── labels.txt # optional
├── wider_face_split/
│   └── wider_face_<any_subset_name>_bbx_gt.txt
└── WIDER_<any_subset_name>/
    └── images/
        ├── 0--label0/
        │   └── 0_label0_image1.jpg
        └── 1--label1/
            └── 1_label1_image2.jpg

supported annotations: Rectangles (with attributes), Labels
supported attributes:
- blur, expression, illumination, pose, invalid
- occluded (both the annotation property & an attribute)

WIDER Face import

Uploaded file: a zip archive of the structure above

supported annotations: Rectangles (with attributes), Labels
supported attributes:
- blur, expression, illumination, occluded, pose, invalid

25.12 -

CamVid

Dataset examples

CamVid export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── labelmap.txt # optional, required for non-CamVid labels
├── <any_subset_name>/
|   ├── image1.png
|   └── image2.png
├── <any_subset_name>annot/
|   ├── image1.png
|   └── image2.png
└── <any_subset_name>.txt

# labelmap.txt
# color (RGB) label
0 0 0 Void
64 128 64 Animal
192 0 128 Archway
0 128 192 Bicyclist
0 128 64 Bridge

A mask is a PNG image with 1 or 3 channels in which each pixel has a color that corresponds to a label. (0, 0, 0) is used for the background by default.

supported annotations: Rectangles, Polygons
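
When decoding CamVid masks, a lookup from color to label is usually what is needed. The following sketch builds one from labelmap.txt, assuming the "R G B label" layout shown above. `camvid_colors` is a hypothetical helper name.

```python
from typing import Dict, Tuple


def camvid_colors(text: str) -> Dict[Tuple[int, int, int], str]:
    """Map each RGB color in a CamVid labelmap.txt to its label name."""
    result: Dict[Tuple[int, int, int], str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        r, g, b, label = line.split(maxsplit=3)
        result[(int(r), int(g), int(b))] = label
    return result
```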

CamVid import

Uploaded file: a zip archive of the structure above

supported annotations: Polygons

25.13 -

VGGFace2

Dataset examples

VGGFace2 export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── labels.txt # optional
├── <any_subset_name>/
|   ├── label0/
|   |   └── image1.jpg
|   └── label1/
|       └── image2.jpg
└── bb_landmark/
    ├── loose_bb_<any_subset_name>.csv
    └── loose_landmark_<any_subset_name>.csv
# labels.txt
# n000001 car
label0 <class0>
label1 <class1>

supported annotations: Rectangles, Points (landmarks - groups of 5 points)

VGGFace2 import

Uploaded file: a zip archive of the structure above

supported annotations: Rectangles, Points (landmarks - groups of 5 points)

25.14 -

Market-1501

Dataset examples

Market-1501 export

Downloaded file: a zip archive of the following structure:

taskname.zip/
├── bounding_box_<any_subset_name>/
│   └── image_name_1.jpg
└── query
    ├── image_name_2.jpg
    └── image_name_3.jpg
# if we keep only annotation:
taskname.zip/
└── images_<any_subset_name>.txt
# images_<any_subset_name>.txt
query/image_name_1.jpg
bounding_box_<any_subset_name>/image_name_2.jpg
bounding_box_<any_subset_name>/image_name_3.jpg
# image_name = 0001_c1s1_000015_00.jpg
0001 - person id
c1 - camera id (there are 6 cameras in total)
s1 - sequence
000015 - frame number in sequence
00 - means that this bounding box is the first one among the several

supported annotations: Label market-1501 with attributes (query, person_id, camera_id)
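
The naming scheme above can be decoded with a regular expression. This is a sketch that follows the breakdown of the example name only. The group names and `parse_market_name` are hypothetical, and the pattern is deliberately permissive about the number of digits.

```python
import re
from typing import Dict

# person id, camera id, sequence, frame number, box index
_NAME = re.compile(
    r"^(?P<person_id>-?\d+)_c(?P<camera_id>\d+)s(?P<sequence>\d+)"
    r"_(?P<frame>\d+)_(?P<box>\d+)\.jpg$"
)


def parse_market_name(filename: str) -> Dict[str, int]:
    """Split a Market-1501 image name into its numeric components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a Market-1501 style name: {filename}")
    return {key: int(value) for key, value in match.groupdict().items()}
```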

Market-1501 import

Uploaded file: a zip archive of the structure above

supported annotations: Label market-1501 with attributes (query, person_id, camera_id)

25.15 -

ICDAR13/15

Dataset examples

ICDAR13/15 export

Downloaded file: a zip archive of the following structure:

# word recognition task
taskname.zip/
└── word_recognition/
    └── <any_subset_name>/
        ├── images
        |   ├── word1.png
        |   └── word2.png
        └── gt.txt
# text localization task
taskname.zip/
└── text_localization/
    └── <any_subset_name>/
        ├── images
        |   ├── img_1.png
        |   └── img_2.png
        ├── gt_img_1.txt
        └── gt_img_2.txt
# text segmentation task
taskname.zip/
└── text_segmentation/
    └── <any_subset_name>/
        ├── images
        |   ├── 1.png
        |   └── 2.png
        ├── 1_GT.bmp
        ├── 1_GT.txt
        ├── 2_GT.bmp
        └── 2_GT.txt

Word recognition task:

supported annotations: Label icdar with attribute caption

Text localization task:

supported annotations: Rectangles and Polygons with label icdar and attribute text

Text segmentation task:

supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center

ICDAR13/15 import

Uploaded file: a zip archive of the structure above

Word recognition task:

supported annotations: Label icdar with attribute caption

Text localization task:

supported annotations: Rectangles and Polygons with label icdar and attribute text

Text segmentation task:

supported annotations: Rectangles and Polygons with label icdar and attributes index, text, color, center

25.16 -

Open Images

Format specification
Dataset examples
Supported annotations:
- Rectangles (detection task)
- Tags (classification task)
- Polygons (segmentation task)
Supported attributes:
- Labels
  - score (should be defined for labels as text or number). The confidence level from 0 to 1.
- Bounding boxes
  - score (should be defined for labels as text or number). The confidence level from 0 to 1.
  - occluded (both UI option and a separate attribute). Whether the object is occluded by another object.
  - truncated (should be defined for labels as checkboxes). Whether the object extends beyond the boundary of the image.
  - is_group_of (should be defined for labels as checkboxes). Whether the object represents a group of objects of the same class.
  - is_depiction (should be defined for labels as checkboxes). Whether the object is a depiction (such as a drawing) rather than a real object.
  - is_inside (should be defined for labels as checkboxes). Whether the object is seen from the inside.
- Masks
  - box_id (should be defined for labels as text). An identifier for the bounding box associated with the mask.
  - predicted_iou (should be defined for labels as text or number). Predicted IoU value with respect to the ground truth.

Open Images export

Downloaded file: a zip archive of the following structure:

└─ taskname.zip/
    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── class-descriptions.csv
    │   ├── images.meta  # additional file with information about image sizes
    │   ├── <subset_name>-image_ids_and_rotation.csv
    │   ├── <subset_name>-annotations-bbox.csv
    │   ├── <subset_name>-annotations-human-imagelabels.csv
    │   └── <subset_name>-annotations-object-segmentation.csv
    ├── images/
    │   ├── subset1/
    │   │   ├── <image_name101.jpg>
    │   │   ├── <image_name102.jpg>
    │   │   └── ...
    │   ├── subset2/
    │   │   ├── <image_name201.jpg>
    │   │   ├── <image_name202.jpg>
    │   │   └── ...
    │   ├── ...
    └── masks/
        ├── subset1/
        │   ├── <mask_name101.png>
        │   ├── <mask_name102.png>
        │   └── ...
        ├── subset2/
        │   ├── <mask_name201.png>
        │   ├── <mask_name202.png>
        │   └── ...
        ├── ...

Open Images import

Uploaded file: a zip archive of the following structure:

└─ upload.zip/
    ├── annotations/
    │   ├── bbox_labels_600_hierarchy.json
    │   ├── class-descriptions.csv
    │   ├── images.meta  # optional, file with information about image sizes
    │   ├── <subset_name>-image_ids_and_rotation.csv
    │   ├── <subset_name>-annotations-bbox.csv
    │   ├── <subset_name>-annotations-human-imagelabels.csv
    │   └── <subset_name>-annotations-object-segmentation.csv
    └── masks/
        ├── subset1/
        │   ├── <mask_name101.png>
        │   ├── <mask_name102.png>
        │   └── ...
        ├── subset2/
        │   ├── <mask_name201.png>
        │   ├── <mask_name202.png>
        │   └── ...
        ├── ...

Image IDs in <subset_name>-image_ids_and_rotation.csv should match the image names in the task.
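
A quick consistency check before uploading could look like the sketch below. It is only an illustration: it assumes the CSV has an `ImageID` column (as in the original Open Images metadata files) and that IDs are compared with task image names without their file extension. The helper name `find_unmatched_ids` and the sample data are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def find_unmatched_ids(csv_text, task_image_names):
    """Return image IDs from the CSV that have no matching task image.

    Assumption: the CSV has an `ImageID` column and IDs are compared
    against task image names with the file extension removed.
    """
    # Strip extensions from the task image names, e.g. "a/img1.jpg" -> "a/img1"
    known = {str(PurePosixPath(name).with_suffix("")) for name in task_image_names}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["ImageID"] for row in reader if row["ImageID"] not in known]


# Made-up sample content, not taken from a real dataset
sample_csv = "ImageID,Rotation\nimage_name101,0\nimage_name999,0\n"
print(find_unmatched_ids(sample_csv, ["image_name101.jpg", "image_name102.jpg"]))
```

An empty result means that every ID listed in the CSV has a counterpart among the task images under the stated assumptions.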

25.17 - Cityscapes

Format specification
Dataset examples
Supported annotations:
- Polygons (segmentation task)
Supported attributes:
- is_crowd (boolean, should be defined for labels as checkboxes). Specifies whether the annotation label can distinguish between different instances. If False, the annotation id field encodes the instance id.

Cityscapes export

Downloaded file: a zip archive of the following structure:

.
├── label_color.txt
├── gtFine
│   ├── <subset_name>
│   │   └── <city_name>
│   │       ├── image_0_gtFine_instanceIds.png
│   │       ├── image_0_gtFine_color.png
│   │       ├── image_0_gtFine_labelIds.png
│   │       ├── image_1_gtFine_instanceIds.png
│   │       ├── image_1_gtFine_color.png
│   │       ├── image_1_gtFine_labelIds.png
│   │       ├── ...
└── imgsFine  # if saving images was requested
    └── leftImg8bit
        ├── <subset_name>
        │   └── <city_name>
        │       ├── image_0_leftImg8bit.png
        │       ├── image_1_leftImg8bit.png
        │       ├── ...

label_color.txt: a file that describes the color for each label.

# label_color.txt example
# r g b label_name
0 0 0 background
0 255 0 tree
...

*_gtFine_color.png: class labels encoded by color.
*_gtFine_labelIds.png: class labels encoded by index.
*_gtFine_instanceIds.png: class and instance labels encoded by an instance ID. Each pixel value encodes both the class and the individual instance: the integer part of the ID divided by 1000 is the class ID, and the remainder is the instance ID. If an annotation describes multiple instances, the pixels have the regular ID of that class.
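
The instance ID arithmetic described above can be sketched as follows. The function name `decode_instance_id` is hypothetical, and treating values below 1000 as plain class IDs is an assumption derived from the last sentence of the description.

```python
def decode_instance_id(pixel_value):
    """Split an instanceIds pixel value into (class_id, instance_id).

    Assumption: values below 1000 are a plain class ID without an
    individual instance, so instance_id is returned as None.
    """
    if pixel_value < 1000:
        return pixel_value, None
    # Integer division gives the class ID, the remainder the instance ID.
    return pixel_value // 1000, pixel_value % 1000


print(decode_instance_id(26003))  # class 26, instance 3
print(decode_instance_id(7))      # class 7, no individual instance
```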

Cityscapes annotations import

Uploaded file: a zip archive with the following structure:

.
├── label_color.txt # optional
└── gtFine
    └── <city_name>
        ├── image_0_gtFine_instanceIds.png
        ├── image_1_gtFine_instanceIds.png
        ├── ...

Creating task with Cityscapes dataset

Create a task with the labels you need, or use the labels and colors of the original dataset. To work with the Cityscapes format, the task must have a background label with black color.

Original Cityscapes color map:

[
    {"name": "unlabeled", "color": "#000000", "attributes": []},
    {"name": "egovehicle", "color": "#000000", "attributes": []},
    {"name": "rectificationborder", "color": "#000000", "attributes": []},
    {"name": "outofroi", "color": "#000000", "attributes": []},
    {"name": "static", "color": "#000000", "attributes": []},
    {"name": "dynamic", "color": "#6f4a00", "attributes": []},
    {"name": "ground", "color": "#510051", "attributes": []},
    {"name": "road", "color": "#804080", "attributes": []},
    {"name": "sidewalk", "color": "#f423e8", "attributes": []},
    {"name": "parking", "color": "#faaaa0", "attributes": []},
    {"name": "railtrack", "color": "#e6968c", "attributes": []},
    {"name": "building", "color": "#464646", "attributes": []},
    {"name": "wall", "color": "#66669c", "attributes": []},
    {"name": "fence", "color": "#be9999", "attributes": []},
    {"name": "guardrail", "color": "#b4a5b4", "attributes": []},
    {"name": "bridge", "color": "#966464", "attributes": []},
    {"name": "tunnel", "color": "#96785a", "attributes": []},
    {"name": "pole", "color": "#999999", "attributes": []},
    {"name": "polegroup", "color": "#999999", "attributes": []},
    {"name": "trafficlight", "color": "#faaa1e", "attributes": []},
    {"name": "trafficsign", "color": "#dcdc00", "attributes": []},
    {"name": "vegetation", "color": "#6b8e23", "attributes": []},
    {"name": "terrain", "color": "#98fb98", "attributes": []},
    {"name": "sky", "color": "#4682b4", "attributes": []},
    {"name": "person", "color": "#dc143c", "attributes": []},
    {"name": "rider", "color": "#ff0000", "attributes": []},
    {"name": "car", "color": "#00008e", "attributes": []},
    {"name": "truck", "color": "#000046", "attributes": []},
    {"name": "bus", "color": "#003c64", "attributes": []},
    {"name": "caravan", "color": "#00005a", "attributes": []},
    {"name": "trailer", "color": "#00006e", "attributes": []},
    {"name": "train", "color": "#005064", "attributes": []},
    {"name": "motorcycle", "color": "#0000e6", "attributes": []},
    {"name": "bicycle", "color": "#770b20", "attributes": []},
    {"name": "licenseplate", "color": "#00000e", "attributes": []}
]
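
If you need the same colors as `r g b label_name` rows (the layout shown in the label_color.txt example), the conversion from hex colors can be sketched like this. The function name `to_label_color_lines` is hypothetical, and the snippet assumes every color is written as `#rrggbb`.

```python
import json


def to_label_color_lines(labels_json):
    """Convert a label list with hex colors into `r g b label_name` lines."""
    lines = []
    for label in json.loads(labels_json):
        hex_color = label["color"].lstrip("#")
        # Split "rrggbb" into three two-digit hexadecimal channels.
        r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
        lines.append(f"{r} {g} {b} {label['name']}")
    return lines


# Three entries copied from the color map above
sample = '''[
    {"name": "unlabeled", "color": "#000000", "attributes": []},
    {"name": "road", "color": "#804080", "attributes": []},
    {"name": "car", "color": "#00008e", "attributes": []}
]'''
print("\n".join(to_label_color_lines(sample)))
```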

Upload images when creating a task:

images.zip/
    ├── image_0.jpg
    ├── image_1.jpg
    ├── ...

After creating the task, upload the Cityscapes annotations as described in the previous section.

25.18 - KITTI

Format specification for KITTI detection
Format specification for KITTI segmentation
Dataset examples
Supported annotations:
- Rectangles (detection task)
- Polygons (segmentation task)
Supported attributes:
- occluded (both UI option and a separate attribute). Indicates that a significant portion of the object within the bounding box is occluded by another object.
- truncated, supported only for rectangles (should be defined for labels as checkboxes). Indicates that the bounding box specified for the object does not correspond to the full extent of the object.
- is_crowd, supported only for polygons (should be defined for labels as checkboxes). Indicates that the annotation covers multiple instances of the same class.

KITTI annotations export

Downloaded file: a zip archive of the following structure:

└─ annotations.zip/
    ├── label_colors.txt # list of pairs r g b label_name
    ├── labels.txt # list of labels
    └── default/
        ├── label_2/ # left color camera label files
        │   ├── <image_name_1>.txt
        │   ├── <image_name_2>.txt
        │   └── ...
        ├── instance/ # instance segmentation masks
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        ├── semantic/ # semantic segmentation masks (labels are encoded by its id)
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        └── semantic_rgb/ # semantic segmentation masks (labels are encoded by its color)
            ├── <image_name_1>.png
            ├── <image_name_2>.png
            └── ...

KITTI annotations import

You can upload KITTI annotations in two ways: rectangles for the detection task and masks for the segmentation task.

For detection tasks, the uploaded archive should have the following structure:

└─ annotations.zip/
    ├── labels.txt # optional, labels list for non-original detection labels
    └── <subset_name>/
        ├── label_2/ # left color camera label files
        │   ├── <image_name_1>.txt
        │   ├── <image_name_2>.txt
        │   └── ...

For segmentation tasks, the uploaded archive should have the following structure:

└─ annotations.zip/
    ├── label_colors.txt # optional, color map for non-original segmentation labels
    └── <subset_name>/
        ├── instance/ # instance segmentation masks
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        ├── semantic/ # optional, semantic segmentation masks (labels are encoded by its id)
        │   ├── <image_name_1>.png
        │   ├── <image_name_2>.png
        │   └── ...
        └── semantic_rgb/ # optional, semantic segmentation masks (labels are encoded by its color)
            ├── <image_name_1>.png
            ├── <image_name_2>.png
            └── ...

All annotation files and masks should have structures that are described in the original format specification.
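
As an illustration of the detection label files, the sketch below reads the 2D fields of one line in the original KITTI object layout (type, truncated, occluded, alpha, then the bounding box as left, top, right, bottom). The class and function names are hypothetical, the sample line is made up, and files produced by other tools may fill the remaining fields differently.

```python
from dataclasses import dataclass


@dataclass
class KittiObject:
    label: str
    truncated: float
    occluded: int
    bbox: tuple  # (left, top, right, bottom) in pixels


def parse_kitti_line(line):
    """Parse one line of a label_2 file, keeping only the 2D detection fields."""
    fields = line.split()
    return KittiObject(
        label=fields[0],
        truncated=float(fields[1]),
        occluded=int(fields[2]),
        # fields[3] is the observation angle (alpha); fields[4:8] are the 2D box.
        bbox=tuple(float(v) for v in fields[4:8]),
    )


# Made-up line in the KITTI object layout
sample_line = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
obj = parse_kitti_line(sample_line)
print(obj.label, obj.occluded, obj.bbox)
```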

25.19 - LFW

Format specification
Dataset examples
Supported annotations: tags, points.
Supported attributes:
- negative_pairs (should be defined for labels as text): list of image names with mismatched persons.
- positive_pairs (should be defined for labels as text): list of image names with matched persons.

Import LFW annotation

The uploaded annotations file should be a zip file with the following structure:

<archive_name>.zip/
    └── annotations/
        ├── landmarks.txt # list with landmark points for each image
        ├── pairs.txt # list of matched and mismatched pairs of persons
        └── people.txt # optional file with a list of person names

Full information about the content of annotation files is available here

Export LFW annotation

Downloaded file: a zip archive of the following structure:

<archive_name>.zip/
    ├── images/ # if the option to save images was selected
    │    ├── name1/
    │    │   ├── name1_0001.jpg
    │    │   ├── name1_0002.jpg
    │    │   ├── ...
    │    ├── name2/
    │    │   ├── name2_0001.jpg
    │    │   ├── name2_0002.jpg
    │    │   ├── ...
    │    ├── ...
    ├── landmarks.txt
    ├── pairs.txt
    └── people.txt

Example: create task with images and upload LFW annotations into it

This is one of the possible ways to create a task and add LFW annotations for it.

On the task creation page:
- Add labels that correspond to the names of the persons.
- For each label, define text attributes named positive_pairs and negative_pairs.
- Add images using a zip archive from a local repository:

images.zip/
    ├── name1_0001.jpg
    ├── name1_0002.jpg
    ├── ...
    ├── name1_<N>.jpg
    ├── name2_0001.jpg
    ├── ...

On the annotation page: Upload annotation -> LFW 1.0 -> choose an archive with the structure described in the import section.

26 - Task synchronization with a repository

Notice: this feature works only if a git repository was specified when the task was created.

At the end of the annotation process, a task is synchronized by clicking Synchronize on the task page. If the synchronization is successful, the button changes to Synchronized and turns blue.
The annotation is now in the repository in a temporary branch. The next step is to go to the repository and manually create a pull request to the main branch.
After merging the PR, when the annotation is saved in the main branch, the button changes to Merged and is highlighted in green.

If the annotations in the task do not match the annotations in the repository, the sync button turns red.

27 - XML annotation format

When you download annotations from Computer Vision Annotation Tool (CVAT), you can choose one of several data formats. This document describes the XML annotation format. Each format has an X.Y version (e.g. 1.0). In general, the major version (X) is incremented when the data format changes incompatibly, and the minor version (Y) is incremented when the format is slightly modified (e.g. one or several extra fields are added to the meta information). This document describes all changes for all versions of the XML annotation format.

Version 1.1

There are currently two different formats, one for image tasks and one for video tasks. Both formats share a common part, which is described below. Compared with the previous version, the flipped tag was added. The original_size tag was also added for interpolation mode to specify the frame size. In annotation mode, each image tag has width and height attributes for the same purpose.

For an explanation of rle, see Run-length encoding.

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>Number: id of the task</id>
      <name>String: some task name</name>
      <size>Number: count of frames/images in the task</size>
      <mode>String: interpolation or annotation</mode>
      <overlap>Number: number of overlapped frames between segments</overlap>
      <bugtracker>String: URL of a page that describes the task</bugtracker>
      <flipped>Boolean: were images of the task flipped? (True/False)</flipped>
      <created>String: date when the task was created</created>
      <updated>String: date when the task was updated</updated>
      <labels>
        <label>
          <name>String: name of the label (e.g. car, person)</name>
          <type>String: any, bbox, cuboid, cuboid_3d, ellipse, mask, polygon, polyline, points, skeleton, tag</type>
          <attributes>
            <attribute>
              <name>String: attribute name</name>
              <mutable>Boolean: mutable (allow different values between frames)</mutable>
              <input_type>String: select, checkbox, radio, number, text</input_type>
              <default_value>String: default value</default_value>
              <values>String: possible values, separated by newlines
ex. value 2
ex. value 3</values>
            </attribute>
          </attributes>
          <svg>String: label representation in svg, only for skeletons</svg>
          <parent>String: label parent name, only for skeletons</parent>
        </label>
      </labels>
      <segments>
        <segment>
          <id>Number: id of the segment</id>
          <start>Number: first frame</start>
          <stop>Number: last frame</stop>
          <url>String: URL (e.g. http://cvat.example.com/?id=213)</url>
        </segment>
      </segments>
      <owner>
        <username>String: the author of the task</username>
        <email>String: email of the author</email>
      </owner>
      <original_size>
        <width>Number: frame width</width>
        <height>Number: frame height</height>
      </original_size>
    </task>
    <dumped>String: date when the annotation was dumped</dumped>
  </meta>
  ...
</annotations>

Annotation

Below is a description of the data format for image tasks. Each image can contain many different objects, and each object can have multiple attributes. If an annotation task is created with the z_order flag, each object has a z_order attribute, which is used to draw objects correctly when they intersect (the larger the z_order, the closer the object is to the camera). In previous versions of the format, only the box shape was available. Later releases added mask, polygon, polyline, points, skeletons and tags. See below for more details:

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  ...
  <image id="Number: id of the image (the index in lexical order of images)" name="String: path to the image"
    width="Number: image width" height="Number: image height">
    <box label="String: the associated label" xtl="Number: float" ytl="Number: float" xbr="Number: float" ybr="Number: float" occluded="Number: 0 - False, 1 - True" z_order="Number: z-order of the object">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </box>
    <polygon label="String: the associated label" points="x0,y0;x1,y1;..." occluded="Number: 0 - False, 1 - True"
    z_order="Number: z-order of the object">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </polygon>
    <polyline label="String: the associated label" points="x0,y0;x1,y1;..." occluded="Number: 0 - False, 1 - True"
    z_order="Number: z-order of the object">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </polyline>
    <points label="String: the associated label" points="x0,y0;x1,y1;..." occluded="Number: 0 - False, 1 - True"
    z_order="Number: z-order of the object">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </points>
    <tag label="String: the associated label" source="manual or auto">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </tag>
    <skeleton label="String: the associated label" z_order="Number: z-order of the object">
      <points label="String: the associated label" occluded="Number: 0 - False, 1 - True" outside="Number: 0 - False, 1 - True" points="x0,y0;x1,y1">
        <attribute name="String: an attribute name">String: the attribute value</attribute>
      </points>
      ...
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </skeleton>
    <mask label="String: the associated label" source="manual or auto" occluded="Number: 0 - False, 1 - True" rle="RLE mask" left="Number: left coordinate of the image where the mask begins" top="Number: top coordinate of the image where the mask begins" width="Number: width of the mask" height="Number: height of the mask" z_order="Number: z-order of the object">
    </mask>
    ...
  </image>
  ...
</annotations>
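
Reading shapes from such a file can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration, not part of CVAT: the helper names `parse_points` and `read_image_shapes` are hypothetical, and only boxes and point-based shapes are handled. The inline sample omits the XML declaration because `ET.fromstring` rejects a string that carries an encoding declaration; for a real file, `ET.parse` would be the usual entry point.

```python
import xml.etree.ElementTree as ET


def parse_points(points_attr):
    """Turn "x0,y0;x1,y1;..." into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in points_attr.split(";")]


def read_image_shapes(xml_text):
    """Yield (image name, shape tag, label, coordinates) for boxes and point-based shapes."""
    root = ET.fromstring(xml_text)
    for image in root.iter("image"):
        for shape in image:
            if shape.tag == "box":
                coords = [(float(shape.get("xtl")), float(shape.get("ytl"))),
                          (float(shape.get("xbr")), float(shape.get("ybr")))]
            elif shape.tag in ("polygon", "polyline", "points"):
                coords = parse_points(shape.get("points"))
            else:
                continue  # tags, skeletons and masks are skipped in this sketch
            yield image.get("name"), shape.tag, shape.get("label"), coords


sample = """<annotations>
  <image id="0" name="filename000.jpg" width="1600" height="1200">
    <box label="plate" xtl="797.33" ytl="870.92" xbr="965.52" ybr="928.94" occluded="0" z_order="4"/>
    <polyline label="traffic_line" points="462.10,0.00;126.80,1200.00" occluded="0" z_order="3"/>
  </image>
</annotations>"""
for item in read_image_shapes(sample):
    print(item)
```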

Example:

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>4</id>
      <name>segmentation</name>
      <size>27</size>
      <mode>annotation</mode>
      <overlap>0</overlap>
      <bugtracker></bugtracker>
      <flipped>False</flipped>
      <created>2018-09-25 11:34:24.617558+03:00</created>
      <updated>2018-09-25 11:38:27.301183+03:00</updated>
      <labels>
        <label>
          <name>car</name>
          <attributes>
          </attributes>
        </label>
        <label>
          <name>traffic_line</name>
          <attributes>
          </attributes>
        </label>
        <label>
          <name>wheel</name>
          <attributes>
          </attributes>
        </label>
        <label>
          <name>plate</name>
          <attributes>
          </attributes>
        </label>
        <label>
          <name>s1</name>
          <type>skeleton</type>
          <attributes>
          </attributes>
          <svg>&lt;line x1="36.87290954589844" y1="47.732025146484375" x2="86.87290954589844" y2="10.775501251220703" stroke="black" data-type="edge" data-node-from="2" stroke-width="0.5" data-node-to="3"&gt;&lt;/line&gt;&lt;line x1="25.167224884033203" y1="22.64841079711914" x2="36.87290954589844" y2="47.732025146484375" stroke="black" data-type="edge" data-node-from="1" stroke-width="0.5" data-node-to="2"&gt;&lt;/line&gt;&lt;circle r="1.5" stroke="black" fill="#b3b3b3" cx="25.167224884033203" cy="22.64841079711914" stroke-width="0.1" data-type="element node" data-element-id="1" data-node-id="1" data-label-name="1"&gt;&lt;/circle&gt;&lt;circle r="1.5" stroke="black" fill="#b3b3b3" cx="36.87290954589844" cy="47.732025146484375" stroke-width="0.1" data-type="element node" data-element-id="2" data-node-id="2" data-label-name="2"&gt;&lt;/circle&gt;&lt;circle r="1.5" stroke="black" fill="#b3b3b3" cx="86.87290954589844" cy="10.775501251220703" stroke-width="0.1" data-type="element node" data-element-id="3" data-node-id="3" data-label-name="3"&gt;&lt;/circle&gt;</svg>
        </label>
        <label>
          <name>1</name>
          <type>points</type>
          <attributes>
          </attributes>
          <parent>s1</parent>
        </label>
        <label>
          <name>2</name>
          <type>points</type>
          <attributes>
          </attributes>
          <parent>s1</parent>
        </label>
        <label>
          <name>3</name>
          <type>points</type>
          <attributes>
          </attributes>
          <parent>s1</parent>
        </label>
      </labels>
      <segments>
        <segment>
          <id>4</id>
          <start>0</start>
          <stop>26</stop>
          <url>http://localhost:8080/?id=4</url>
        </segment>
      </segments>
      <owner>
        <username>admin</username>
        <email></email>
      </owner>
    </task>
    <dumped>2018-09-25 11:38:28.799808+03:00</dumped>
  </meta>
  <image id="0" name="filename000.jpg" width="1600" height="1200">
    <box label="plate" xtl="797.33" ytl="870.92" xbr="965.52" ybr="928.94" occluded="0" z_order="4">
    </box>
    <polygon label="car" points="561.30,916.23;561.30,842.77;554.72,761.63;553.62,716.67;565.68,677.20;577.74,566.45;547.04,559.87;536.08,542.33;528.40,520.40;541.56,512.72;559.10,509.43;582.13,506.14;588.71,464.48;583.23,448.03;587.61,434.87;594.19,431.58;609.54,399.78;633.66,369.08;676.43,294.52;695.07,279.17;703.84,279.17;735.64,268.20;817.88,264.91;923.14,266.01;997.70,274.78;1047.04,283.55;1063.49,289.04;1090.90,330.70;1111.74,371.27;1135.86,397.59;1147.92,428.29;1155.60,435.97;1157.79,451.32;1156.69,462.28;1159.98,491.89;1163.27,522.59;1173.14,513.82;1199.46,516.01;1224.68,521.49;1225.77,544.52;1207.13,568.64;1181.91,576.32;1178.62,582.90;1177.53,619.08;1186.30,680.48;1199.46,711.19;1206.03,733.12;1203.84,760.53;1197.26,818.64;1199.46,840.57;1203.84,908.56;1192.88,930.49;1184.10,939.26;1162.17,944.74;1139.15,960.09;1058.01,976.54;1028.40,969.96;1002.09,972.15;931.91,974.35;844.19,972.15;772.92,972.15;729.06,967.77;713.71,971.06;685.20,973.25;659.98,968.86;644.63,984.21;623.80,983.12;588.71,985.31;560.20,966.67" occluded="0" z_order="1">
    </polygon>
    <polyline label="traffic_line" points="462.10,0.00;126.80,1200.00" occluded="0" z_order="3">
    </polyline>
    <polyline label="traffic_line" points="1212.40,0.00;1568.66,1200.00" occluded="0" z_order="2">
    </polyline>
    <points label="wheel" points="574.90,939.48;1170.16,907.90;1130.69,445.26;600.16,459.48" occluded="0" z_order="5">
    </points>
    <tag label="good_frame" source="manual">
    </tag>
    <skeleton label="s1" source="manual" z_order="0">
      <points label="1" occluded="0" source="manual" outside="0" points="54.47,94.81">
      </points>
      <points label="2" occluded="0" source="manual" outside="0" points="68.02,162.34">
      </points>
      <points label="3" occluded="0" source="manual" outside="0" points="125.87,62.85">
      </points>
    </skeleton>
    <mask label="car" source="manual" occluded="0" rle="3, 5, 7, 7, 5, 9, 3, 11, 2, 11, 2, 12, 1, 12, 1, 26, 1, 12, 1, 12, 2, 11, 3, 9, 5, 7, 7, 5, 3" left="707" top="888" width="13" height="15" z_order="0">
    </mask>
  </image>
</annotations>
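
The mask in the example above can be expanded into a pixel grid with a sketch like the following. It rests on an assumption: the run lengths alternate between background and foreground, start with background, and fill the mask area row by row. The run lengths in the example sum to 195, which equals width × height (13 × 15) and is consistent with that reading. The function name `decode_rle` is hypothetical.

```python
def decode_rle(rle_attr, width, height):
    """Expand an rle attribute into a list of rows of 0/1 values.

    Assumption: run lengths alternate between background (0) and
    foreground (1), start with background, and fill the mask row by row.
    """
    runs = [int(v) for v in rle_attr.split(",")]
    flat = []
    value = 0
    for run in runs:
        flat.extend([value] * run)
        value = 1 - value  # switch between background and foreground
    if len(flat) != width * height:
        raise ValueError("run lengths do not cover the mask area")
    return [flat[r * width:(r + 1) * width] for r in range(height)]


rle = "3, 5, 7, 7, 5, 9, 3, 11, 2, 11, 2, 12, 1, 12, 1, 26, 1, 12, 1, 12, 2, 11, 3, 9, 5, 7, 7, 5, 3"
mask = decode_rle(rle, width=13, height=15)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

To place the mask on the full image, offset each row and column by the top and left attributes.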

Interpolation

Below is a description of the data format for video tasks. The annotation contains tracks. Each track corresponds to an object that can be present on multiple frames. The same object cannot be present on the same frame in multiple locations. Each location of the object can have multiple attributes. Even if an attribute is immutable for the object, it is cloned for each location (a known redundancy).

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  ...
  <track id="Number: id of the track (doesn't have any special meaning)" label="String: the associated label" source="manual or auto">
    <box frame="Number: frame" xtl="Number: float" ytl="Number: float" xbr="Number: float" ybr="Number: float" outside="Number: 0 - False, 1 - True" occluded="Number: 0 - False, 1 - True" keyframe="Number: 0 - False, 1 - True">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
      ...
    </box>
    <polygon frame="Number: frame" points="x0,y0;x1,y1;..." outside="Number: 0 - False, 1 - True" occluded="Number: 0 - False, 1 - True" keyframe="Number: 0 - False, 1 - True">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
    </polygon>
    <polyline frame="Number: frame" points="x0,y0;x1,y1;..." outside="Number: 0 - False, 1 - True" occluded="Number: 0 - False, 1 - True" keyframe="Number: 0 - False, 1 - True">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
    </polyline>
    <points frame="Number: frame" points="x0,y0;x1,y1;..." outside="Number: 0 - False, 1 - True" occluded="Number: 0 - False, 1 - True" keyframe="Number: 0 - False, 1 - True">
      <attribute name="String: an attribute name">String: the attribute value</attribute>
    </points>
    <mask frame="Number: frame" outside="Number: 0 - False, 1 - True" occluded="Number: 0 - False, 1 - True" rle="RLE mask" left="Number: left coordinate of the image where the mask begins" top="Number: top coordinate of the image where the mask begins" width="Number: width of the mask" height="Number: height of the mask" z_order="Number: z-order of the object">
    </mask>
    ...
  </track>
  <track id="Number: id of the track (doesn't have any special meaning)" label="String: the associated label" source="manual or auto">
    <skeleton frame="Number: frame" keyframe="Number: 0 - False, 1 - True">
      <points label="String: the associated label" outside="Number: 0 - False, 1 - True" occluded="Number: 0 - False, 1 - True" keyframe="Number: 0 - False, 1 - True" points="x0,y0;x1,y1">
      </points>
      ...
    </skeleton>
    ...
  </track>
  ...
</annotations>

Example:

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>5</id>
      <name>interpolation</name>
      <size>4620</size>
      <mode>interpolation</mode>
      <overlap>5</overlap>
      <bugtracker></bugtracker>
      <flipped>False</flipped>
      <created>2018-09-25 12:32:09.868194+03:00</created>
      <updated>2018-09-25 16:05:05.619841+03:00</updated>
      <labels>
        <label>
          <name>person</name>
          <attributes>
          </attributes>
        </label>
        <label>
          <name>car</name>
          <attributes>
          </attributes>
        </label>
        <label>
          <name>s1</name>
          <type>skeleton</type>
          <attributes>
          </attributes>
          <svg>&lt;line x1="36.87290954589844" y1="47.732025146484375" x2="86.87290954589844" y2="10.775501251220703" stroke="black" data-type="edge" data-node-from="2" stroke-width="0.5" data-node-to="3"&gt;&lt;/line&gt;&lt;line x1="25.167224884033203" y1="22.64841079711914" x2="36.87290954589844" y2="47.732025146484375" stroke="black" data-type="edge" data-node-from="1" stroke-width="0.5" data-node-to="2"&gt;&lt;/line&gt;&lt;circle r="1.5" stroke="black" fill="#b3b3b3" cx="25.167224884033203" cy="22.64841079711914" stroke-width="0.1" data-type="element node" data-element-id="1" data-node-id="1" data-label-name="1"&gt;&lt;/circle&gt;&lt;circle r="1.5" stroke="black" fill="#b3b3b3" cx="36.87290954589844" cy="47.732025146484375" stroke-width="0.1" data-type="element node" data-element-id="2" data-node-id="2" data-label-name="2"&gt;&lt;/circle&gt;&lt;circle r="1.5" stroke="black" fill="#b3b3b3" cx="86.87290954589844" cy="10.775501251220703" stroke-width="0.1" data-type="element node" data-element-id="3" data-node-id="3" data-label-name="3"&gt;&lt;/circle&gt;</svg>
        </label>
        <label>
          <name>1</name>
          <type>points</type>
          <attributes>
          </attributes>
          <parent>s1</parent>
        </label>
        <label>
          <name>2</name>
          <type>points</type>
          <attributes>
          </attributes>
          <parent>s1</parent>
        </label>
        <label>
          <name>3</name>
          <type>points</type>
          <attributes>
          </attributes>
          <parent>s1</parent>
        </label>
      </labels>
      <segments>
        <segment>
          <id>5</id>
          <start>0</start>
          <stop>4619</stop>
          <url>http://localhost:8080/?id=5</url>
        </segment>
      </segments>
      <owner>
        <username>admin</username>
        <email></email>
      </owner>
      <original_size>
        <width>640</width>
        <height>480</height>
      </original_size>
    </task>
    <dumped>2018-09-25 16:05:07.134046+03:00</dumped>
  </meta>
  <track id="0" label="car">
    <polygon frame="0" points="324.79,213.16;323.74,227.90;347.42,237.37;371.11,217.37;350.05,190.00;318.47,191.58" outside="0" occluded="0" keyframe="1">
    </polygon>
    <polygon frame="1" points="324.79,213.16;323.74,227.90;347.42,237.37;371.11,217.37;350.05,190.00;318.47,191.58" outside="1" occluded="0" keyframe="1">
    </polygon>
    <polygon frame="6" points="305.32,237.90;312.16,207.90;352.69,206.32;355.32,233.16;331.11,254.74" outside="0" occluded="0" keyframe="1">
    </polygon>
    <polygon frame="7" points="305.32,237.90;312.16,207.90;352.69,206.32;355.32,233.16;331.11,254.74" outside="1" occluded="0" keyframe="1">
    </polygon>
    <polygon frame="13" points="313.74,233.16;331.11,220.00;359.53,243.16;333.21,283.16;287.95,274.74" outside="0" occluded="0" keyframe="1">
    </polygon>
    <polygon frame="14" points="313.74,233.16;331.11,220.00;359.53,243.16;333.21,283.16;287.95,274.74" outside="1" occluded="0" keyframe="1">
    </polygon>
  </track>
  <track id="1" label="s1" source="manual">
    <skeleton frame="0" keyframe="1" z_order="0">
      <points label="1" outside="0" occluded="0" keyframe="1" points="112.07,258.59">
      </points>
      <points label="2" outside="0" occluded="0" keyframe="1" points="127.87,333.23">
      </points>
      <points label="3" outside="0" occluded="0" keyframe="1" points="195.37,223.27">
      </points>
    </skeleton>
    <skeleton frame="1" keyframe="1" z_order="0">
      <points label="1" outside="1" occluded="0" keyframe="1" points="112.07,258.59">
      </points>
      <points label="2" outside="1" occluded="0" keyframe="1" points="127.87,333.23">
      </points>
      <points label="3" outside="1" occluded="0" keyframe="1" points="195.37,223.27">
      </points>
    </skeleton>
    <skeleton frame="6" keyframe="1" z_order="0">
      <points label="1" outside="0" occluded="0" keyframe="0" points="120.07,270.59">
      </points>
      <points label="2" outside="0" occluded="0" keyframe="0" points="140.87,350.23">
      </points>
      <points label="3" outside="0" occluded="0" keyframe="0" points="210.37,260.27">
      </points>
    </skeleton>
    <skeleton frame="7" keyframe="1" z_order="0">
      <points label="1" outside="1" occluded="0" keyframe="1" points="120.07,270.59">
      </points>
      <points label="2" outside="1" occluded="0" keyframe="1" points="140.87,350.23">
      </points>
      <points label="3" outside="1" occluded="0" keyframe="1" points="210.37,260.27">
      </points>
    </skeleton>
    <skeleton frame="13" keyframe="0" z_order="0">
      <points label="1" outside="0" occluded="0" keyframe="0" points="112.07,258.59">
      </points>
      <points label="2" outside="0" occluded="0" keyframe="0" points="127.87,333.23">
      </points>
      <points label="3" outside="0" occluded="0" keyframe="0" points="195.37,223.27">
      </points>
    </skeleton>
    <skeleton frame="14" keyframe="1" z_order="0">
      <points label="1" outside="1" occluded="0" keyframe="1" points="112.07,258.59">
      </points>
      <points label="2" outside="1" occluded="0" keyframe="1" points="127.87,333.23">
      </points>
      <points label="3" outside="1" occluded="0" keyframe="1" points="195.37,223.27">
      </points>
    </skeleton>
  </track>
</annotations>

28 - Shortcuts

List of available mouse and keyboard shortcuts.

Many UI elements have shortcut hints. Hover the pointer over an element to see its hint.

Shortcut	Common
	Main functions
`F1`	Open/hide the list of available shortcuts
`F2`	Go to the settings page or go back
`Ctrl+S`	Save the job
`Ctrl+Z`	Undo the latest action related to objects
`Ctrl+Shift+Z` or `Ctrl+Y`	Redo the undone action
Hold `Mouse Wheel`	Move the image frame (for example, while drawing)
	Player
`F`	Go to the next frame
`D`	Go to the previous frame
`V`	Go forward with a step
`C`	Go backward with a step
`Right`	Go to the next frame that satisfies the filters, or to the next frame that contains any objects
`Left`	Go to the previous frame that satisfies the filters, or to the previous frame that contains any objects
`Space`	Start/stop automatic frame switching
`` ` `` or `~`	Focus on the element to change the current frame
	Modes
`N`	Repeat the latest procedure of drawing with the same parameters
`M`	Activate or deactivate the mode for merging shapes
`Alt+M`	Activate or deactivate the mode for splitting shapes
`G`	Activate or deactivate the mode for grouping shapes
`Shift+G`	Reset group for selected shapes (in group mode)
`Esc`	Cancel any active canvas mode
	Image operations
`Ctrl+R`	Change image angle (add 90 degrees)
`Ctrl+Shift+R`	Change image angle (subtract 90 degrees)
	Operations with objects
`Ctrl`	Switch automatic bordering for polygons and polylines during drawing/editing
Hold `Ctrl`	Fix the shape while it is active
`Alt+Click` on point	Delete a point (when hovering over a point of a polygon, polyline, or points)
`Shift+Click` on point	Edit a shape (when hovering over a point of a polygon, polyline, or points)
`Right-Click` on shape	Display the object's element in the objects sidebar
`T+L`	Change locked state for all objects in the sidebar
`L`	Change locked state for an active object
`T+H`	Change hidden state for objects in the sidebar
`H`	Change hidden state for an active object
`Q` or `/`	Change occluded property for an active object
`Del` or `Shift+Del`	Delete an active object. Use shift to force delete of locked objects
`-` or `_`	Put an active object “farther” from the user (decrease z axis value)
`+` or `=`	Put an active object “closer” to the user (increase z axis value)
`Ctrl+C`	Copy shape to CVAT internal clipboard
`Ctrl+V`	Paste a shape from internal CVAT clipboard
Hold `Ctrl` while pasting	Paste the shape from the buffer multiple times
`Ctrl+B`	Make a copy of the object on the following frames
`Ctrl+(0..9)`	Changes a label for an activated object or for the next drawn object if no objects are activated
	Operations available only for tracks
`K`	Change keyframe property for an active track
`O`	Change outside property for an active track
`R`	Go to the next keyframe of an active track
`E`	Go to the previous keyframe of an active track
	Attribute annotation mode
`Up Arrow`	Go to the next attribute (up)
`Down Arrow`	Go to the next attribute (down)
`Tab`	Go to the next annotated object in current frame
`Shift+Tab`	Go to the previous annotated object in current frame
`<number>`	Assign a corresponding value to the current attribute
	Standard 3d mode
`Shift+Up Arrow`	Increases camera roll angle
`Shift+Down Arrow`	Decreases camera roll angle
`Shift+Left Arrow`	Decreases camera pitch angle
`Shift+Right Arrow`	Increases camera pitch angle
`Alt+O`	Move the camera up
`Alt+U`	Move the camera down
`Alt+J`	Move the camera left
`Alt+L`	Move the camera right
`Alt+I`	Performs zoom in
`Alt+K`	Performs zoom out

29 - Filter

Guide to using the Filter feature in CVAT.

There are several reasons to use this feature:

Hiding objects: when you use a filter, objects that don’t match the filter are hidden.
Fast navigation between frames that contain an object of interest: use the Left Arrow / Right Arrow keys for this purpose, or customize the UI buttons by right-clicking them and selecting switching by filter. If no objects correspond to the filter, you go to the previous / next frame that contains any annotated objects.

To apply filters, click the filter button on the top panel.

Create a filter

It will open a window for filter input. Here you will find two buttons: Add rule and Add group.

Rules

The Add rule button adds a rule for objects display. A rule may use the following properties:

Supported properties for annotation

Properties	Supported values	Description
`Label`	all the label names that are in the task	label name
`Type`	shape, track or tag	type of object
`Shape`	all shape types	type of shape
`Occluded`	true or false	occluded (read more)
`Width`	number of px or field	shape width
`Height`	number of px or field	shape height
`ServerID`	number or field	ID of the object on the server (You can find out by forming a link to the object through the Action menu)
`ObjectID`	number or field	ID of the object in your client (indicated on the objects sidebar)
`Attributes`	some other fields including attributes with a similar type or a specific attribute value	any fields specified by a label

Supported operators for properties

== - Equal; != - Not equal; > - Greater than; >= - Greater than or equal; < - Less than; <= - Less than or equal;

Any in; Not in - these operators allow you to set multiple values in one rule;

Is empty; Is not empty - these operators don’t require a value.

Between; Not between - these operators allow you to choose a range between two values.

Like - this operator indicates that the property must contain the given value.

Starts with; Ends with - filter by the beginning or the end of the value.

Some properties support two types of values that you can choose between: a specific value or another field.

You can add multiple rules. To do so, click the Add rule button and set another rule. Once you’ve set a new rule, you can choose the operator that connects the rules: And or Or.

All subsequent rules will be joined by the chosen operator. Click Submit to apply the filter. If you want multiple rules to be connected by different operators, use groups.

Groups

To add a group, click the Add group button. Inside the group you can create rules or groups.

If there is more than one rule in the group, they can be connected by the And or Or operator. A group works like a single rule outside the group and is joined to the other rules by the operator outside the group. You can create groups within other groups. To do so, click the Add group button within the group.
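
To make the And / Or semantics of rules and nested groups concrete, here is a minimal Python sketch that evaluates a filter tree against a few objects. The dictionary layout, the `matches` function, and the reduced operator set are assumptions made for illustration; they are not CVAT's internal filter representation.

```python
import operator

# Hypothetical subset of the comparison operators listed above.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(node, obj):
    """Return True if obj satisfies a rule or a (possibly nested) group."""
    if "rules" in node:
        # A group: combine the results of its children with And or Or.
        results = (matches(child, obj) for child in node["rules"])
        return all(results) if node["join"] == "and" else any(results)
    # A single rule: property, operator, value.
    compare = OPERATORS[node["op"]]
    return compare(obj[node["property"]], node["value"])


# (label == "car" And width > 100) Or (label == "person")
filter_tree = {
    "join": "or",
    "rules": [
        {
            "join": "and",
            "rules": [
                {"property": "label", "op": "==", "value": "car"},
                {"property": "width", "op": ">", "value": 100},
            ],
        },
        {"property": "label", "op": "==", "value": "person"},
    ],
}

objects = [
    {"label": "car", "width": 150},
    {"label": "car", "width": 40},
    {"label": "person", "width": 30},
]
visible = [obj for obj in objects if matches(filter_tree, obj)]
print(visible)
```

In this sketch the inner group behaves like one rule from the point of view of the outer Or, which mirrors how a group is joined to the rules around it.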

You can move rules and groups. To move the rule or group, drag it by the button. To remove the rule or group, click on the Delete button.

If you activate the Not button, objects that don’t match the group will be filtered out. Click Submit to apply the filter. The Cancel button undoes the filter. The Clear filter button removes the filter.

Once applied, a filter automatically appears in the Recent used list. The list holds at most 10 filters.

Sort and filter lists

On the projects page, the task list on a project page, and the tasks, jobs, and cloud storages pages, you can use sorting and filters.

The applied filter and sorting are reflected in the URL of your browser, so you can share the page with the sorting and filter applied.

Sort by

You can sort by the following parameters:

Jobs list: ID, assignee, updated date, stage, state, task ID, project ID, task name, project name.
Tasks list or tasks list on project page: ID, owner, status, assignee, updated date, subset, mode, dimension, project ID, name, project name.
Projects list: ID, assignee, owner, status, name, updated date.
Cloud storages list: ID, provider type, updated date, display name, resource, credentials, owner, description.

To apply sorting, drag the parameter to the top area above the horizontal bar. The parameters below the horizontal bar will not be applied. By moving the parameters you can change their priority: sorting is applied first by the parameters that are higher in the list.

Pressing the Sort button switches between ascending and descending sort.
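
The priority rule can be pictured as a multi-key sort. The sketch below is only an illustration: the `jobs` records and the `sort_by` helper are hypothetical, and a single `descending` flag for the whole sort is an assumption.

```python
def sort_by(items, parameters, descending=False):
    """Sort items by several parameters; earlier parameters have higher priority."""
    return sorted(
        items,
        key=lambda item: tuple(item[name] for name in parameters),
        reverse=descending,
    )


jobs = [
    {"id": 3, "stage": "annotation", "assignee": "bob"},
    {"id": 1, "stage": "validation", "assignee": "alice"},
    {"id": 2, "stage": "annotation", "assignee": "alice"},
]

# "stage" is placed above "assignee", so it is compared first.
ordered = sort_by(jobs, ["stage", "assignee"])
print([job["id"] for job in ordered])
```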

Quick filters

Quick Filters contain several frequently used filters:

Assigned to me - show only those projects, tasks or jobs that are assigned to you.
Owned by me - show only those projects or tasks that are owned by you.
Not completed - show only those projects, tasks or jobs that have a status other than completed.
AWS storages - show only AWS cloud storages
Azure storages - show only Azure cloud storages
Google cloud storages - show only Google cloud storages

Date and time selection

When creating a Last updated rule, you can select the date and time by using the selection window.

You can select the year and month using the arrows or by clicking on the year and month. To select a day, click on it in the calendar. To select the time, choose the hours and minutes in the scrolling list. Alternatively, you can select the current date and time by clicking the Now button. To apply, click Ok.

30 - Review

Guide to using the Review mode for task validation.

A special mode to check the annotation allows you to point to an object or area in the frame containing an error. Review mode is not available in 3D tasks.

Review

To conduct a review, change the stage of the desired job to validation on the task page and assign a user who will conduct the check. The job will then open in review mode. You can also switch to the Review mode using the UI switcher on the top panel.

Review mode is a UI mode with a special Issue tool, which you can use to identify objects or areas in the frame and describe the issue.

To do this, first click the Open an issue icon on the controls sidebar.
Then click a place in the frame to mark it, or highlight an area by holding the left mouse button, and describe the issue. To select an object, right-click it and select Open an issue or select one of several quick issues. The object or area will be shaded in red.
The created issue will appear in the workspace and in the Issues tab on the objects sidebar.
Once all the issues are marked, save the annotation, open the menu, and set the job state to rejected or completed.

After the review, other users will be able to see the issues, comment on each issue and change the status of the issue to Resolved.

After the issues are fixed, select Finish the job from the menu to finish the task, or switch the stage to acceptance on the task page.

Resolve issues

After a review, you can see the issues in the Issues tab on the objects sidebar.

You can use the arrows on the Issues tab to navigate between the frames that contain issues.
In the workspace, click an issue to send a comment on it or, if the issue is resolved, to change its status to Resolved. You can remove the issue by clicking Remove (if your account has the appropriate permissions).
If several issues were created in one place, you can access them by hovering over the issue and scrolling the mouse wheel.

If the issue is resolved, you can reopen it by clicking the Reopen button.

31 - Contextual images

Contextual images of the task

Contextual images are additional images that provide context or additional information related to the primary image.

Use them to add extra context about the object and improve the accuracy of annotation.

Contextual images are available for 2D and 3D tasks.

See:

Folder structure
Data format
Contextual images

Folder structure

To add contextual images to the task, you need to organize the images folder.

Before uploading the archive to CVAT, do the following:

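In the folder with the images for annotation, create a folder named related_images.
In related_images, add a subfolder named after the primary image to which it should be linked, with the dot before the file extension replaced by an underscore (for example, image_1_to_be_annotated_jpg for image_1_to_be_annotated.jpg).
Place the contextual image(s) within the subfolder created in the previous step.
Add the folder to the archive.
Create the task.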
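
The folder preparation steps can be scripted. The sketch below is a minimal illustration, not an official CVAT utility: the `add_context_images` helper is hypothetical, and replacing every dot in the primary image name with an underscore is an assumption based on the naming used in the example layouts in the Data format section.

```python
import shutil
import tempfile
from pathlib import Path


def add_context_images(dataset_dir, primary_name, context_paths):
    """Copy context images into related_images/<primary name with '.' -> '_'>/."""
    dataset_dir = Path(dataset_dir)
    # e.g. "image_1.jpg" -> "image_1_jpg" (assumed naming rule)
    subfolder = dataset_dir / "related_images" / primary_name.replace(".", "_")
    subfolder.mkdir(parents=True, exist_ok=True)
    for path in context_paths:
        shutil.copy(path, subfolder / Path(path).name)
    return subfolder


# Demo in a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "root_directory"
    root.mkdir()
    (root / "image_1_to_be_annotated.jpg").touch()
    context = Path(tmp) / "context_image_for_image_1.jpg"
    context.touch()

    add_context_images(root, "image_1_to_be_annotated.jpg", [context])
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    print(created)
```

After running the helper for every primary image, the dataset folder can be archived and uploaded as described in the steps.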

Data format

Example file structures for 2D and 3D tasks.

2D task:

  root_directory
    image_1_to_be_annotated.jpg
    image_2_to_be_annotated.jpg
    related_images/
      image_1_to_be_annotated_jpg/
        context_image_for_image_1.jpg
      image_2_to_be_annotated_jpg/
        context_image_for_image_2.jpg
    subdirectory_example/
      image_3_to_be_annotated.jpg
      related_images/
        image_3_to_be_annotated_jpg/
          context_image_for_image_3.jpg

3D task with point clouds and a related_images folder:

  root_directory
    image_1_to_be_annotated.pcd
    image_2_to_be_annotated.pcd
    related_images/
      image_1_to_be_annotated_pcd/
        context_image_for_image_1.jpg
      image_2_to_be_annotated_pcd/
        context_image_for_image_2.jpg

3D task with an image placed next to each .pcd file:

  /any_directory
      pointcloud.pcd
      pointcloud.jpg
  /any_other_directory
      /any_subdirectory
          pointcloud.pcd
          pointcloud.png

3D task in the KITTI layout:

  /image_00
      /data
          /0000000000.png
          /0000000001.png
          /0000000002.png
          /0000000003.png
  /image_01
      /data
          /0000000000.png
          /0000000001.png
          /0000000002.png
          /0000000003.png
  /image_02
      /data
          /0000000000.png
          /0000000001.png
          /0000000002.png
          /0000000003.png
  /image_N
      /data
          /0000000000.png
          /0000000001.png
          /0000000002.png
          /0000000003.png
  /velodyne_points
      /data
          /0000000000.bin
          /0000000001.bin
          /0000000002.bin
          /0000000003.bin

For KITTI: image_00, image_01, image_02, ..., image_N (where N is any number <= 12) are context images.
For 3D option 3: a regular image file placed near a .pcd file with the same name is considered to be a context image.

For more general information about 3D data formats, see 3D data formats.

Contextual images

The maximum number of contextual images is twelve.

By default they will be positioned on the right side of the main image.

Note: By default, only three contextual images will be visible.

When you add contextual images to the set, a small toolbar will appear at the top of the screen, with the following elements:

Element	Description
	Fit views. Click to restore the layout to its original appearance. If you’ve expanded any images in the layout, they will be returned to their original size. This won’t affect the number of context images on the screen.
	Add new image. Click to add a context image to the layout.
	Reload layout. Click to reload the layout to the default view. Note that this action can change the number of context images, resetting it back to three.

Each context image has the following elements:

Element	Description
1	Full screen. Click to expand the contextual image to full-screen mode. Click again to return the contextual image to windowed mode.
2	Move contextual image. Hold and drag the contextual image to another place on the screen.
3	Name. Unique contextual image name.
4	Select contextual image. Click to open a horizontal list view of all available contextual images. Click on one to select it.
5	Close. Click to remove the image from the contextual images menu.
6	Extend. Hold and pull to extend the image.

32 - Shape grouping

Grouping multiple shapes during annotation.

This feature allows you to group several shapes.

You can use the Group Shapes button or shortcuts:

G — start selection / end selection in group mode
Esc — close group mode
Shift+G — reset group for selected shapes

You can select shapes by clicking them or by selecting an area.

Grouped shapes will have a group_id field in the dumped annotation.

You can also switch the color distribution from per instance (default) to per group. To do so, enable the Color By Group checkbox.

Shapes that don’t have a group_id will be highlighted in white.

33 - Dataset Manifest

Overview

When we create a new task in CVAT, we need to specify where to get the input data from. CVAT allows using different data sources, including local file uploads, a mounted file share on the server, cloud storages, and remote URLs. In some cases, CVAT needs extra information about the input data. This information can be provided in dataset manifest files. They are mainly used when working with cloud storages to reduce network traffic and speed up the task creation process. However, they can also be used in other cases, which are explained below.

A dataset manifest file is a text file in the JSONL format. These files can be created automatically with the special command-line tool, or manually, following the manifest file format specification.

How and when to use manifest files

Manifest files can be used in the following cases:

A video file or a set of images is used as the data source and the caching mode is enabled. Read more
The data is located in a cloud storage. Read more
The predefined file sorting method is specified. Read more

The predefined sorting method

Independently of the file source being used, when the predefined sorting method is selected in the task configuration, the source files will be ordered according to the .jsonl manifest file, if it is found in the input list of files. If a manifest is not found, the order provided in the input file list is used.

For image archives (e.g. .zip), a manifest file (*.jsonl) is required when using the predefined file ordering. The manifest file must be provided next to the archive in the input list of files; it must not be inside the archive.

If there are multiple manifest files in the input file list, an error will be raised.
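
The ordering rules above can be summarized in a short sketch. This is not CVAT's implementation: the `predefined_order` function and its `read_text` callback are hypothetical, and the sketch covers only the case of loose image files listed next to an images manifest.

```python
import json


def predefined_order(input_files, read_text):
    """Order input files by the single .jsonl manifest, if one is present."""
    manifests = [name for name in input_files if name.endswith(".jsonl")]
    if len(manifests) > 1:
        raise ValueError("multiple manifest files in the input file list")
    data_files = [name for name in input_files if not name.endswith(".jsonl")]
    if not manifests:
        # No manifest: keep the order of the input file list.
        return data_files
    ordered = []
    for line in read_text(manifests[0]).splitlines():
        entry = json.loads(line)
        # Header lines (version, type) have no name and are skipped.
        if "name" in entry and "extension" in entry:
            ordered.append(entry["name"] + entry["extension"])
    return ordered


manifest_text = "\n".join([
    '{"version":"1.0"}',
    '{"type":"images"}',
    '{"name":"image2","extension":".jpg","width":183,"height":275}',
    '{"name":"image1","extension":".jpg","width":720,"height":405}',
])
files = ["image1.jpg", "image2.jpg", "manifest.jsonl"]
with_manifest = predefined_order(files, lambda name: manifest_text)
without_manifest = predefined_order(["b.jpg", "a.jpg"], lambda name: "")
print(with_manifest, without_manifest)
```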

How to generate manifest files

CVAT provides a dedicated Python tool to generate manifest files. The source code can be found here.

Using the tool is the recommended way to create manifest files for your data. The data must be available locally to the tool to generate a manifest.

Usage

usage: create.py [-h] [--force] [--output-dir .] source

positional arguments:
  source                Source paths

optional arguments:
  -h, --help            show this help message and exit
  --force               Use this flag to prepare the manifest file for video data
                        if by default the video does not meet the requirements
                        and a manifest file is not prepared
  --output-dir OUTPUT_DIR
                        Directory where the manifest file will be saved

Use the script from a Docker image

This is the recommended way to use the tool.

The script can be used from the cvat/server image:

docker run -it --rm -u "$(id -u)":"$(id -g)" \
  -v "${PWD}":"/local" \
  --entrypoint python3 \
  cvat/server \
  utils/dataset_manifest/create.py --output-dir /local /local/<path/to/sources>

Make sure to adapt the command to your file locations.

Use the script directly

Ubuntu 20.04

Install dependencies:

# General
sudo apt-get update && sudo apt-get --no-install-recommends install -y \
    python3-dev python3-pip python3-venv pkg-config

# Library components
sudo apt-get install --no-install-recommends -y \
    libavformat-dev libavcodec-dev libavdevice-dev \
    libavutil-dev libswscale-dev libswresample-dev libavfilter-dev

Create an environment and install the necessary python modules:

python3 -m venv .env
. .env/bin/activate
pip install -U pip
pip install -r utils/dataset_manifest/requirements.in

Please note that if the tool is used with video this way, the results may differ from what the server would decode. This is related to the ffmpeg library version. For this reason, using the Docker-based version of the tool is recommended.

Examples

Create a dataset manifest in the current directory with video which contains enough keyframes:

python utils/dataset_manifest/create.py ~/Documents/video.mp4

Create a dataset manifest with video which does not contain enough keyframes:

python utils/dataset_manifest/create.py --force --output-dir ~/Documents ~/Documents/video.mp4

Create a dataset manifest with images:

python utils/dataset_manifest/create.py --output-dir ~/Documents ~/Documents/images/

Create a dataset manifest with a pattern (*, ?, and [] may be used):

python utils/dataset_manifest/create.py --output-dir ~/Documents "/home/${USER}/Documents/**/image*.jpeg"

Create a dataset manifest using Docker image:

# create.py is a Python script, so python3 is used as the entrypoint
docker run -it --rm -u "$(id -u)":"$(id -g)" \
  -v ~/Documents/data/:${HOME}/manifest/:rw \
  --entrypoint python3 \
  cvat/server \
  utils/dataset_manifest/create.py --output-dir ~/manifest/ ~/manifest/images/

File format

The dataset manifest files are text files in the JSONL format. There are 2 sub-formats: one for video, and one for images and 3D data.

Each top-level entry enclosed in curly braces must occupy exactly one line; empty lines are not allowed. The multi-line formatting in the descriptions below is only for demonstration.

Dataset manifest for video

The file describes a single video.

pts - time at which the frame should be shown to the user
checksum - md5 hash sum for the specific image/frame decoded

{ "version": <string, version id> }
{ "type": "video" }
{ "properties": {
  "name": <string, filename>,
  "resolution": [<int, width>, <int, height>],
  "length": <int, frame count>
}}
{
  "number": <int, frame number>,
  "pts": <int, frame pts>,
  "checksum": <string, md5 frame hash>
} (repeatable)

Dataset manifest for images and other data types

The file describes an ordered set of images and 3d point clouds.

name - file basename and leading directories from the dataset root
checksum - md5 hash sum for the specific image/frame decoded

{ "version": <string, version id> }
{ "type": "images" }
{
  "name": <string, image filename>,
  "extension": <string, . + file extension>,
  "width": <int, width>,
  "height": <int, height>,
  "meta": <dict, optional>,
  "checksum": <string, md5 hash, optional>
} (repeatable)
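
As an illustration of the one-entry-per-line rule, the following sketch serializes an images manifest by hand. It is not a replacement for the recommended tool: the `image_entry` and `dump_manifest` helpers are hypothetical, image sizes are supplied by the caller instead of being read from the files, and the optional `meta` and `checksum` fields are omitted. The version value "1.0" is taken from the example files in this section.

```python
import json
from pathlib import PurePosixPath


def image_entry(relative_path, width, height):
    """Build one manifest entry for an image path relative to the dataset root."""
    path = PurePosixPath(relative_path)
    return {
        # name keeps the leading directories but not the extension
        "name": str(path.with_suffix("")),
        "extension": path.suffix,
        "width": width,
        "height": height,
    }


def dump_manifest(images, version="1.0"):
    """Serialize header lines and entries, one JSON object per line."""
    entries = [{"version": version}, {"type": "images"}]
    entries += [image_entry(*image) for image in images]
    return "\n".join(json.dumps(e, separators=(",", ":")) for e in entries)


manifest = dump_manifest([("image1.jpg", 720, 405), ("subdir/image2.jpg", 183, 275)])
print(manifest)
```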

Example files

Manifest for a video

{"version":"1.0"}
{"type":"video"}
{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
{"number":135,"pts":486000,"checksum":"9da9b4d42c1206d71bf17a7070a05847"}
{"number":270,"pts":972000,"checksum":"a1c3a61814f9b58b00a795fa18bb6d3e"}
{"number":405,"pts":1458000,"checksum":"18c0803b3cc1aa62ac75b112439d2b62"}
{"number":540,"pts":1944000,"checksum":"4551ecea0f80e95a6c32c32e70cac59e"}
{"number":675,"pts":2430000,"checksum":"0e72faf67e5218c70b506445ac91cdd7"}

Manifest for a dataset with images

{"version":"1.0"}
{"type":"images"}
{"name":"image1","extension":".jpg","width":720,"height":405,"meta":{"related_images":[]},"checksum":"548918ec4b56132a5cff1d4acabe9947"}
{"name":"image2","extension":".jpg","width":183,"height":275,"meta":{"related_images":[]},"checksum":"4b4eefd03cc6a45c1c068b98477fb639"}
{"name":"image3","extension":".jpg","width":301,"height":167,"meta":{"related_images":[]},"checksum":"0e454a6f4a13d56c82890c98be063663"}
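
To show how such files can be consumed, here is a minimal parsing sketch based only on the format description above. The `parse_manifest` function is hypothetical, and tolerating a single trailing newline is an assumption of this sketch.

```python
import json


def parse_manifest(text):
    """Parse manifest text into (version, type, properties, entries)."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # tolerate a single trailing newline (assumption)
    if any(not line.strip() for line in lines):
        raise ValueError("empty lines are not allowed in a manifest")
    records = [json.loads(line) for line in lines]
    version = records[0]["version"]
    kind = records[1]["type"]
    body = records[2:]
    properties = None
    if kind == "video":
        # A video manifest has one extra header line with properties.
        properties = body[0]["properties"]
        body = body[1:]
    return version, kind, properties, body


video_manifest = """{"version":"1.0"}
{"type":"video"}
{"properties":{"name":"video.mp4","resolution":[1280,720],"length":778}}
{"number":0,"pts":0,"checksum":"17bb40d76887b56fe8213c6fded3d540"}
{"number":135,"pts":486000,"checksum":"9da9b4d42c1206d71bf17a7070a05847"}
"""

version, kind, properties, frames = parse_manifest(video_manifest)
print(kind, properties["length"], [frame["number"] for frame in frames])
```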

34 - Data preparation on the fly

Description

On-the-fly data processing is a way of working with data whose main idea is as follows: when a task is created, only the minimum necessary meta information is collected. This meta information later allows the necessary chunks to be created when a request is received from a client.

Generated chunks are stored in a size-limited cache with a policy of evicting less popular items.

When a request is received from a client, the required chunk is searched for in the cache. If the chunk does not exist yet, it is created using prepared meta information and then put into the cache.
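
The lookup described above can be sketched as a small cache class. This is an illustration under stated assumptions, not CVAT's cache: the `ChunkCache` class and the `build_chunk` callback are hypothetical, capacity is counted in chunks, and "less popular" is modeled as "requested least often".

```python
from collections import Counter


class ChunkCache:
    """Size-limited cache that builds chunks on demand and evicts the least-used one."""

    def __init__(self, capacity, build_chunk):
        self.capacity = capacity
        self.build_chunk = build_chunk  # hypothetical: creates a chunk from meta information
        self.chunks = {}
        self.hits = Counter()

    def get(self, chunk_id):
        if chunk_id not in self.chunks:
            if len(self.chunks) >= self.capacity:
                # Evict the cached chunk that has been requested least often.
                victim = min(self.chunks, key=lambda key: self.hits[key])
                del self.chunks[victim]
                del self.hits[victim]
            self.chunks[chunk_id] = self.build_chunk(chunk_id)
        self.hits[chunk_id] += 1
        return self.chunks[chunk_id]


built = []


def build_chunk(chunk_id):
    built.append(chunk_id)  # record every (slow) chunk creation
    return f"chunk-{chunk_id}"


cache = ChunkCache(capacity=2, build_chunk=build_chunk)
cache.get(0)
cache.get(0)  # served from the cache, nothing is built
cache.get(1)
cache.get(2)  # cache is full: chunk 1 (requested once) is evicted
print(built, sorted(cache.chunks))
```

The first request for a chunk pays the creation cost, which is the first drawback this page lists for the approach.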

This method of working with data allows you to:

reduce the task creation time.
store data in a size-limited cache that evicts less popular items.

Unfortunately, this method has several drawbacks:

The first access to the data will take more time.
It will not work for some videos, even if they have a valid manifest file. If there are not enough keyframes in the video for smooth video decoding, the task data chunks will be created with the default method, i.e. during the task creation.
If the data has not been cached yet, and is not reachable during the access time, it cannot be retrieved.

How to use

To enable or disable this feature for a new task, use the Use Cache toggle in the task configuration.

Uploading a manifest with data

When creating a task, you can upload a manifest.jsonl file along with the video or dataset with images. You can see how to prepare it here.

35 - Serverless tutorial

Introduction

Leveraging the power of computers to solve daily routine problems, fix mistakes, and find information has become second nature. It is therefore natural to use computing power in annotating datasets. There are multiple publicly available DL models for classification, object detection, and semantic segmentation which can be used for data annotation. Whilst some of these publicly available DL models can be found on CVAT, it is relatively simple to integrate your privately trained ML/DL model into CVAT.

No single solution fits every problem: publicly available DL models cannot be used when we want to detect niche or specific objects on which these models were not trained. Also, because annotation requirements can be strict, automatically annotated objects sometimes cannot be accepted as they are, and it is easier to annotate them from scratch. Even with these limitations in mind, a DL solution that can perfectly annotate 50% of your data cuts manual annotation in half.

Since we know DL models can help us to annotate faster, how then do we use them? In CVAT all such DL models are implemented as serverless functions using the Nuclio serverless platform. There are multiple implemented functions that can be found in the serverless directory such as Mask RCNN, Faster RCNN, SiamMask, Inside Outside Guidance, Deep Extreme Cut, etc. Follow the installation guide to build and deploy these serverless functions. See the user guide to understand how to use these functions in the UI to automatically annotate data.

What is a serverless function and why is it used for automatic annotation in CVAT? Let’s assume that you have a DL model and want to use it for AI-assisted annotation. The naive approach is to implement a Python script which uses the DL model to prepare a file with annotations in a public format like MS COCO or Pascal VOC. After that you can upload the annotation file into CVAT. It works, but it is not user-friendly. How can you make CVAT run the script for you?
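
For reference, the naive approach could look roughly like the sketch below, which collects bounding boxes into the MS COCO detection layout. The `detect` function is a placeholder that returns a fixed box instead of running a real DL model, and the file name, image size, and labels are made up for the example.

```python
import json


def detect(image_name):
    """Placeholder for a DL model: returns (label, x, y, width, height) tuples."""
    return [("car", 10.0, 20.0, 100.0, 50.0)]


def to_coco(image_sizes, labels):
    """Run the detector on every image and collect the results in MS COCO layout."""
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(labels)]
    category_ids = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, (file_name, (width, height)) in enumerate(image_sizes.items(), start=1):
        images.append({"id": image_id, "file_name": file_name, "width": width, "height": height})
        for label, x, y, w, h in detect(file_name):
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_ids[label],
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


coco = to_coco({"frame_000000.jpg": (1280, 720)}, labels=["car", "person"])
print(json.dumps(coco, indent=2))
```

The printed JSON would then have to be saved to a file and uploaded into CVAT by hand, which is exactly the manual step that serverless functions remove.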

You can pack the script with your DL model into a container which provides a standard interface for interacting with it. One way to do that is to use the function as a service approach. Your script becomes a function inside cloud infrastructure which can be called over HTTP. The Nuclio serverless platform helps us to implement and manage such functions.

CVAT supports Nuclio out of the box if it is built properly. See the installation guide for instructions. Thus, if you deploy a serverless function, the CVAT server can see it and call it with appropriate arguments. Of course, there are some tricks to creating serverless functions for CVAT, and we will discuss them in the next sections of the tutorial.

Using builtin DL models in practice

In this tutorial it is assumed that you have already cloned the CVAT GitHub repo. To build CVAT with serverless support, you need to run the docker compose command with specific configuration files. In this case it is docker-compose.serverless.yml. It contains the necessary instructions to build and deploy the Nuclio platform as a Docker container and to enable the corresponding support in CVAT.

docker compose -f docker-compose.yml -f docker-compose.dev.yml -f components/serverless/docker-compose.serverless.yml up -d --build

docker compose -f docker-compose.yml -f docker-compose.dev.yml -f components/serverless/docker-compose.serverless.yml ps

   Name                 Command                  State                            Ports
-------------------------------------------------------------------------------------------------------------
cvat         /usr/bin/supervisord             Up             8080/tcp
cvat_db      docker-entrypoint.sh postgres    Up             5432/tcp
cvat_proxy   /docker-entrypoint.sh /bin ...   Up             0.0.0.0:8080->80/tcp,:::8080->80/tcp
cvat_redis   docker-entrypoint.sh redis ...   Up             6379/tcp
cvat_ui      /docker-entrypoint.sh ngin ...   Up             80/tcp
nuclio       /docker-entrypoint.sh sh - ...   Up (healthy)   80/tcp, 0.0.0.0:8070->8070/tcp,:::8070->8070/tcp

The next step is to deploy the builtin serverless functions using the Nuclio command line tool (aka nuctl). It is assumed that you followed the installation guide and nuctl is already installed on your operating system. Run the following command to check that it works. In the beginning, you should not have any deployed serverless functions.

nuctl get functions

No functions found

Let’s look at examples of how to use DL models for annotation in different computer vision tasks.

Tracking using SiamMask

In this use case a user needs to annotate all individual objects on a video as tracks. Basically for every object we need to know its location on every frame.

The first step is to deploy SiamMask. The deployment process can depend on your operating system. On Linux you can use the serverless/deploy_cpu.sh auxiliary script, but below we use nuctl directly.

nuctl create project cvat

nuctl deploy --project-name cvat --path "./serverless/pytorch/foolwood/siammask/nuclio" --platform local

21.05.07 13:00:22.233                     nuctl (I) Deploying function {"name": ""}
21.05.07 13:00:22.233                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.05.07 13:00:22.652                     nuctl (I) Cleaning up before deployment {"functionName": "pth-foolwood-siammask"}
21.05.07 13:00:22.705                     nuctl (I) Staging files and preparing base images
21.05.07 13:00:22.706                     nuctl (I) Building processor image {"imageName": "cvat/pth.foolwood.siammask:latest"}
21.05.07 13:00:22.706     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.05.07 13:00:26.351     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.05.07 13:00:29.819            nuctl.platform (I) Building docker image {"image": "cvat/pth.foolwood.siammask:latest"}
21.05.07 13:00:30.103            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pth.foolwood.siammask:latest", "registry": ""}
21.05.07 13:00:30.103            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pth.foolwood.siammask:latest"}
21.05.07 13:00:30.104                     nuctl (I) Build complete {"result": {"Image":"cvat/pth.foolwood.siammask:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-foolwood-siammask","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","name":"SiamMask","spec":"","type":"tracker"}},"spec":{"description":"Fast Online Object Tracking and Segmentation","handler":"main:handler","runtime":"python:3.6","env":[{"name":"PYTHONPATH","value":"/opt/nuclio/SiamMask:/opt/nuclio/SiamMask/experiments/siammask_sharp"}],"resources":{},"image":"cvat/pth.foolwood.siammask:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"build":{"image":"cvat/pth.foolwood.siammask","baseImage":"continuumio/miniconda3","directives":{"preCopy":[{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"conda create -y -n siammask python=3.6"},{"kind":"SHELL","value":"[\"conda\", \"run\", \"-n\", \"siammask\", \"/bin/bash\", \"-c\"]"},{"kind":"RUN","value":"git clone https://github.com/foolwood/SiamMask.git"},{"kind":"RUN","value":"pip install -r SiamMask/requirements.txt jsonpickle"},{"kind":"RUN","value":"conda install -y gcc_linux-64"},{"kind":"RUN","value":"cd SiamMask \u0026\u0026 bash make.sh \u0026\u0026 cd -"},{"kind":"RUN","value":"wget -P SiamMask/experiments/siammask_sharp http://www.robots.ox.ac.uk/~qwang/SiamMask_DAVIS.pth"},{"kind":"ENTRYPOINT","value":"[\"conda\", \"run\", \"-n\", \"siammask\"]"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.05.07 13:00:31.387            nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.05.07 13:00:32.796                     nuctl (I) Function deploy complete {"functionName": "pth-foolwood-siammask", "httpPort": 49155}

nuctl get functions

  NAMESPACE |         NAME          | PROJECT | STATE | NODE PORT | REPLICAS
  nuclio    | pth-foolwood-siammask | cvat    | ready |     49155 | 1/1

Let’s see how it works in the UI. Go to the Models tab and check that SiamMask appears in the list. If it does not, something went wrong during deployment; ask for help in one of our public channels.

Models list with SiamMask

After that, go to the new task page and create a task with this video file. You can choose any task name, any labels, and even another video file if you like. In this case, the Remote sources option was used to specify the video file. Press submit button at the end to finish the process.

Create a video annotation task

Open the task and use AI tools to start tracking an object. Draw a bounding box around an object, then step through the frames one by one and correct the bounding box if necessary.

Start tracking an object

Finally you will get bounding boxes.

SiamMask results

The SiamMask model is better optimized for Nvidia GPUs. For more information about deploying the model on a GPU, read on.

Object detection using YOLO-v3

First of all, let’s deploy the DL model. The deployment process is similar for all serverless functions: run the nuctl deploy command with appropriate arguments. To simplify the process, you can use the serverless/deploy_cpu.sh script. Inference of this serverless function is optimized for CPU using the Intel OpenVINO framework.

serverless/deploy_cpu.sh serverless/openvino/omz/public/yolo-v3-tf/

Deploying serverless/openvino/omz/public/yolo-v3-tf function...
21.07.12 15:55:17.314                     nuctl (I) Deploying function {"name": ""}
21.07.12 15:55:17.314                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.07.12 15:55:17.682                     nuctl (I) Cleaning up before deployment {"functionName": "openvino-omz-public-yolo-v3-tf"}
21.07.12 15:55:17.739                     nuctl (I) Staging files and preparing base images
21.07.12 15:55:17.743                     nuctl (I) Building processor image {"imageName": "cvat/openvino.omz.public.yolo-v3-tf:latest"}
21.07.12 15:55:17.743     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.07.12 15:55:21.048     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.07.12 15:55:24.595            nuctl.platform (I) Building docker image {"image": "cvat/openvino.omz.public.yolo-v3-tf:latest"}
21.07.12 15:55:30.359            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/openvino.omz.public.yolo-v3-tf:latest", "registry": ""}
21.07.12 15:55:30.359            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/openvino.omz.public.yolo-v3-tf:latest"}
21.07.12 15:55:30.359                     nuctl (I) Build complete {"result": {"Image":"cvat/openvino.omz.public.yolo-v3-tf:latest","UpdatedFunctionConfig":{"metadata":{"name":"openvino-omz-public-yolo-v3-tf","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"openvino","name":"YOLO v3","spec":"[\n  { \"id\": 0, \"name\": \"person\" },\n  { \"id\": 1, \"name\": \"bicycle\" },\n  { \"id\": 2, \"name\": \"car\" },\n  { \"id\": 3, \"name\": \"motorbike\" },\n  { \"id\": 4, \"name\": \"aeroplane\" },\n  { \"id\": 5, \"name\": \"bus\" },\n  { \"id\": 6, \"name\": \"train\" },\n  { \"id\": 7, \"name\": \"truck\" },\n  { \"id\": 8, \"name\": \"boat\" },\n  { \"id\": 9, \"name\": \"traffic light\" },\n  { \"id\": 10, \"name\": \"fire hydrant\" },\n  { \"id\": 11, \"name\": \"stop sign\" },\n  { \"id\": 12, \"name\": \"parking meter\" },\n  { \"id\": 13, \"name\": \"bench\" },\n  { \"id\": 14, \"name\": \"bird\" },\n  { \"id\": 15, \"name\": \"cat\" },\n  { \"id\": 16, \"name\": \"dog\" },\n  { \"id\": 17, \"name\": \"horse\" },\n  { \"id\": 18, \"name\": \"sheep\" },\n  { \"id\": 19, \"name\": \"cow\" },\n  { \"id\": 20, \"name\": \"elephant\" },\n  { \"id\": 21, \"name\": \"bear\" },\n  { \"id\": 22, \"name\": \"zebra\" },\n  { \"id\": 23, \"name\": \"giraffe\" },\n  { \"id\": 24, \"name\": \"backpack\" },\n  { \"id\": 25, \"name\": \"umbrella\" },\n  { \"id\": 26, \"name\": \"handbag\" },\n  { \"id\": 27, \"name\": \"tie\" },\n  { \"id\": 28, \"name\": \"suitcase\" },\n  { \"id\": 29, \"name\": \"frisbee\" },\n  { \"id\": 30, \"name\": \"skis\" },\n  { \"id\": 31, \"name\": \"snowboard\" },\n  { \"id\": 32, \"name\": \"sports ball\" },\n  { \"id\": 33, \"name\": \"kite\" },\n  { \"id\": 34, \"name\": \"baseball bat\" },\n  { \"id\": 35, \"name\": \"baseball glove\" },\n  { \"id\": 36, \"name\": \"skateboard\" },\n  { \"id\": 37, \"name\": \"surfboard\" },\n  { \"id\": 38, \"name\": \"tennis racket\" },\n  { \"id\": 39, \"name\": 
\"bottle\" },\n  { \"id\": 40, \"name\": \"wine glass\" },\n  { \"id\": 41, \"name\": \"cup\" },\n  { \"id\": 42, \"name\": \"fork\" },\n  { \"id\": 43, \"name\": \"knife\" },\n  { \"id\": 44, \"name\": \"spoon\" },\n  { \"id\": 45, \"name\": \"bowl\" },\n  { \"id\": 46, \"name\": \"banana\" },\n  { \"id\": 47, \"name\": \"apple\" },\n  { \"id\": 48, \"name\": \"sandwich\" },\n  { \"id\": 49, \"name\": \"orange\" },\n  { \"id\": 50, \"name\": \"broccoli\" },\n  { \"id\": 51, \"name\": \"carrot\" },\n  { \"id\": 52, \"name\": \"hot dog\" },\n  { \"id\": 53, \"name\": \"pizza\" },\n  { \"id\": 54, \"name\": \"donut\" },\n  { \"id\": 55, \"name\": \"cake\" },\n  { \"id\": 56, \"name\": \"chair\" },\n  { \"id\": 57, \"name\": \"sofa\" },\n  { \"id\": 58, \"name\": \"pottedplant\" },\n  { \"id\": 59, \"name\": \"bed\" },\n  { \"id\": 60, \"name\": \"diningtable\" },\n  { \"id\": 61, \"name\": \"toilet\" },\n  { \"id\": 62, \"name\": \"tvmonitor\" },\n  { \"id\": 63, \"name\": \"laptop\" },\n  { \"id\": 64, \"name\": \"mouse\" },\n  { \"id\": 65, \"name\": \"remote\" },\n  { \"id\": 66, \"name\": \"keyboard\" },\n  { \"id\": 67, \"name\": \"cell phone\" },\n  { \"id\": 68, \"name\": \"microwave\" },\n  { \"id\": 69, \"name\": \"oven\" },\n  { \"id\": 70, \"name\": \"toaster\" },\n  { \"id\": 71, \"name\": \"sink\" },\n  { \"id\": 72, \"name\": \"refrigerator\" },\n  { \"id\": 73, \"name\": \"book\" },\n  { \"id\": 74, \"name\": \"clock\" },\n  { \"id\": 75, \"name\": \"vase\" },\n  { \"id\": 76, \"name\": \"scissors\" },\n  { \"id\": 77, \"name\": \"teddy bear\" },\n  { \"id\": 78, \"name\": \"hair drier\" },\n  { \"id\": 79, \"name\": \"toothbrush\" }\n]\n","type":"detector"}},"spec":{"description":"YOLO v3 via Intel 
OpenVINO","handler":"main:handler","runtime":"python:3.6","env":[{"name":"NUCLIO_PYTHON_EXE_PATH","value":"/opt/nuclio/common/openvino/python3"}],"resources":{},"image":"cvat/openvino.omz.public.yolo-v3-tf:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/home/nmanovic/Workspace/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"image":"cvat/openvino.omz.public.yolo-v3-tf","baseImage":"openvino/ubuntu18_dev:2020.2","directives":{"preCopy":[{"kind":"USER","value":"root"},{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"ln -s /usr/bin/pip3 /usr/bin/pip"},{"kind":"RUN","value":"/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/downloader.py --name yolo-v3-tf -o /opt/nuclio/open_model_zoo"},{"kind":"RUN","value":"/opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader/converter.py --name yolo-v3-tf --precisions FP32 -d /opt/nuclio/open_model_zoo -o /opt/nuclio/open_model_zoo"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.07.12 15:55:31.496            nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.07.12 15:55:32.894                     nuctl (I) Function deploy complete {"functionName": "openvino-omz-public-yolo-v3-tf", "httpPort": 49156}

Again, go to the Models tab and check that YOLO v3 appears in the list. If it does not, something went wrong during deployment; ask for help in one of our public channels.

Let us reuse the task which you created for testing the SiamMask serverless function above. Choose the magic wand tool, go to the Detectors tab, and select the YOLO v3 model. Press the Annotate button, and after a couple of seconds you should see detection results. Do not forget to save annotations.

YOLO v3 results

It is also possible to run a detector on the whole annotation task. In that case CVAT runs the serverless function on every frame of the task and submits the results directly to the database. For more details, please read the guide.

Objects segmentation using Mask-RCNN

If you have a detector that returns polygons, you can segment objects. One such detector is Mask-RCNN. Several implementations of the detector are available out of the box:

serverless/openvino/omz/public/mask_rcnn_inception_resnet_v2_atrous_coco is optimized using Intel OpenVINO framework and works well if it is run on an Intel CPU.
serverless/tensorflow/matterport/mask_rcnn/ is optimized for GPU.

The deployment process for a serverless function optimized for GPU is similar: just run the serverless/deploy_gpu.sh script. It runs mostly the same commands but internally uses the function-gpu.yaml configuration file instead of function.yaml. See the next sections if you want to understand the difference.

Note: Please do not run several GPU functions at the same time. In many cases it will not work out of the box. For now, you have to schedule different functions on different GPUs manually, which requires source code modification. The Nuclio autoscaler does not support the local platform (docker).
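
The note above does not say what the source code modification looks like. One possible approach, shown here only as a sketch and not taken from the CVAT code, is to restrict each function's process to a single GPU through the standard CUDA_VISIBLE_DEVICES environment variable before the DL framework is imported. The helper name pin_gpu and the GPU index are assumptions made for illustration.

```python
import os


def pin_gpu(gpu_index: int) -> None:
    """Restrict this process to one GPU (hypothetical helper).

    Must run before the DL framework initializes CUDA, for example
    at the very top of the function's main.py.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)


# Assumption: this function is meant to use the first GPU.
pin_gpu(0)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

A second function would call pin_gpu with a different index, so that the two do not compete for the same device.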

serverless/deploy_gpu.sh serverless/tensorflow/matterport/mask_rcnn

Deploying serverless/tensorflow/matterport/mask_rcnn function...
21.07.12 16:48:48.995                     nuctl (I) Deploying function {"name": ""}
21.07.12 16:48:48.995                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.07.12 16:48:49.356                     nuctl (I) Cleaning up before deployment {"functionName": "tf-matterport-mask-rcnn"}
21.07.12 16:48:49.470                     nuctl (I) Function already exists, deleting function containers {"functionName": "tf-matterport-mask-rcnn"}
21.07.12 16:48:50.247                     nuctl (I) Staging files and preparing base images
21.07.12 16:48:50.248                     nuctl (I) Building processor image {"imageName": "cvat/tf.matterport.mask_rcnn:latest"}
21.07.12 16:48:50.249     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.07.12 16:48:53.674     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.07.12 16:48:57.424            nuctl.platform (I) Building docker image {"image": "cvat/tf.matterport.mask_rcnn:latest"}
21.07.12 16:48:57.763            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/tf.matterport.mask_rcnn:latest", "registry": ""}
21.07.12 16:48:57.764            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/tf.matterport.mask_rcnn:latest"}
21.07.12 16:48:57.764                     nuctl (I) Build complete {"result": {"Image":"cvat/tf.matterport.mask_rcnn:latest","UpdatedFunctionConfig":{"metadata":{"name":"tf-matterport-mask-rcnn","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"tensorflow","name":"Mask RCNN via Tensorflow","spec":"[\n  { \"id\": 0, \"name\": \"BG\" },\n  { \"id\": 1, \"name\": \"person\" },\n  { \"id\": 2, \"name\": \"bicycle\" },\n  { \"id\": 3, \"name\": \"car\" },\n  { \"id\": 4, \"name\": \"motorcycle\" },\n  { \"id\": 5, \"name\": \"airplane\" },\n  { \"id\": 6, \"name\": \"bus\" },\n  { \"id\": 7, \"name\": \"train\" },\n  { \"id\": 8, \"name\": \"truck\" },\n  { \"id\": 9, \"name\": \"boat\" },\n  { \"id\": 10, \"name\": \"traffic_light\" },\n  { \"id\": 11, \"name\": \"fire_hydrant\" },\n  { \"id\": 12, \"name\": \"stop_sign\" },\n  { \"id\": 13, \"name\": \"parking_meter\" },\n  { \"id\": 14, \"name\": \"bench\" },\n  { \"id\": 15, \"name\": \"bird\" },\n  { \"id\": 16, \"name\": \"cat\" },\n  { \"id\": 17, \"name\": \"dog\" },\n  { \"id\": 18, \"name\": \"horse\" },\n  { \"id\": 19, \"name\": \"sheep\" },\n  { \"id\": 20, \"name\": \"cow\" },\n  { \"id\": 21, \"name\": \"elephant\" },\n  { \"id\": 22, \"name\": \"bear\" },\n  { \"id\": 23, \"name\": \"zebra\" },\n  { \"id\": 24, \"name\": \"giraffe\" },\n  { \"id\": 25, \"name\": \"backpack\" },\n  { \"id\": 26, \"name\": \"umbrella\" },\n  { \"id\": 27, \"name\": \"handbag\" },\n  { \"id\": 28, \"name\": \"tie\" },\n  { \"id\": 29, \"name\": \"suitcase\" },\n  { \"id\": 30, \"name\": \"frisbee\" },\n  { \"id\": 31, \"name\": \"skis\" },\n  { \"id\": 32, \"name\": \"snowboard\" },\n  { \"id\": 33, \"name\": \"sports_ball\" },\n  { \"id\": 34, \"name\": \"kite\" },\n  { \"id\": 35, \"name\": \"baseball_bat\" },\n  { \"id\": 36, \"name\": \"baseball_glove\" },\n  { \"id\": 37, \"name\": \"skateboard\" },\n  { \"id\": 38, \"name\": \"surfboard\" },\n  { \"id\": 39, \"name\": 
\"tennis_racket\" },\n  { \"id\": 40, \"name\": \"bottle\" },\n  { \"id\": 41, \"name\": \"wine_glass\" },\n  { \"id\": 42, \"name\": \"cup\" },\n  { \"id\": 43, \"name\": \"fork\" },\n  { \"id\": 44, \"name\": \"knife\" },\n  { \"id\": 45, \"name\": \"spoon\" },\n  { \"id\": 46, \"name\": \"bowl\" },\n  { \"id\": 47, \"name\": \"banana\" },\n  { \"id\": 48, \"name\": \"apple\" },\n  { \"id\": 49, \"name\": \"sandwich\" },\n  { \"id\": 50, \"name\": \"orange\" },\n  { \"id\": 51, \"name\": \"broccoli\" },\n  { \"id\": 52, \"name\": \"carrot\" },\n  { \"id\": 53, \"name\": \"hot_dog\" },\n  { \"id\": 54, \"name\": \"pizza\" },\n  { \"id\": 55, \"name\": \"donut\" },\n  { \"id\": 56, \"name\": \"cake\" },\n  { \"id\": 57, \"name\": \"chair\" },\n  { \"id\": 58, \"name\": \"couch\" },\n  { \"id\": 59, \"name\": \"potted_plant\" },\n  { \"id\": 60, \"name\": \"bed\" },\n  { \"id\": 61, \"name\": \"dining_table\" },\n  { \"id\": 62, \"name\": \"toilet\" },\n  { \"id\": 63, \"name\": \"tv\" },\n  { \"id\": 64, \"name\": \"laptop\" },\n  { \"id\": 65, \"name\": \"mouse\" },\n  { \"id\": 66, \"name\": \"remote\" },\n  { \"id\": 67, \"name\": \"keyboard\" },\n  { \"id\": 68, \"name\": \"cell_phone\" },\n  { \"id\": 69, \"name\": \"microwave\" },\n  { \"id\": 70, \"name\": \"oven\" },\n  { \"id\": 71, \"name\": \"toaster\" },\n  { \"id\": 72, \"name\": \"sink\" },\n  { \"id\": 73, \"name\": \"refrigerator\" },\n  { \"id\": 74, \"name\": \"book\" },\n  { \"id\": 75, \"name\": \"clock\" },\n  { \"id\": 76, \"name\": \"vase\" },\n  { \"id\": 77, \"name\": \"scissors\" },\n  { \"id\": 78, \"name\": \"teddy_bear\" },\n  { \"id\": 79, \"name\": \"hair_drier\" },\n  { \"id\": 80, \"name\": \"toothbrush\" }\n]\n","type":"detector"}},"spec":{"description":"Mask RCNN optimized for 
GPU","handler":"main:handler","runtime":"python:3.6","env":[{"name":"MASK_RCNN_DIR","value":"/opt/nuclio/Mask_RCNN"}],"resources":{"limits":{"nvidia.com/gpu":"1"}},"image":"cvat/tf.matterport.mask_rcnn:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":1,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/home/nmanovic/Workspace/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"functionConfigPath":"serverless/tensorflow/matterport/mask_rcnn/nuclio/function-gpu.yaml","image":"cvat/tf.matterport.mask_rcnn","baseImage":"tensorflow/tensorflow:1.15.5-gpu-py3","directives":{"postCopy":[{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"apt update \u0026\u0026 apt install --no-install-recommends -y git curl"},{"kind":"RUN","value":"git clone --depth 1 https://github.com/matterport/Mask_RCNN.git"},{"kind":"RUN","value":"curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5"},{"kind":"RUN","value":"pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.07.12 16:48:59.071            nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.07.12 16:49:00.437                     nuctl (I) Function deploy complete {"functionName": "tf-matterport-mask-rcnn", "httpPort": 49155}

Now you should be able to annotate objects using segmentation masks.

Mask RCNN results

Adding your own DL models

Choose a DL model

For this tutorial I will choose a popular AI library with a lot of models inside. In your case it can be your own model. If it is based on detectron2, it will be easy to integrate: just follow the tutorial.

Detectron2 is Facebook AI Research’s next generation library that provides state-of-the-art detection and segmentation algorithms. It is the successor of Detectron and maskrcnn-benchmark. It supports a number of computer vision research projects and production applications in Facebook.

Clone the repository somewhere. I assume that all other experiments will be run from the cloned detectron2 directory.

git clone https://github.com/facebookresearch/detectron2
cd detectron2

Run local experiments

Let’s run a detection model locally. First of all, install the requirements for the library.

In my case I have Ubuntu 20.04 with Python 3.8.5. I installed the CPU build of PyTorch 1.8.1 for Linux with pip inside a virtual environment. Follow the opencv-python installation guide to get the library for demo and visualization.

python3 -m venv .detectron2
. .detectron2/bin/activate
pip install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install opencv-python

Install the detectron2 library from your local clone (you should be inside detectron2 directory).

python -m pip install -e .

After the library from Facebook AI Research is installed, we can run a couple of experiments. See the official tutorial for more examples. I decided to experiment with RetinaNet. The first step is to download the model weights.

curl -O https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/model_final_971ab9.pkl

To run experiments, let’s download an image with cats from Wikipedia.

curl -O https://upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/1920px-Cat_poster_1.jpg

Finally let’s run the DL model inference on CPU. If all is fine, you will see a window with cats and bounding boxes around them with scores.

python demo/demo.py --config-file configs/COCO-Detection/retinanet_R_101_FPN_3x.yaml \
  --input 1920px-Cat_poster_1.jpg --opts MODEL.WEIGHTS model_final_971ab9.pkl MODEL.DEVICE cpu

Cats detected by RetinaNet R101

The next step is to reduce the demo/demo.py script to the code needed to load the model, run it, and interpret its output. Let’s hard-code the parameters and remove argparse. Keep only the code that works with a single image. There is no general recipe for minimizing code.

In the end you should get something like the code below, which uses a fixed config, reads a predefined image, initializes the predictor, and runs inference. As the final step, it prints all detected bounding boxes with scores and labels.

from detectron2.config import get_cfg
from detectron2.data.detection_utils import read_image
from detectron2.engine.defaults import DefaultPredictor
from detectron2.data.datasets.builtin_meta import COCO_CATEGORIES

CONFIG_FILE = "configs/COCO-Detection/retinanet_R_101_FPN_3x.yaml"
CONFIG_OPTS = ["MODEL.WEIGHTS", "model_final_971ab9.pkl", "MODEL.DEVICE", "cpu"]
CONFIDENCE_THRESHOLD = 0.5

def setup_cfg():
    cfg = get_cfg()
    cfg.merge_from_file(CONFIG_FILE)
    cfg.merge_from_list(CONFIG_OPTS)
    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = CONFIDENCE_THRESHOLD
    cfg.freeze()
    return cfg


if __name__ == "__main__":
    cfg = setup_cfg()
    image_path = "1920px-Cat_poster_1.jpg"  # avoid shadowing the built-in input()
    img = read_image(image_path, format="BGR")
    predictor = DefaultPredictor(cfg)
    predictions = predictor(img)
    instances = predictions['instances']
    pred_boxes = instances.pred_boxes
    scores = instances.scores
    pred_classes = instances.pred_classes
    for box, score, label in zip(pred_boxes, scores, pred_classes):
        label = COCO_CATEGORIES[int(label)]["name"]
        print(box.tolist(), float(score), label)

DL model as a serverless function

Now that we know how to run the DL model locally, we can prepare a serverless function that CVAT can use to annotate data. Let’s see what function.yaml will look like…

Let’s look at the faster_rcnn_inception_v2_coco serverless function configuration as an example and try adapting it to our case. First of all, let’s invent a unique name for the new function: pth-facebookresearch-detectron2-retinanet-r101. The annotations section describes our function for the CVAT serverless subsystem:

annotations.name is a display name.
annotations.type is the type of the serverless function. It can have several different values and determines the input and output of the function. In our case it is detector, which means that the integrated DL model can generate shapes with labels for an image.
annotations.framework is used for information only and can have an arbitrary value. Usually it has values like OpenVINO, PyTorch, TensorFlow, etc.
annotations.spec describes the list of labels which the model supports. In this case the DL model was trained on the MS COCO dataset, and the list of labels corresponds to that dataset.
spec.description is used to provide basic information about the model.
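
The annotations.spec value is a JSON array stored as a string. As a minimal sketch, assuming the labels are available as a list of dictionaries with id and name keys, the text could be generated as shown below. The two-entry CATEGORIES list and the build_spec helper are hypothetical stand-ins, not part of CVAT or Nuclio.

```python
import json

# Hypothetical stand-in for the model's label list; a real function
# would list every label the model was trained on.
CATEGORIES = [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "bicycle"},
]


def build_spec(categories):
    """Serialize labels into the JSON text stored in annotations.spec."""
    labels = [{"id": c["id"], "name": c["name"]} for c in categories]
    return json.dumps(labels, indent=2)


spec_text = build_spec(CATEGORIES)
print(spec_text)
```

The printed text can be pasted under the spec key of the annotations section.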

All other parameters are described in Nuclio documentation.

spec.handler is the entry point to your function.
spec.runtime is the name of the language runtime.
spec.eventTimeout is the global event timeout

Next step is to describe how to build our serverless function:

spec.build.image is the name of your docker image
spec.build.baseImage is the name of a base container image from which to build the function
spec.build.directives are commands to build your docker image

In our case we start from the Ubuntu 20.04 base image, install curl to download weights for our model, git to clone the detectron2 project from GitHub, and Python together with pip. Then we repeat the installation steps which we used to set up the DL model locally, with minor modifications.

For the Nuclio platform we have to specify a couple more parameters:

spec.triggers.myHttpTrigger describes HTTP trigger to handle incoming HTTP requests.
spec.platform describes some important parameters to run your functions like restartPolicy and mountMode. Read Nuclio documentation for more details.

metadata:
  name: pth-facebookresearch-detectron2-retinanet-r101
  namespace: cvat
  annotations:
    name: RetinaNet R101
    type: detector
    framework: pytorch
    spec: |
      [
        { "id": 1, "name": "person" },
        { "id": 2, "name": "bicycle" },

        ...

        { "id":89, "name": "hair_drier" },
        { "id":90, "name": "toothbrush" }
      ]

spec:
  description: RetinaNet R101 from Detectron2
  runtime: 'python:3.8'
  handler: main:handler
  eventTimeout: 30s

  build:
    image: cvat/pth.facebookresearch.detectron2.retinanet_r101
    baseImage: ubuntu:20.04

    directives:
      preCopy:
        - kind: ENV
          value: DEBIAN_FRONTEND=noninteractive
        - kind: RUN
          value: apt-get update && apt-get -y install curl git python3 python3-pip
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
        - kind: RUN
          value: pip3 install 'git+https://github.com/facebookresearch/detectron2@v0.4'
        - kind: RUN
          value: curl -O https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/model_final_971ab9.pkl
        - kind: RUN
          value: ln -s /usr/bin/pip3 /usr/local/bin/pip

  triggers:
    myHttpTrigger:
      maxWorkers: 2
      kind: 'http'
      workerAvailabilityTimeoutMilliseconds: 10000
      attributes:
        maxRequestBodySize: 33554432 # 32MB

  platform:
    attributes:
      restartPolicy:
        name: always
        maximumRetryCount: 3
      mountMode: volume

Full code can be found here: detectron2/retinanet/nuclio/function.yaml
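
The maxRequestBodySize value in the trigger section is given in bytes. The sketch below only illustrates the arithmetic: it checks that 33554432 bytes is 32 MiB and estimates the largest raw image that still fits once it is base64-encoded, which is how the handler in this tutorial receives images. The helper name base64_size is an assumption made for illustration, and the overhead of the surrounding JSON is ignored.

```python
import base64
import math

MAX_REQUEST_BODY_SIZE = 33554432  # bytes, from spec.triggers.myHttpTrigger


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


# 32 MiB expressed in bytes matches the configured limit.
assert MAX_REQUEST_BODY_SIZE == 32 * 1024 * 1024

# The formula agrees with the standard library encoder.
assert len(base64.b64encode(bytes(10))) == base64_size(10)

# Base64 inflates data by about one third, so the largest raw image
# that fits is roughly three quarters of the limit.
largest_raw = (MAX_REQUEST_BODY_SIZE // 4) * 3
print(largest_raw, base64_size(largest_raw))
```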

The next step is to adapt the source code which we wrote to run the DL model locally to the requirements of the Nuclio platform. The first step is to load the model into memory using the init_context(context) function. Read more about the function in Best Practices and Common Pitfalls.

After that we need to accept incoming HTTP requests, run inference, and reply with detection results. This is the job of the entry point handler(context, event), which we specified in the function specification. Also in accordance with the function specification, the entry point should be located inside main.py.


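import base64
import io
import json

from PIL import Image
from detectron2.model_zoo import get_config
from detectron2.data.detection_utils import convert_PIL_to_numpy
from detectron2.engine.defaults import DefaultPredictor
from detectron2.data.datasets.builtin_meta import COCO_CATEGORIES

# Same options and threshold as in the local script above.
CONFIG_OPTS = ["MODEL.WEIGHTS", "model_final_971ab9.pkl", "MODEL.DEVICE", "cpu"]
CONFIDENCE_THRESHOLD = 0.5

def init_context(context):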
    context.logger.info("Init context...  0%")

    cfg = get_config('COCO-Detection/retinanet_R_101_FPN_3x.yaml')
    cfg.merge_from_list(CONFIG_OPTS)
    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = CONFIDENCE_THRESHOLD
    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = CONFIDENCE_THRESHOLD
    cfg.freeze()
    predictor = DefaultPredictor(cfg)

    context.user_data.model_handler = predictor

    context.logger.info("Init context...100%")

def handler(context, event):
    context.logger.info("Run retinanet-R101 model")
    data = event.body
    buf = io.BytesIO(base64.b64decode(data["image"]))
    threshold = float(data.get("threshold", 0.5))
    image = convert_PIL_to_numpy(Image.open(buf), format="BGR")

    predictions = context.user_data.model_handler(image)

    instances = predictions['instances']
    pred_boxes = instances.pred_boxes
    scores = instances.scores
    pred_classes = instances.pred_classes
    results = []
    for box, score, label in zip(pred_boxes, scores, pred_classes):
        label = COCO_CATEGORIES[int(label)]["name"]
        if score >= threshold:
            results.append({
                "confidence": str(float(score)),
                "label": label,
                "points": box.tolist(),
                "type": "rectangle",
            })

    return context.Response(body=json.dumps(results), headers={},
        content_type='application/json', status_code=200)

Full code can be found here: detectron2/retinanet/nuclio/main.py
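
The filtering and formatting loop in handler can be tried out without Nuclio or detectron2. The sketch below is an illustration under assumptions, not the code shipped with CVAT: plain Python lists stand in for the pred_boxes, scores, and pred_classes tensors, and format_detections and LABELS are hypothetical names.

```python
import json

# Hypothetical label table indexed by the model's contiguous class id.
LABELS = ["person", "bicycle", "car"]


def format_detections(boxes, scores, class_ids, threshold=0.5):
    """Keep detections at or above threshold in the handler's output shape."""
    results = []
    for box, score, class_id in zip(boxes, scores, class_ids):
        if score >= threshold:
            results.append({
                "confidence": str(float(score)),
                "label": LABELS[int(class_id)],
                "points": list(box),  # [x1, y1, x2, y2]
                "type": "rectangle",
            })
    return results


detections = format_detections(
    boxes=[[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]],
    scores=[0.9, 0.3],
    class_ids=[0, 2],
)
print(json.dumps(detections))
```

Only the first box is kept here, because the second score is below the default threshold of 0.5.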

Deploy RetinaNet serverless function

To use the new serverless function, you have to deploy it using the nuctl command. The actual deployment process is described in the automatic annotation guide.

./serverless/deploy_cpu.sh ./serverless/pytorch/facebookresearch/detectron2/retinanet/

21.07.21 15:20:31.011                     nuctl (I) Deploying function {"name": ""}
21.07.21 15:20:31.011                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.07.21 15:20:31.407                     nuctl (I) Cleaning up before deployment {"functionName": "pth-facebookresearch-detectron2-retinanet-r101"}
21.07.21 15:20:31.497                     nuctl (I) Function already exists, deleting function containers {"functionName": "pth-facebookresearch-detectron2-retinanet-r101"}
21.07.21 15:20:31.914                     nuctl (I) Staging files and preparing base images
21.07.21 15:20:31.915                     nuctl (I) Building processor image {"imageName": "cvat/pth.facebookresearch.detectron2.retinanet_r101:latest"}
21.07.21 15:20:31.916     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.07.21 15:20:34.495     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.07.21 15:20:37.524            nuctl.platform (I) Building docker image {"image": "cvat/pth.facebookresearch.detectron2.retinanet_r101:latest"}
21.07.21 15:20:37.852            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pth.facebookresearch.detectron2.retinanet_r101:latest", "registry": ""}
21.07.21 15:20:37.853            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pth.facebookresearch.detectron2.retinanet_r101:latest"}
21.07.21 15:20:37.853                     nuctl (I) Build complete {"result": {"Image":"cvat/pth.facebookresearch.detectron2.retinanet_r101:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-facebookresearch-detectron2-retinanet-r101","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","name":"RetinaNet R101","spec":"[\n  { \"id\": 1, \"name\": \"person\" },\n  { \"id\": 2, \"name\": \"bicycle\" },\n  { \"id\": 3, \"name\": \"car\" },\n  { \"id\": 4, \"name\": \"motorcycle\" },\n  { \"id\": 5, \"name\": \"airplane\" },\n  { \"id\": 6, \"name\": \"bus\" },\n  { \"id\": 7, \"name\": \"train\" },\n  { \"id\": 8, \"name\": \"truck\" },\n  { \"id\": 9, \"name\": \"boat\" },\n  { \"id\":10, \"name\": \"traffic_light\" },\n  { \"id\":11, \"name\": \"fire_hydrant\" },\n  { \"id\":13, \"name\": \"stop_sign\" },\n  { \"id\":14, \"name\": \"parking_meter\" },\n  { \"id\":15, \"name\": \"bench\" },\n  { \"id\":16, \"name\": \"bird\" },\n  { \"id\":17, \"name\": \"cat\" },\n  { \"id\":18, \"name\": \"dog\" },\n  { \"id\":19, \"name\": \"horse\" },\n  { \"id\":20, \"name\": \"sheep\" },\n  { \"id\":21, \"name\": \"cow\" },\n  { \"id\":22, \"name\": \"elephant\" },\n  { \"id\":23, \"name\": \"bear\" },\n  { \"id\":24, \"name\": \"zebra\" },\n  { \"id\":25, \"name\": \"giraffe\" },\n  { \"id\":27, \"name\": \"backpack\" },\n  { \"id\":28, \"name\": \"umbrella\" },\n  { \"id\":31, \"name\": \"handbag\" },\n  { \"id\":32, \"name\": \"tie\" },\n  { \"id\":33, \"name\": \"suitcase\" },\n  { \"id\":34, \"name\": \"frisbee\" },\n  { \"id\":35, \"name\": \"skis\" },\n  { \"id\":36, \"name\": \"snowboard\" },\n  { \"id\":37, \"name\": \"sports_ball\" },\n  { \"id\":38, \"name\": \"kite\" },\n  { \"id\":39, \"name\": \"baseball_bat\" },\n  { \"id\":40, \"name\": \"baseball_glove\" },\n  { \"id\":41, \"name\": \"skateboard\" },\n  { \"id\":42, \"name\": \"surfboard\" },\n  { \"id\":43, \"name\": \"tennis_racket\" },\n  { \"id\":44, 
\"name\": \"bottle\" },\n  { \"id\":46, \"name\": \"wine_glass\" },\n  { \"id\":47, \"name\": \"cup\" },\n  { \"id\":48, \"name\": \"fork\" },\n  { \"id\":49, \"name\": \"knife\" },\n  { \"id\":50, \"name\": \"spoon\" },\n  { \"id\":51, \"name\": \"bowl\" },\n  { \"id\":52, \"name\": \"banana\" },\n  { \"id\":53, \"name\": \"apple\" },\n  { \"id\":54, \"name\": \"sandwich\" },\n  { \"id\":55, \"name\": \"orange\" },\n  { \"id\":56, \"name\": \"broccoli\" },\n  { \"id\":57, \"name\": \"carrot\" },\n  { \"id\":58, \"name\": \"hot_dog\" },\n  { \"id\":59, \"name\": \"pizza\" },\n  { \"id\":60, \"name\": \"donut\" },\n  { \"id\":61, \"name\": \"cake\" },\n  { \"id\":62, \"name\": \"chair\" },\n  { \"id\":63, \"name\": \"couch\" },\n  { \"id\":64, \"name\": \"potted_plant\" },\n  { \"id\":65, \"name\": \"bed\" },\n  { \"id\":67, \"name\": \"dining_table\" },\n  { \"id\":70, \"name\": \"toilet\" },\n  { \"id\":72, \"name\": \"tv\" },\n  { \"id\":73, \"name\": \"laptop\" },\n  { \"id\":74, \"name\": \"mouse\" },\n  { \"id\":75, \"name\": \"remote\" },\n  { \"id\":76, \"name\": \"keyboard\" },\n  { \"id\":77, \"name\": \"cell_phone\" },\n  { \"id\":78, \"name\": \"microwave\" },\n  { \"id\":79, \"name\": \"oven\" },\n  { \"id\":80, \"name\": \"toaster\" },\n  { \"id\":81, \"name\": \"sink\" },\n  { \"id\":83, \"name\": \"refrigerator\" },\n  { \"id\":84, \"name\": \"book\" },\n  { \"id\":85, \"name\": \"clock\" },\n  { \"id\":86, \"name\": \"vase\" },\n  { \"id\":87, \"name\": \"scissors\" },\n  { \"id\":88, \"name\": \"teddy_bear\" },\n  { \"id\":89, \"name\": \"hair_drier\" },\n  { \"id\":90, \"name\": \"toothbrush\" }\n]\n","type":"detector"}},"spec":{"description":"RetinaNet R101 from 
Detectron2","handler":"main:handler","runtime":"python:3.8","resources":{},"image":"cvat/pth.facebookresearch.detectron2.retinanet_r101:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/home/nmanovic/Workspace/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"image":"cvat/pth.facebookresearch.detectron2.retinanet_r101","baseImage":"ubuntu:20.04","directives":{"preCopy":[{"kind":"ENV","value":"DEBIAN_FRONTEND=noninteractive"},{"kind":"RUN","value":"apt-get update \u0026\u0026 apt-get -y install curl git python3 python3-pip"},{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html"},{"kind":"RUN","value":"pip3 install 'git+https://github.com/facebookresearch/detectron2@v0.4'"},{"kind":"RUN","value":"curl -O https://dl.fbaipublicfiles.com/detectron2/COCO-Detection/retinanet_R_101_FPN_3x/190397697/model_final_971ab9.pkl"},{"kind":"RUN","value":"ln -s /usr/bin/pip3 /usr/local/bin/pip"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.07.21 15:20:39.042            nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.07.21 15:20:40.480                     nuctl (I) Function deploy complete {"functionName": "pth-facebookresearch-detectron2-retinanet-r101", "httpPort": 49153}

Advanced capabilities

Optimize using GPU

To optimize a function for a specific device (e.g. a GPU), you only need to modify the instructions above so that the function runs on the target device. In most cases, only the installation instructions need to change.

For the RetinaNet R101 function added above, the modifications look like this:

--- function.yaml	2021-06-25 21:06:51.603281723 +0300
+++ function-gpu.yaml	2021-07-07 22:38:53.454202637 +0300
@@ -90,7 +90,7 @@
       ]

 spec:
-  description: RetinaNet R101 from Detectron2
+  description: RetinaNet R101 from Detectron2 optimized for GPU
   runtime: 'python:3.8'
   handler: main:handler
   eventTimeout: 30s
@@ -108,7 +108,7 @@
         - kind: WORKDIR
           value: /opt/nuclio
         - kind: RUN
-          value: pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
+          value: pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
         - kind: RUN
           value: git clone https://github.com/facebookresearch/detectron2
         - kind: RUN
@@ -120,12 +120,16 @@

   triggers:
     myHttpTrigger:
-      maxWorkers: 2
+      maxWorkers: 1
       kind: 'http'
       workerAvailabilityTimeoutMilliseconds: 10000
       attributes:
         maxRequestBodySize: 33554432 # 32MB

+  resources:
+    limits:
+      nvidia.com/gpu: 1
+
   platform:
     attributes:
       restartPolicy:

Note: A GPU has a very limited amount of memory, so for now it is not possible to run multiple serverless functions in parallel on one GPU with the free open-source version of Nuclio on the local platform, because the scale-to-zero feature is absent. Theoretically, it is possible to run different functions on different GPUs, but this requires changing the source code of the corresponding serverless functions so that each one chooses a free GPU.
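
Choosing a free GPU from inside the function code could look roughly like the sketch below. It assumes that nvidia-smi is available inside the container. The helper names parse_free_memory, pick_free_gpu and select_gpu are hypothetical and are not part of CVAT or Nuclio. The idea is to pick the GPU with the most free memory and expose only that device through CUDA_VISIBLE_DEVICES.

```python
import os
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, free_MiB' lines into a {gpu_index: free_MiB} dict."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, memory = (field.strip() for field in line.split(","))
        free[int(index)] = int(memory)
    return free


def pick_free_gpu(csv_text):
    """Return the index of the GPU with the most free memory, or None."""
    free = parse_free_memory(csv_text)
    if not free:
        return None
    return max(free, key=free.get)


def select_gpu():
    """Query nvidia-smi and restrict this process to the chosen GPU."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    gpu = pick_free_gpu(output)
    if gpu is not None:
        # Must be set before torch (or another framework) initializes CUDA.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

In such a setup, select_gpu() would be called at the very top of main.py, before the model is loaded. Two functions starting at the same moment could still pick the same GPU, so this is only a starting point.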

Debugging a serverless function

Let’s say you have a problem with your serverless function and want to debug it. You can, of course, use context.logger.info or similar methods to print the intermediate state of your function. Another way is to debug using Visual Studio Code. Follow the instructions below to set up your environment step by step.

Let’s modify our function.yaml to include the debugpy package and set the maxWorkers count to 1. Otherwise, both workers will try to use the same port, which leads to an exception in the Python code.

        - kind: RUN
          value: pip3 install debugpy

  triggers:
    myHttpTrigger:
      maxWorkers: 1

Change main.py to listen on a port (e.g. 5678). Insert the code below at the beginning of the file that contains the entry point.

import debugpy
debugpy.listen(5678)
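
A slightly more defensive variant of the two lines above is sketched here, so that the same main.py can run with or without a debugger. It assumes a hypothetical environment variable DEBUG_PORT (not defined by CVAT or Nuclio) that you would set yourself, for example in the env list of function.yaml. The helper name maybe_start_debugger is also made up for this illustration.

```python
import os


def maybe_start_debugger(environ=os.environ):
    """Start a debugpy listener only when DEBUG_PORT is set.

    Returns the port that debugpy listens on, or None if debugging is off.
    """
    port = environ.get("DEBUG_PORT")  # hypothetical variable name
    if not port:
        return None
    import debugpy  # imported lazily so images without debugpy still work

    # Bind to localhost only; the SSH tunnel forwards this port to the host.
    debugpy.listen(("127.0.0.1", int(port)))
    return int(port)
```

Calling maybe_start_debugger() once at the top of main.py then behaves like debugpy.listen(5678) when DEBUG_PORT=5678 is set, and does nothing otherwise.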

After these changes, deploy the serverless function once again. For serverless/pytorch/facebookresearch/detectron2/retinanet/nuclio/, run the command below:

serverless/deploy_cpu.sh serverless/pytorch/facebookresearch/detectron2/retinanet

To debug Python code inside a container, you have to publish the port (in this tutorial it is 5678). The Nuclio deploy command doesn’t support that, so we have to work around it using SSH port forwarding.

Install an SSH server on your host machine using sudo apt install openssh-server
In the /etc/ssh/sshd_config file on the host, set GatewayPorts yes
Restart the ssh service to apply the changes using sudo systemctl restart ssh.service

The next step is to install an SSH client inside the container and run port forwarding. In the snippet below, replace user and ipaddress with the username and IP address of your host (the IP address usually starts with 192.168.). You will need to confirm that you want to connect to your host computer and enter your password. Keep the terminal open after that.

docker exec -it nuclio-nuclio-pth-facebookresearch-detectron2-retinanet-r101 /bin/bash
apt update && apt install -y ssh
ssh -R 5678:localhost:5678 user@ipaddress

Here is what the last command looks like in my case:

root@2d6cceec8f70:/opt/nuclio# ssh -R 5678:localhost:5678 nmanovic@192.168.50.188
The authenticity of host '192.168.50.188 (192.168.50.188)' can't be established.
ECDSA key fingerprint is SHA256:0sD6IWi+FKAhtUXr2TroHqyjcnYRIGLLx/wkGaZeRuo.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.50.188' (ECDSA) to the list of known hosts.
nmanovic@192.168.50.188's password:
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.8.0-53-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

223 updates can be applied immediately.
132 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable

Your Hardware Enablement Stack (HWE) is supported until April 2025.
Last login: Fri Jun 25 16:39:04 2021 from 172.17.0.5
[setupvars.sh] OpenVINO environment initialized
nmanovic@nmanovic-dl-node:~$
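
Before attaching the debugger, it can help to confirm that the forwarded port is reachable on the host. The sketch below is a generic TCP check using only the Python standard library. The helper name port_is_open is hypothetical. Note that it only confirms that something is listening on the host side of the tunnel, not that debugpy itself is running inside the container.

```python
import socket


def port_is_open(host="localhost", port=5678, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Running port_is_open() on the host while the SSH session is open should return True. If it returns False, check the GatewayPorts setting and that the terminal with the tunnel is still open.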

Finally, add the configuration below to your launch.json. Open Visual Studio Code, run the Serverless Debug configuration, set a breakpoint in main.py, and try to call the serverless function from the CVAT UI. The breakpoint should be triggered in Visual Studio Code, and it should be possible to inspect variables and debug the code.

{
  "name": "Serverless Debug",
  "type": "python",
  "request": "attach",
  "connect": {
    "host": "localhost",
    "port": 5678
  },
  "pathMappings": [
    {
      "localRoot": "${workspaceFolder}/serverless/pytorch/facebookresearch/detectron2/retinanet/nuclio",
      "remoteRoot": "/opt/nuclio"
    }
  ]
}

VS Code debug RetinaNet

Note: If you change the source code, you need to re-deploy the function and initiate port forwarding again.

Troubleshooting

First of all, check that you are using the recommended version of the Nuclio framework. In my case it is 1.5.16, but you need to check the installation manual.

nuctl version

Client version:
"Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3"

Check that the Nuclio dashboard is running and that its version matches the nuctl version.

docker ps --filter NAME=^nuclio$

CONTAINER ID   IMAGE                                   COMMAND                  CREATED       STATUS                    PORTS                                               NAMES
7ab0c076c927   quay.io/nuclio/dashboard:1.5.16-amd64   "/docker-entrypoint.…"   6 weeks ago   Up 46 minutes (healthy)   80/tcp, 0.0.0.0:8070->8070/tcp, :::8070->8070/tcp   nuclio
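
The version comparison above can also be scripted. The sketch below only parses text in the shape shown in this tutorial (the Label field of nuctl version and the tag of the dashboard image). The function names are hypothetical, and other Nuclio releases may format their output differently.

```python
import re


def nuctl_label(version_output):
    """Extract the 'Label' value from `nuctl version` output."""
    match = re.search(r"Label:\s*([0-9][\w.\-]*)", version_output)
    return match.group(1) if match else None


def dashboard_version(image):
    """Extract the version from a tag such as 'dashboard:1.5.16-amd64'."""
    _, separator, tag = image.rpartition(":")
    if not separator:
        return None  # the image name carries no tag
    return tag.split("-", 1)[0]  # drop the architecture suffix


def versions_match(version_output, image):
    """Return True when nuctl and the dashboard report the same version."""
    label = nuctl_label(version_output)
    return label is not None and label == dashboard_version(image)
```

The inputs would be the text printed by nuctl version and the IMAGE column of the docker ps output.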

Make sure that the container of the model that doesn’t work is running and healthy. In my case, Inside Outside Guidance is not running.

docker ps --filter NAME=iog

CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

Let’s run it. Go to the root of the CVAT repository and run the deployment command.

serverless/deploy_cpu.sh serverless/pytorch/shiyinzhang/iog

Deploying serverless/pytorch/shiyinzhang/iog function...
21.07.06 12:49:08.763                     nuctl (I) Deploying function {"name": ""}
21.07.06 12:49:08.763                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.07.06 12:49:09.085                     nuctl (I) Cleaning up before deployment {"functionName": "pth-shiyinzhang-iog"}
21.07.06 12:49:09.162                     nuctl (I) Function already exists, deleting function containers {"functionName": "pth-shiyinzhang-iog"}
21.07.06 12:49:09.230                     nuctl (I) Staging files and preparing base images
21.07.06 12:49:09.232                     nuctl (I) Building processor image {"imageName": "cvat/pth.shiyinzhang.iog:latest"}
21.07.06 12:49:09.232     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.07.06 12:49:12.525     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.07.06 12:49:16.222            nuctl.platform (I) Building docker image {"image": "cvat/pth.shiyinzhang.iog:latest"}
21.07.06 12:49:16.555            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pth.shiyinzhang.iog:latest", "registry": ""}
21.07.06 12:49:16.555            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pth.shiyinzhang.iog:latest"}
21.07.06 12:49:16.555                     nuctl (I) Build complete {"result": {"Image":"cvat/pth.shiyinzhang.iog:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-shiyinzhang-iog","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","min_pos_points":"1","name":"IOG","spec":"","startswith_box":"true","type":"interactor"}},"spec":{"description":"Interactive Object Segmentation with Inside-Outside Guidance","handler":"main:handler","runtime":"python:3.6","env":[{"name":"PYTHONPATH","value":"/opt/nuclio/iog"}],"resources":{},"image":"cvat/pth.shiyinzhang.iog:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/home/nmanovic/Workspace/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"image":"cvat/pth.shiyinzhang.iog","baseImage":"continuumio/miniconda3","directives":{"preCopy":[{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"conda create -y -n iog python=3.6"},{"kind":"SHELL","value":"[\"conda\", \"run\", \"-n\", \"iog\", \"/bin/bash\", \"-c\"]"},{"kind":"RUN","value":"conda install -y -c anaconda curl"},{"kind":"RUN","value":"conda install -y pytorch=0.4 torchvision=0.2 -c pytorch"},{"kind":"RUN","value":"conda install -y -c conda-forge pycocotools opencv scipy"},{"kind":"RUN","value":"git clone https://github.com/shiyinzhang/Inside-Outside-Guidance.git iog"},{"kind":"WORKDIR","value":"/opt/nuclio/iog"},{"kind":"ENV","value":"fileid=1Lm1hhMhhjjnNwO4Pf7SC6tXLayH2iH0l"},{"kind":"ENV","value":"filename=IOG_PASCAL_SBD.pth"},{"kind":"RUN","value":"curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download\u0026id=${fileid}\""},{"kind":"RUN","value":"echo \"/download/ {print \\$NF}\" \u003e 
confirm_code.awk"},{"kind":"RUN","value":"curl -Lb ./cookie \"https://drive.google.com/uc?export=download\u0026confirm=`awk -f confirm_code.awk ./cookie`\u0026id=${fileid}\" -o ${filename}"},{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"ENTRYPOINT","value":"[\"conda\", \"run\", \"-n\", \"iog\"]"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.07.06 12:49:17.422     nuctl.platform.docker (W) Failed to run container {"err": "stdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n", "errVerbose": "\nError - exit status 125\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nstdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n", "errCauses": [{"error": "exit status 125"}], "stdout": "1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n", "stderr": ""}
21.07.06 12:49:17.422                     nuctl (W) Failed to create a function; setting the function status {"err": "Failed to run a Docker container", "errVerbose": "\nError - exit status 125\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to run a Docker container\n    /nuclio/pkg/platform/local/platform.go:653\nFailed to run a Docker container", "errCauses": [{"error": "stdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n", "errorVerbose": "\nError - exit status 125\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nstdout:\n1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb\ndocker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.\n\nstderr:\n", "errorCauses": [{"error": 
"exit status 125"}]}]}

Error - exit status 125
    /nuclio/pkg/cmdrunner/shellrunner.go:96

Call stack:
stdout:
1373cb432a178a3606685b5975e40a0755bc7958786c182304f5d1bbc0873ceb
docker: Error response from daemon: driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (df68e7b4a60e553ee3079f1f1622b050cc958bd50f2cd359a20164d8a417d0ea): Bind for 0.0.0.0:49154 failed: port is already allocated.

stderr:

    /nuclio/pkg/cmdrunner/shellrunner.go:96
Failed to run a Docker container
    /nuclio/pkg/platform/local/platform.go:653
Failed to deploy function
    ...//nuclio/pkg/platform/abstract/platform.go:182
  NAMESPACE |                      NAME                      | PROJECT | STATE | NODE PORT | REPLICAS
  nuclio    | openvino-dextr                                 | cvat    | ready |     49154 | 1/1
  nuclio    | pth-foolwood-siammask                          | cvat    | ready |     49155 | 1/1
  nuclio    | pth-facebookresearch-detectron2-retinanet-r101 | cvat    | ready |     49155 | 1/1
  nuclio    | pth-shiyinzhang-iog                            | cvat    | error |         0 | 1/1

In this case the container was built some time ago and port 49154 was assigned to it by Nuclio. Now the port is used by openvino-dextr, as we can see in the logs. To confirm this hypothesis, run a couple of docker commands:

docker container ls -a | grep iog

eb0c1ee46630   cvat/pth.shiyinzhang.iog:latest                              "conda run -n iog pr…"   9 minutes ago       Created                                                                          nuclio-nuclio-pth-shiyinzhang-iog

docker inspect eb0c1ee46630 | grep 49154

            "Error": "driver failed programming external connectivity on endpoint nuclio-nuclio-pth-shiyinzhang-iog (02384290f91b2216162b1603322dadee426afe7f439d3d090f598af5d4863b2d): Bind for 0.0.0.0:49154 failed: port is already allocated",
                        "HostPort": "49154"

To solve the problem, remove the previous container for the function. In this case it is eb0c1ee46630. After that, the deployment command works as expected.

docker container rm eb0c1ee46630

eb0c1ee46630

serverless/deploy_cpu.sh serverless/pytorch/shiyinzhang/iog

Deploying serverless/pytorch/shiyinzhang/iog function...
21.07.06 13:09:52.934                     nuctl (I) Deploying function {"name": ""}
21.07.06 13:09:52.934                     nuctl (I) Building {"versionInfo": "Label: 1.5.16, Git commit: ae43a6a560c2bec42d7ccfdf6e8e11a1e3cc3774, OS: linux, Arch: amd64, Go version: go1.14.3", "name": ""}
21.07.06 13:09:53.282                     nuctl (I) Cleaning up before deployment {"functionName": "pth-shiyinzhang-iog"}
21.07.06 13:09:53.341                     nuctl (I) Staging files and preparing base images
21.07.06 13:09:53.342                     nuctl (I) Building processor image {"imageName": "cvat/pth.shiyinzhang.iog:latest"}
21.07.06 13:09:53.342     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.5.16-amd64"}
21.07.06 13:09:56.633     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
21.07.06 13:10:00.163            nuctl.platform (I) Building docker image {"image": "cvat/pth.shiyinzhang.iog:latest"}
21.07.06 13:10:00.452            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pth.shiyinzhang.iog:latest", "registry": ""}
21.07.06 13:10:00.452            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pth.shiyinzhang.iog:latest"}
21.07.06 13:10:00.452                     nuctl (I) Build complete {"result": {"Image":"cvat/pth.shiyinzhang.iog:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-shiyinzhang-iog","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","min_pos_points":"1","name":"IOG","spec":"","startswith_box":"true","type":"interactor"}},"spec":{"description":"Interactive Object Segmentation with Inside-Outside Guidance","handler":"main:handler","runtime":"python:3.6","env":[{"name":"PYTHONPATH","value":"/opt/nuclio/iog"}],"resources":{},"image":"cvat/pth.shiyinzhang.iog:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/home/nmanovic/Workspace/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"image":"cvat/pth.shiyinzhang.iog","baseImage":"continuumio/miniconda3","directives":{"preCopy":[{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"conda create -y -n iog python=3.6"},{"kind":"SHELL","value":"[\"conda\", \"run\", \"-n\", \"iog\", \"/bin/bash\", \"-c\"]"},{"kind":"RUN","value":"conda install -y -c anaconda curl"},{"kind":"RUN","value":"conda install -y pytorch=0.4 torchvision=0.2 -c pytorch"},{"kind":"RUN","value":"conda install -y -c conda-forge pycocotools opencv scipy"},{"kind":"RUN","value":"git clone https://github.com/shiyinzhang/Inside-Outside-Guidance.git iog"},{"kind":"WORKDIR","value":"/opt/nuclio/iog"},{"kind":"ENV","value":"fileid=1Lm1hhMhhjjnNwO4Pf7SC6tXLayH2iH0l"},{"kind":"ENV","value":"filename=IOG_PASCAL_SBD.pth"},{"kind":"RUN","value":"curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download\u0026id=${fileid}\""},{"kind":"RUN","value":"echo \"/download/ {print \\$NF}\" \u003e 
confirm_code.awk"},{"kind":"RUN","value":"curl -Lb ./cookie \"https://drive.google.com/uc?export=download\u0026confirm=`awk -f confirm_code.awk ./cookie`\u0026id=${fileid}\" -o ${filename}"},{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"ENTRYPOINT","value":"[\"conda\", \"run\", \"-n\", \"iog\"]"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
21.07.06 13:10:01.604            nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
21.07.06 13:10:02.976                     nuctl (I) Function deploy complete {"functionName": "pth-shiyinzhang-iog", "httpPort": 49159}
  NAMESPACE |                      NAME                      | PROJECT | STATE | NODE PORT | REPLICAS
  nuclio    | openvino-dextr                                 | cvat    | ready |     49154 | 1/1
  nuclio    | pth-foolwood-siammask                          | cvat    | ready |     49155 | 1/1
  nuclio    | pth-saic-vul-fbrs                              | cvat    | ready |     49156 | 1/1
  nuclio    | pth-facebookresearch-detectron2-retinanet-r101 | cvat    | ready |     49155 | 1/1
  nuclio    | pth-shiyinzhang-iog                            | cvat    | ready |     49159 | 1/1
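
The status table printed at the end of a deployment can be scanned for problem functions. The sketch below assumes the table text (header row plus data rows, separated by "|") has been copied into a string. The helper names parse_function_table and unhealthy are hypothetical.

```python
def parse_function_table(text):
    """Parse the '|'-separated status table into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and anything that is not a table row
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            # First table line holds the column names, e.g. 'NODE PORT'.
            header = [name.lower().replace(" ", "_") for name in cells]
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def unhealthy(rows):
    """Return names of functions that are not ready or have node port 0."""
    return [
        row["name"]
        for row in rows
        if row["state"] != "ready" or row["node_port"] == "0"
    ]
```

For the first table in this section, unhealthy() would report only pth-shiyinzhang-iog, which is in the error state with node port 0.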

When you investigate an issue with a serverless function, it is extremely useful to look at the logs. Run docker logs <container> for the relevant containers, for example:

docker logs cvat

2021-07-06 13:44:54,699 DEBG 'runserver' stderr output:
[Tue Jul 06 13:44:54.699431 2021] [wsgi:error] [pid 625:tid 140010969868032] [remote 172.28.0.3:40972] [2021-07-06 13:44:54,699] ERROR django.request: Internal Server Error: /api/lambda/functions/pth-shiyinzhang-iog

2021-07-06 13:44:54,700 DEBG 'runserver' stderr output:
[Tue Jul 06 13:44:54.699712 2021] [wsgi:error] [pid 625:tid 140010969868032] [remote 172.28.0.3:40972] ERROR - 2021-07-06 13:44:54,699 - log - Internal Server Error: /api/lambda/functions/pth-shiyinzhang-iog

docker container ls --filter name=iog

CONTAINER ID   IMAGE                             COMMAND                  CREATED       STATUS                 PORTS                                         NAMES
3b6ef9a9f3e2   cvat/pth.shiyinzhang.iog:latest   "conda run -n iog pr…"   4 hours ago   Up 4 hours (healthy)   0.0.0.0:49159->8080/tcp, :::49159->8080/tcp   nuclio-nuclio-pth-shiyinzhang-iog

docker logs nuclio-nuclio-pth-shiyinzhang-iog

If, before deploying a model, you see that its NODE PORT is 0, you need to assign the port manually. Add the port: 32001 attribute to the function.yaml file of each model before you deploy it. Use a different port for each model.

triggers:
  myHttpTrigger:
    maxWorkers: 1
    kind: 'http'
    workerAvailabilityTimeoutMilliseconds: 10000
    attributes:
+     port: 32001
      maxRequestBodySize: 33554432 # 32MB
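
Because each model needs its own port, it can be convenient to plan the assignments up front. The sketch below is only an illustration. The helper names assign_ports and find_port_conflicts are hypothetical, and the starting value 32001 is simply taken from the example above.

```python
def assign_ports(function_names, first_port=32001):
    """Map each function name to its own port, counting up from first_port."""
    unique_names = list(dict.fromkeys(function_names))  # keep order, drop repeats
    return {name: first_port + offset for offset, name in enumerate(unique_names)}


def find_port_conflicts(ports):
    """Return {port: [names]} for every port used by more than one function."""
    by_port = {}
    for name, port in ports.items():
        by_port.setdefault(port, []).append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}
```

The resulting values would then be written by hand into the port attribute of each model's function.yaml.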

Installing serverless functions on Windows 10 using the Ubuntu subsystem

If you encounter a problem running serverless functions on Windows 10, you can use the Ubuntu subsystem. To do so, follow these steps:

Install WSL 2 and Docker Desktop as described in the installation manual.
Install Ubuntu 18.04 from the Microsoft Store.
Enable integration for Ubuntu-18.04 in the Docker Desktop settings, on the Resources > WSL integration tab.
Download and install nuctl on Ubuntu, following the automatic annotation guide.
Install git and clone the repository on Ubuntu, as described in the installation manual.
After that, run the commands from this tutorial through Ubuntu.
