
Getting Started

1: CVAT Overview
2: Vocabulary
3: Shortcuts

1 - CVAT Overview

CVAT is an enterprise-grade platform for managing high-quality visual datasets for computer vision applications. It offers advanced tools for image, video, and 3D annotation, built-in quality assurance (QA), automation, and secure team collaboration.

Backed by an active open-source community and trusted by thousands of organizations worldwide, CVAT helps teams streamline data labeling for faster, more accurate model development.

Products and services

CVAT comes in three editions: CVAT Community, CVAT Online, and CVAT Enterprise.

CVAT Community

Free edition you can deploy on-premises or in your own cloud
Full annotation toolset, import/export formats, and core workflow
Ideal for technical teams comfortable managing infrastructure
Installation & Setup Guide →
GitHub repository

CVAT Online

Hosted cloud edition with automatic updates, maintenance, and managed infrastructure.
Available under multiple subscription tiers (Free, Solo, Team) for individual and collaborative work.
Designed for fast onboarding, with built-in collaboration and flexible storage.
Pricing & Plans →
Try for free: app.cvat.ai

CVAT Enterprise

For large organizations and regulated environments.
Includes advanced features such as SSO/LDAP, audit logs, dedicated support, and custom SLAs.
Managed deployment options, on-premises or private cloud.
Pricing & Plans →

Labeling as a Service

If you prefer not to build your own annotation team, we offer expert annotation services using CVAT.
Scalable annotation across projects, with built-in QA and reporting dashboards.
Ideal for one-time annotation projects and recurring workflows alike.

Learn more about CVAT Labeling Services →

Supported data & formats

CVAT supports a wide range of file formats and includes comprehensive built-in annotation tools for various computer vision tasks.

Input:

Image: All formats supported by the Python Pillow library, including JPEG, PNG, BMP, GIF, PPM, and TIFF
Video: All formats supported by FFmpeg, including MP4, AVI, and MOV
3D: .pcd, .bin
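A quick pre-upload check can sort local files into the categories listed above. The sketch below is a hypothetical helper, not part of CVAT; its extension sets contain only the formats named in this list, whereas the complete sets are determined by Pillow and FFmpeg.

```python
from pathlib import Path

# Only the formats named in the list above; the complete sets depend on
# the installed Pillow and FFmpeg builds, not on this hypothetical table.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".ppm", ".tif", ".tiff"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}
POINT_CLOUD_EXTENSIONS = {".pcd", ".bin"}


def media_kind(filename: str) -> str:
    """Classify a file as 'image', 'video', '3d', or 'unknown' by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in POINT_CLOUD_EXTENSIONS:
        return "3d"
    return "unknown"
```

An extension check is only a heuristic: a `.bin` file is not necessarily a point cloud, so the server's own validation remains authoritative.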

For more information about dataset formats, see Dataset Management.

Manual annotation

CVAT supports several tools and modes for manually labeling images, videos, and 3D data.

These tools define how the editor behaves, how shapes are created, and what geometric types you can use during annotation.

Annotation modes

Annotation modes control how the annotation workspace behaves and which actions are available:

Standard mode – full access to all annotation tools and object editing.
Attribute annotation mode – focus on editing object attributes, such as color or size, without changing shapes.
Single shape mode – create one shape and automatically exit drawing.
Tag annotation mode – add frame-level tags without drawing shapes.
Review mode – review and validate existing annotations.

Creating shapes

When drawing objects on frames, you can choose how shapes behave over time:

Shape – creates a single shape on the current frame.

Track – creates a sequence of shapes linked as the same object across multiple frames.

CVAT also supports different drawing methods, such as defining a rectangle by two opposite corners or, for extra control, by four extreme points (the topmost, bottommost, leftmost, and rightmost points of the object).
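Whichever of these methods is used, the resulting rectangle is the smallest axis-aligned box that encloses the clicked points. A minimal sketch, assuming the points arrive as `(x, y)` pixel tuples; `box_from_points` is a hypothetical helper, not a CVAT API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_from_points(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_top_left, y_top_left, x_bottom_right, y_bottom_right) enclosing the points."""
    if len(points) not in (2, 4):
        raise ValueError("expected 2 or 4 points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # The enclosing box spans the extreme coordinates on each axis.
    return min(xs), min(ys), max(xs), max(ys)
```

With two points the order of the clicks does not matter, and with four extreme points each point touches one side of the resulting box.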

Shape tools

Shapes represent the geometry used to annotate objects. CVAT supports multiple shape types for different tasks:

Shape	Use case
Rectangles	Best for simple object detection where objects have a box-like shape, such as detecting windows in a building.
Polygons	Suited for complex shapes in images, like outlining geographical features in maps or detailed product shapes.
Polylines	Great for annotating linear objects like roads, pathways, or limbs in pose estimation.
Ellipses	Suited for circular or oval objects like plates, balls, or eyes.
Cuboids	Capture an object's volume and position in 3D space, useful for autonomous driving or robotics.
Skeletons	Represent articulated structures as connected points, ideal for human pose estimation, animation, and movement analysis.
Brush Tool	A tool for creating detailed, free-form segmentation masks where pixel-level precision is required, such as in medical imaging.
Tags	Useful for image and video classification tasks, like identifying scenes or themes in a dataset.

Automated annotation

CVAT provides a set of AI-powered tools that speed up annotation by automatically detecting, segmenting, or tracking objects on images and videos. These tools work with built-in models (such as SAM/SAM2), pre-trained models from native integrations like Hugging Face and Roboflow, as well as custom or third-party models you deploy through CVAT AI Agents (including YOLO and other frameworks).

The following table lists the supported models, their frameworks, and whether they run on CPU and GPU:

Algorithm Name	Category	Framework	CPU Support	GPU Support
Segment Anything	Interactor	PyTorch	✔️	✔️
Faster RCNN	Detector	OpenVINO	✔️
Mask RCNN	Detector	OpenVINO	✔️
YOLO v3	Detector	OpenVINO	✔️
YOLO v7	Detector	ONNX	✔️	✔️
Object Reidentification	ReID	OpenVINO	✔️
Semantic Segmentation for ADAS	Detector	OpenVINO	✔️
Text Detection v4	Detector	OpenVINO	✔️
SiamMask	Tracker	PyTorch	✔️	✔️
TransT	Tracker	PyTorch	✔️	✔️
Inside-Outside Guidance	Interactor	PyTorch	✔️
Faster RCNN	Detector	TensorFlow	✔️	✔️
RetinaNet	Detector	PyTorch	✔️	✔️
Face Detection	Detector	OpenVINO	✔️

Useful links

Name	Description
Self-hosted Installation Guide	Start here to install the self-hosted solution on your premises.
Dataset Management Framework	Specifically for the Self-Hosted version, this framework and CLI tool are essential for building, transforming, and analyzing datasets.
Server API	The CVAT server offers an HTTP REST API. This section explains how client applications, whether command-line tools, browsers, or scripts, interact with CVAT through HTTP requests and responses.
Python SDK	The CVAT SDK is a Python library providing access to server interactions and additional functionality such as data validation and serialization.
Command Line Tool	This tool offers a straightforward command-line interface for managing CVAT tasks. It currently covers basic functionality and may grow into a more advanced administration tool for CVAT.
XML Annotation Format	Detailed documentation on the XML format used for annotations in CVAT, essential for understanding data structure and compatibility.
AWS Deployment Guide	A step-by-step guide for deploying CVAT on Amazon Web Services, covering all necessary procedures and tips.
Frequently Asked Questions	This section addresses common queries and provides helpful answers and insights about using CVAT.

Integrations

CVAT is used by teams worldwide. The following companies contribute significantly to product support or are an integral part of the CVAT ecosystem.

Service	Available In	Description
Human Protocol	CVAT Online, CVAT Community, CVAT Enterprise	Incorporates CVAT to augment annotation services within the Human Protocol framework, enhancing its capabilities in data labeling.
FiftyOne	CVAT Online, CVAT Community, CVAT Enterprise	An open-source tool for dataset management and model analysis in computer vision, FiftyOne is closely integrated with CVAT to enhance annotation capabilities and label refinement.
Hugging Face, Roboflow	CVAT Online	In CVAT Online, models from Hugging Face and Roboflow can be added to enhance computer vision tasks. For more information, see Integration with Hugging Face and Roboflow.

License information

CVAT includes the following licenses:

License Type	Applicable To	Description
MIT License	CVAT Community, CVAT Enterprise	The CVAT source code is distributed under the MIT License, a permissive free software license that allows for broad use, modification, and distribution.
LGPL License (FFmpeg)	CVAT Online, CVAT Community, CVAT Enterprise	Incorporates LGPL-licensed components from the FFmpeg project. Users should verify whether their use of FFmpeg requires additional licenses. CVAT.ai Corporation does not provide these licenses and is not liable for any related licensing fees.
Commercial License	CVAT Enterprise	For commercial use of the Enterprise solution of CVAT, a separate commercial license is applicable. This is tailored for businesses and commercial entities.
Terms of Use	CVAT Online, CVAT Community, CVAT Enterprise	Outlines the terms of use and confidential information handling for CVAT. Important for understanding the legal framework of using the platform.
Privacy Policy	CVAT Online, CVAT Community, CVAT Enterprise	Our Privacy Policy governs your visit to https://cvat.ai and your use of https://app.cvat.ai, and explains how we collect, safeguard and disclose information that results from your use of our Service.

Get in touch

To get in touch, use one of the following channels:

Type of inquiry	Applicable to	Description
Commercial Inquiries	CVAT Online, CVAT Enterprise, Labeling Services	Request a quote for CVAT Enterprise, CVAT Online Team subscription or order our labeling services.
General Inquiries	All products and services	Reach out to discuss partnership, co-marketing, or investment opportunities with the CVAT team.
CVAT Online Customer Support	CVAT Online (Pro and Team plans)	Chat with us about product support, resolve billing questions, or provide feedback.
CVAT Community Customer Support	CVAT Community	Report a bug or submit a feature request in our GitHub repository.

2 - Vocabulary

List of terms pertaining to annotation in CVAT.

Label

A label is the type of an annotated object (e.g., person, car, vehicle).

Example of a label in interface

Attribute

An attribute is a property of an annotated object (e.g., color, model, quality). There are two types of attributes:

Unique

Unique attributes are immutable and cannot change from frame to frame (e.g., age, gender, color).

Example of a unique attribute

Temporary

Temporary attributes are mutable and can change on any frame (e.g., quality, pose, truncated).

Example of a temporary attribute
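The difference between the two attribute types can be modelled as shown below. This is a hypothetical data structure for illustration, not CVAT's internal representation; it assumes that a temporary value set on one frame carries over to later frames until it is changed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AnnotatedObject:
    """Hypothetical object model separating unique and temporary attributes."""
    label: str
    # Unique (immutable) attributes: one value for the whole object.
    unique: Dict[str, str] = field(default_factory=dict)
    # Temporary (mutable) attributes: values keyed by the frame where they were set.
    temporary: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def set_temporary(self, frame: int, name: str, value: str) -> None:
        self.temporary.setdefault(frame, {})[name] = value

    def attributes_at(self, frame: int) -> Dict[str, str]:
        result = dict(self.unique)
        # Apply temporary values in frame order; later frames override earlier ones.
        for f in sorted(k for k in self.temporary if k <= frame):
            result.update(self.temporary[f])
        return result
```

For example, a car keeps `color="red"` on every frame, while its `pose` can switch from `front` to `side` partway through the video.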

Track

A track is a set of shapes on different frames that corresponds to one object. Tracks are created in Track mode.

Example of a track in interface

Annotation

An annotation is a set of shapes and tracks. There are several types of annotations:

Manual: created by a person.
Semi-automatic: created mostly automatically, with the user providing some data (e.g., for interpolation).
Automatic: created automatically without a person in the loop.

Approximation

Approximation reduces the number of points in a polygon. Use it to reduce the size of the annotation file and to make polygons easier to edit.

Example of an applied approximation
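One common way to approximate an outline is the Ramer-Douglas-Peucker algorithm, which drops points that lie within a tolerance of the simplified shape. The sketch below applies that technique to an open chain of points; it is not necessarily the algorithm CVAT uses, and `simplify` and `epsilon` are illustrative names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify a chain of points with the Ramer-Douglas-Peucker algorithm."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]  # every interior point is within tolerance
    # Otherwise split at the farthest point and simplify both halves.
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point
```

A larger `epsilon` removes more points; a closed polygon can be handled by splitting its ring into two chains first.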

Trackable

A trackable object is tracked automatically if the previous frame was the latest keyframe for the object. For more details, see the trackers section.

Example of a trackable object in interface

Mode

Interpolation

Mode for video annotation, which uses track objects. Only objects on keyframes are annotated manually; objects on intermediate frames are linearly interpolated.

Related sections:

Track mode
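Linear interpolation between keyframes can be sketched as follows, assuming boxes are stored as corner coordinates in a dictionary keyed by frame number. The names `Box` and `interpolate` are hypothetical, and the sketch ignores how a track is ended: it simply holds the last keyframe's box after that frame.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its top-left and bottom-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float


def interpolate(keyframes: Dict[int, Box], frame: int) -> Optional[Box]:
    """Return the box on `frame`, linearly interpolated between keyframes."""
    frames = sorted(keyframes)
    if frame < frames[0]:
        return None  # the object does not exist before its first keyframe
    if frame >= frames[-1]:
        return keyframes[frames[-1]]  # hold the last keyframe's box
    prev = max(f for f in frames if f <= frame)
    nxt = min(f for f in frames if f > frame)
    t = (frame - prev) / (nxt - prev)  # 0 at prev, approaching 1 near nxt
    a, b = keyframes[prev], keyframes[nxt]

    def lerp(u: float, v: float) -> float:
        return u + (v - u) * t

    return Box(lerp(a.x1, b.x1), lerp(a.y1, b.y1), lerp(a.x2, b.x2), lerp(a.y2, b.y2))
```

With keyframes on frames 0 and 10, frame 5 gets a box exactly halfway between the two annotated boxes.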

Annotation

Mode for image annotation, which uses shape objects.

Related sections:

Shape mode

Dimension

The dimension depends on the task data type, which is defined when the task is created.

2D

2D tasks use images and videos as data. Related sections:

Creating an annotation task

3D

3D tasks use point clouds as data.

Related sections:

Data formats for a 3D task

State

State of the job. The state can be changed by an assigned user in the menu inside the job. There are several possible states: new, in progress, rejected, completed.

Stage

Stage of the job. The stage is specified with the drop-down list on the task page. There are three stages: annotation, validation, and acceptance. This value affects the task progress bar.

Subset

A project can have subsets. Subsets are groups of tasks that make it easier to work with the dataset. A subset can be test, train, validation, or a custom subset.

Credentials

Credentials are the authentication data used to attach cloud storage: a key and secret key, an account name and token, anonymous access, or a key file.

Resource

A resource is the bucket name or container name used to attach cloud storage.

3 - Shortcuts

List of available keyboard shortcuts and notes about their customization.

CVAT provides a wide range of customizable shortcuts, with many UI elements offering shortcut hints when hovered over with the mouse.

Example of a shortcut tip in user interface

These shortcuts are organized by scopes. Some are global, meaning they work across the entire application, while others are specific to certain sections or workspaces. This approach allows the same shortcut to be reused in scopes that cannot conflict. For example, global shortcuts must be unique because they apply across all pages and workspaces. However, the same shortcuts can be used in different workspaces, such as the Standard Workspace and the Standard 3D Workspace, because these two never coexist.

Scope	Uniqueness requirement
Global	Must be unique across all scopes, as global shortcuts apply everywhere.
Annotation Page	Must be unique across all scopes except Labels Editor.
Standard Workspace	Must be unique within itself, Annotation Page, and Global scope.
Standard 3D Workspace	Must be unique within itself, Annotation Page, and Global scope.
Attribute Annotation Workspace	Must be unique within itself, Annotation Page, and Global scope.
Review Workspace	Must be unique within itself, Annotation Page, and Global scope.
Tag Annotation Workspace	Must be unique within itself, Annotation Page, and Global scope.
Control Sidebar	Must be unique within itself, all workspaces, Annotation Page, and Global scope.
Objects Sidebar	Must be unique within itself, all workspaces, Annotation Page, and Global scope.
Labels Editor	Must be unique within itself and Global scope.
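The table can be turned into a simple conflict check. The sketch below encodes each row as the set of scopes in which that scope's shortcuts must not repeat, and treats the rule symmetrically. The scope names come from the table; the action names and helper functions are hypothetical and not part of CVAT.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

WORKSPACES = frozenset({
    "Standard Workspace", "Standard 3D Workspace", "Attribute Annotation Workspace",
    "Review Workspace", "Tag Annotation Workspace",
})
ALL_SCOPES = WORKSPACES | {"Global", "Annotation Page", "Control Sidebar",
                           "Objects Sidebar", "Labels Editor"}

# One entry per table row: the scopes in which that scope's shortcuts must be unique.
MUST_BE_UNIQUE_IN: Dict[str, FrozenSet[str]] = {
    "Global": ALL_SCOPES,
    "Annotation Page": ALL_SCOPES - {"Labels Editor"},
    **{ws: frozenset({ws, "Annotation Page", "Global"}) for ws in WORKSPACES},
    "Control Sidebar": WORKSPACES | {"Control Sidebar", "Annotation Page", "Global"},
    "Objects Sidebar": WORKSPACES | {"Objects Sidebar", "Annotation Page", "Global"},
    "Labels Editor": frozenset({"Labels Editor", "Global"}),
}


def scopes_conflict(a: str, b: str) -> bool:
    """Two scopes clash if either row requires uniqueness across the other."""
    return b in MUST_BE_UNIQUE_IN[a] or a in MUST_BE_UNIQUE_IN[b]


def find_conflicts(bindings: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Take (scope, action, shortcut) triples; return clashing (action, action, shortcut)."""
    clashes = []
    for (s1, a1, k1), (s2, a2, k2) in combinations(bindings, 2):
        if k1 == k2 and scopes_conflict(s1, s2):
            clashes.append((a1, a2, k1))
    return clashes
```

Reusing a key in the Standard Workspace and the Standard 3D Workspace passes the check, while binding the same key in Global and Labels Editor is reported.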

Shortcuts Customization

You can customize shortcuts in CVAT settings.

Open Settings.
Go to the Shortcuts tab to open the shortcuts customization menu.
The menu shows a warning that some shortcuts are reserved by the browser and cannot be overridden in CVAT. There is no definitive list of such combinations, but, for example, ctrl + tab (switching tabs) and ctrl + w (closing tabs) are reserved by the browser, and alt + f4 (closing the window) is usually reserved by the operating system.
All sections are collapsible, so you can easily navigate the list of shortcuts, for example by expanding only the Global scope.
To add a custom shortcut, click the input field and press the key sequence you want to assign to the action. For example, f3 can be assigned to Show Shortcuts in addition to f1.
A shortcut can combine any of the modifiers (ctrl, shift, alt) with at most one non-modifier key, for example ctrl+shift+f1.
If you try to add a shortcut that is already in use, a warning message appears. If you click Cancel, the shortcut remains unchanged; otherwise, the conflicting shortcut is unset.
If you want to reset all the shortcuts to default, you can do so by clicking the Restore Defaults button at the top of the shortcut settings.
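The format rule for shortcuts (any modifiers plus at most one other key) can be expressed as a small validator that also normalizes key order. A sketch, assuming shortcuts are written as `+`-separated key names such as `ctrl+shift+f1`; it does not handle the `+` key itself, and `normalize_shortcut` is a hypothetical helper, not part of CVAT.

```python
from typing import Optional

# Modifier keys named in the settings description, in a fixed canonical order.
MODIFIERS = ("ctrl", "shift", "alt")


def normalize_shortcut(combo: str) -> Optional[str]:
    """Return a canonical form of `combo`, or None if it breaks the format rule."""
    keys = [k.strip().lower() for k in combo.split("+")]
    if any(not k for k in keys):
        return None  # empty segment, e.g. "ctrl++" or a trailing "+"
    modifiers = [k for k in keys if k in MODIFIERS]
    others = [k for k in keys if k not in MODIFIERS]
    if len(set(modifiers)) != len(modifiers):
        return None  # the same modifier listed twice
    if len(others) > 1:
        return None  # more than one non-modifier key
    # Sort modifiers into canonical order so "shift+ctrl+f1" matches "ctrl+shift+f1".
    ordered = [m for m in MODIFIERS if m in modifiers]
    return "+".join(ordered + others)
```

Comparing normalized forms means that `Shift+Ctrl+F1` and `ctrl+shift+f1` are recognized as the same shortcut before checking for conflicts.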