Getting Started

1 - CVAT Overview

CVAT is an enterprise-grade platform for managing high-quality visual datasets for computer vision applications. It offers advanced tools for image, video, and 3D annotation, built-in quality assurance (QA), automation, and secure team collaboration.

Backed by an active open-source community and trusted by thousands of organizations worldwide, CVAT helps organizations streamline data labeling for faster, more accurate model development.


Products and services

CVAT comes in three editions: CVAT Community, CVAT Online, and CVAT Enterprise.

CVAT Community

  • Free edition you can deploy on-premises or in your own cloud
  • Full annotation toolset, import/export formats, and core workflow
  • Ideal for technical teams comfortable managing infrastructure
  • Installation & Setup Guide →
  • GitHub repository

CVAT Online

  • Hosted cloud edition with automatic updates, maintenance, and managed infrastructure.
  • Available under multiple subscription tiers (Free, Solo, Team) for individual and collaborative work.
  • Designed for fast onboarding, built-in collaboration, and flexible storage.
  • Pricing & Plans →
  • Try for free: app.cvat.ai

CVAT Enterprise

  • For large organizations and regulated environments.
  • Includes advanced features such as SSO/LDAP, audit logs, dedicated support, and custom SLAs.
  • Managed deployment options, on-premises or private cloud.
  • Pricing & Plans →

Labeling as a Service

  • If you prefer not to build your own annotation team, we offer expert annotation services using CVAT.
  • Scalable annotation across projects, with QA built-in and reporting dashboards.
  • Ideal for one-time annotation projects and recurring workflows alike.

Learn more about CVAT Labeling Services →

Supported data & formats

CVAT supports a wide range of file formats and includes comprehensive built-in annotation tools for various computer vision tasks.

Input:

  • Image: All formats supported by the Python Pillow library, including JPEG, PNG, BMP, GIF, PPM, and TIFF
  • Video: All formats supported by FFmpeg, including MP4, AVI, and MOV
  • 3D: .pcd, .bin

For more information about dataset formats, see Dataset Management.

Manual annotation

CVAT supports several tools and modes for manually labeling images, videos, and 3D data.

These tools define how the editor behaves, how shapes are created, and what geometric types you can use during annotation.

Annotation modes

Annotation modes control how the annotation workspace behaves and which actions are available:

  • Standard mode – full access to all annotation tools and object editing.
  • Attribute annotation mode – focus on editing object attributes (such as color or size) without changing shapes.
  • Single shape mode – create one shape and automatically exit drawing.
  • Tag annotation mode – add frame-level tags without drawing shapes.
  • Review mode – review and validate existing annotations.

Creating shapes

When drawing objects on frames, you can choose how shapes behave over time:

Shape – creates a single shape on the current frame.

Track – creates a sequence of shapes linked as the same object across multiple frames.

CVAT also supports different drawing methods, such as defining shapes by two opposite points or by placing four corner points for extra control.
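
The two drawing methods can be illustrated with a minimal sketch. It assumes an axis-aligned box stored as (xtl, ytl, xbr, ybr), and it treats the four-point method as "take the smallest box that contains the four clicked points". The function names and the box layout are hypothetical and are not taken from CVAT's code.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xtl, ytl, xbr, ybr), an assumed layout


def box_from_two_points(p1: Point, p2: Point) -> Box:
    """Build a box from two opposite corners, given in any order."""
    (x1, y1), (x2, y2) = p1, p2
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def box_from_four_points(points: Iterable[Point]) -> Box:
    """Build the smallest axis-aligned box that contains four clicked points."""
    pts = list(points)
    if len(pts) != 4:
        raise ValueError("expected exactly four points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))
```

With two points, the click order does not matter because the corners are normalized. With four points, each click can sit on a different edge of the object, which is where the extra control comes from.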

Shape tools

Shapes represent the geometry used to annotate objects. CVAT supports multiple shape types for different tasks:

Shape | Use case
Rectangles | Best for simple object detection where objects have a box-like shape, such as detecting windows in a building.
Polygons | Suited for complex shapes in images, like outlining geographical features in maps or detailed product shapes.
Polylines | Great for annotating linear objects like roads, pathways, or limbs in pose estimation.
Ellipses | Suited for circular or oval objects like plates, balls, or eyes.
Cuboids | Capture object volume and position in 3D, useful for autonomous driving or robotics.
Skeletons | Annotate articulated structures as connected keypoints, ideal for human pose estimation, animation, and movement analysis.
Brush Tool | A tool for creating detailed, free-form segmentation masks where pixel-level precision is required, such as in medical imaging.
Tags | Useful for image and video classification tasks, like identifying scenes or themes in a dataset.

Automated annotation

CVAT provides a set of AI-powered tools that speed up annotation by automatically detecting, segmenting, or tracking objects on images and videos. These tools work with built-in models (such as SAM/SAM2), pre-trained models from native integrations like Hugging Face and Roboflow, as well as custom or third-party models you deploy through CVAT AI Agents (including YOLO and other frameworks).

Below is a detailed table of the supported models and the platforms they operate on:

Algorithm Name | Category | Framework | CPU Support | GPU Support
Segment Anything | Interactor | PyTorch | ✔️ | ✔️
Deep Extreme Cut | Interactor | OpenVINO | ✔️ |
Faster RCNN | Detector | OpenVINO | ✔️ |
Mask RCNN | Detector | OpenVINO | ✔️ |
YOLO v3 | Detector | OpenVINO | ✔️ |
YOLO v7 | Detector | ONNX | ✔️ | ✔️
Object Reidentification | ReID | OpenVINO | ✔️ |
Semantic Segmentation for ADAS | Detector | OpenVINO | ✔️ |
Text Detection v4 | Detector | OpenVINO | ✔️ |
SiamMask | Tracker | PyTorch | ✔️ | ✔️
TransT | Tracker | PyTorch | ✔️ | ✔️
Inside-Outside Guidance | Interactor | PyTorch | ✔️ |
Faster RCNN | Detector | TensorFlow | ✔️ | ✔️
RetinaNet | Detector | PyTorch | ✔️ | ✔️
Face Detection | Detector | OpenVINO | ✔️ |

Name | Description
Self-hosted Installation Guide | Start here to install the self-hosted solution on your premises.
Dataset Management Framework | Specifically for the Self-Hosted version, this framework and CLI tool are essential for building, transforming, and analyzing datasets.
Server API | The CVAT server offers an HTTP REST API for interactions. This section explains how client applications, whether they are command line tools, browsers, or scripts, interact with CVAT through HTTP requests and responses.
Python SDK | The CVAT SDK is a Python library providing access to server interactions and additional functionalities like data validation and serialization.
Command Line Tool | This tool offers a straightforward command line interface for managing CVAT tasks. Currently featuring basic functionalities, it has the potential to develop into a more advanced administration tool for CVAT.
XML Annotation Format | Detailed documentation on the XML format used for annotations in CVAT, essential for understanding data structure and compatibility.
AWS Deployment Guide | A step-by-step guide for deploying CVAT on Amazon Web Services, covering all necessary procedures and tips.
Frequently Asked Questions | This section addresses common queries and provides helpful answers and insights about using CVAT.

Integrations

CVAT is used by teams worldwide. The following companies contribute significantly to product support or are an integral part of the CVAT ecosystem.

Service | Available In | Description
Human Protocol | CVAT Online, CVAT Community, CVAT Enterprise | Incorporates CVAT to augment annotation services within the Human Protocol framework, enhancing its capabilities in data labeling.
FiftyOne | CVAT Online, CVAT Community, CVAT Enterprise | An open-source tool for dataset management and model analysis in computer vision, FiftyOne is closely integrated with CVAT to enhance annotation capabilities and label refinement.
Hugging Face, Roboflow | CVAT Online | In CVAT Online, models from Hugging Face and Roboflow can be added to enhance computer vision tasks. For more information, see Integration with Hugging Face and Roboflow.

License information

CVAT includes the following licenses:

License Type | Applicable To | Description
MIT License | CVAT Community, CVAT Enterprise | This code is distributed under the MIT License, a permissive free software license that allows for broad use, modification, and distribution.
LGPL License (FFmpeg) | CVAT Online, CVAT Community, CVAT Enterprise | Incorporates LGPL-licensed components from the FFmpeg project. Users should verify if their use of FFmpeg requires additional licenses. CVAT.ai Corporation does not provide these licenses and is not liable for any related licensing fees.
Commercial License | CVAT Enterprise | For commercial use of the Enterprise solution of CVAT, a separate commercial license is applicable. This is tailored for businesses and commercial entities.
Terms of Use | CVAT Online, CVAT Community, CVAT Enterprise | Outlines the terms of use and confidential information handling for CVAT. Important for understanding the legal framework of using the platform.
Privacy Policy | CVAT Online, CVAT Community, CVAT Enterprise | Our Privacy Policy governs your visit to https://cvat.ai and your use of https://app.cvat.ai, and explains how we collect, safeguard and disclose information that results from your use of our Service.

Get in touch

To get in touch, use one of the following channels:

Type of inquiry | Applicable to | Description
Commercial Inquiries | CVAT Online, CVAT Enterprise, Labeling Services | Request a quote for CVAT Enterprise, a CVAT Online Team subscription, or order our labeling services.
General Inquiries | All products and services | Reach out to discuss partnership, co-marketing, or investment opportunities with the CVAT team.
CVAT Online Customer Support | CVAT Online (Pro and Team plans) | Chat with us about product support, resolve billing questions, or provide feedback.
CVAT Community Customer Support | CVAT Community | Report a bug or submit a feature request in our GitHub repository.

2 - Vocabulary

List of terms pertaining to annotation in CVAT.

Label

A label is the type of an annotated object (e.g. person, car, vehicle).

Example of a label in interface


Attribute

An attribute is a property of an annotated object (e.g. color, model, quality). There are two types of attributes:

Unique

A unique attribute is immutable and cannot be changed from frame to frame (e.g. age, gender, color).

Example of a unique attribute

Temporary

A temporary attribute is mutable and can be changed on any frame (e.g. quality, pose, truncated).

Example of a temporary attribute
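
The difference between the two attribute types can be sketched as a small data structure. This is an illustration only: the class name, the field names, and the rule that a temporary value holds until a later frame changes it are assumptions, not CVAT's storage format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TrackAttributes:
    # Unique attributes: one value for the whole track.
    unique: Dict[str, str] = field(default_factory=dict)
    # Temporary attributes: values recorded per frame, keyed by frame number.
    temporary: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def at_frame(self, frame: int) -> Dict[str, str]:
        """Return all attribute values that apply on the given frame."""
        values = dict(self.unique)
        # Assumption: a temporary value holds until a later frame changes it.
        for recorded_frame in sorted(self.temporary):
            if recorded_frame > frame:
                break
            values.update(self.temporary[recorded_frame])
        return values


attrs = TrackAttributes(
    unique={"color": "red"},
    temporary={0: {"truncated": "false"}, 10: {"truncated": "true"}},
)
```

Here `attrs.at_frame(5)` and `attrs.at_frame(12)` return the same `color` but different `truncated` values.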


Track

A track is a set of shapes on different frames that correspond to one object. Tracks are created in Track mode.

Example of a track in interface


Annotation

An annotation is a set of shapes and tracks. There are several types of annotations:

  • Manual: created by a person.
  • Semi-automatic: created mainly automatically, with some input from the user (e.g. interpolation).
  • Automatic: created automatically, without a person in the loop.

Approximation

Approximation reduces the number of points in a polygon. It can be used to reduce the size of the annotation file and to make polygons easier to edit.

Example of an applied approximation
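
To give a feel for how point reduction can work, here is a minimal sketch of the Ramer-Douglas-Peucker algorithm, a common simplification technique. This page does not say which algorithm CVAT uses, so this is a stand-in for illustration. The `tolerance` parameter and the function names are hypothetical, and the outline is treated as an open chain of points.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker simplification of an open point chain."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the point farthest from the chord between the endpoints.
    index, farthest = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves around it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

A larger tolerance removes more points at the cost of a coarser outline.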


Trackable

A trackable object is tracked automatically if the previous frame was the latest keyframe for the object. For more details, see the section on trackers.

Example of a trackable object in interface


Mode

Interpolation

Mode for video annotation, which uses track objects. Only objects on keyframes are annotated manually, and intermediate frames are linearly interpolated.
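
Linear interpolation between keyframes can be sketched as follows. This is a simplified illustration for an axis-aligned box stored as (xtl, ytl, xbr, ybr). The function name and the data layout are assumptions and do not come from CVAT's code.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xtl, ytl, xbr, ybr), an assumed layout


def interpolate_box(keyframes: Dict[int, Box], frame: int) -> Box:
    """Linearly interpolate a box between the two nearest keyframes."""
    if frame in keyframes:
        return keyframes[frame]
    earlier = [f for f in keyframes if f < frame]
    later = [f for f in keyframes if f > frame]
    if not earlier or not later:
        raise ValueError("frame lies outside the annotated keyframe range")
    start, end = max(earlier), min(later)
    # Fraction of the way from the earlier keyframe to the later one.
    ratio = (frame - start) / (end - start)
    return tuple(
        a + (b - a) * ratio for a, b in zip(keyframes[start], keyframes[end])
    )


keyframes = {0: (10.0, 10.0, 50.0, 50.0), 10: (30.0, 20.0, 70.0, 60.0)}
```

For the two keyframes above, `interpolate_box(keyframes, 5)` gives the box halfway between them, `(20.0, 15.0, 60.0, 55.0)`.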

Annotation

Mode for image annotation, which uses shape objects.


Dimension

The dimension depends on the task data type, which is defined when the task is created.

2D

The data for 2D tasks consists of images and videos.

3D

The data for 3D tasks is a point cloud. See Data formats for a 3D task.


State

State of the job. The state can be changed by an assigned user in the menu inside the job. There are several possible states: new, in progress, rejected, completed.


Stage

Stage of the job. The stage is specified with the drop-down list on the task page. There are three stages: annotation, validation, and acceptance. This value affects the task progress bar.


Subset

A project can have subsets. Subsets are groups of tasks that make it easier to work with the dataset. A subset can be test, train, validation, or a custom subset.


Credentials

Credentials are one of the following: key and secret key, account name and token, anonymous access, or key file. They are used to attach cloud storage.


Resource

A resource is a bucket name or container name. It is used to attach cloud storage.

3 - Shortcuts

List of available keyboard shortcuts and notes about their customization.

CVAT provides a wide range of customizable shortcuts, with many UI elements offering shortcut hints when hovered over with the mouse.

Example of a shortcut tip in user interface

These shortcuts are organized by scopes. Some are global, meaning they work across the entire application, while others are specific to certain sections or workspaces. Because of this, the same shortcut can be reused in different scopes as long as those scopes cannot conflict. For example, global shortcuts must be unique since they apply across all pages and workspaces. However, the same shortcuts can be used in both the Standard Workspace and the Standard 3D Workspace, as these two do not coexist.

Scope | Shortcut Conflicts
Global | Must be unique across all scopes, as they apply universally.
Annotation Page | Must be unique across all scopes, except Labels Editor.
Standard Workspace | Must be unique across itself, Annotation Page and Global Scope.
Standard 3D Workspace | Must be unique across itself, Annotation Page and Global Scope.
Attribute Annotation Workspace | Must be unique across itself, Annotation Page and Global Scope.
Review Workspace | Must be unique across itself, Annotation Page and Global Scope.
Tag Annotation Workspace | Must be unique across itself, Annotation Page and Global Scope.
Control Sidebar | Must be unique across itself, all workspaces, Annotation Page and Global Scope.
Objects Sidebar | Must be unique across itself, all workspaces, Annotation Page and Global Scope.
Labels Editor | Must be unique across itself and Global Scope.
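
The scope rules in the table can be encoded as a lookup, which makes it easy to check whether the same shortcut may be reused in two scopes. This is a sketch built only from the table above. The names `UNIQUE_ACROSS` and `scopes_conflict` are hypothetical, and treating the rule as symmetric (a conflict exists if either scope lists the other) is an assumption.

```python
from typing import Dict, Set

WORKSPACES = {
    "Standard Workspace",
    "Standard 3D Workspace",
    "Attribute Annotation Workspace",
    "Review Workspace",
    "Tag Annotation Workspace",
}
SIDEBARS = {"Control Sidebar", "Objects Sidebar"}
ALL_SCOPES = WORKSPACES | SIDEBARS | {"Global", "Annotation Page", "Labels Editor"}

# For each scope, the scopes in which its shortcuts must be unique.
UNIQUE_ACROSS: Dict[str, Set[str]] = {
    "Global": set(ALL_SCOPES),
    "Annotation Page": ALL_SCOPES - {"Labels Editor"},
    "Labels Editor": {"Labels Editor", "Global"},
}
for ws in WORKSPACES:
    UNIQUE_ACROSS[ws] = {ws, "Annotation Page", "Global"}
for sb in SIDEBARS:
    UNIQUE_ACROSS[sb] = {sb, "Annotation Page", "Global"} | WORKSPACES


def scopes_conflict(a: str, b: str) -> bool:
    """Return True if the same shortcut cannot be used in both scopes."""
    # Assumption: the rule is symmetric, so either side can declare the conflict.
    return b in UNIQUE_ACROSS[a] or a in UNIQUE_ACROSS[b]
```

For example, `scopes_conflict("Standard Workspace", "Standard 3D Workspace")` is `False`, which matches the note that these two workspaces do not coexist.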

Shortcuts Customization

You can customize shortcuts in CVAT settings.

  • Open Settings:
    User menu with highlighted “Settings” option

  • Go to the Shortcuts tab:
    “Settings” section with highlighted “Shortcuts” tab

  • You’ll see the shortcuts customization menu:
    “Shortcuts” tab with customization menu

  • Note the warning that some shortcuts are reserved by the browser and cannot be overridden in CVAT. There is no specific list of such combinations, but shortcuts such as ctrl + tab (switching tabs) or ctrl + w (closing tabs) are reserved by the browser, and shortcuts such as alt + f4 (closing the window) are usually reserved by your operating system.

  • All sections are collapsible, so you can easily navigate through the list of shortcuts. Here is the Global scope expanded:
    Expanded “General” section in “Shortcuts” tab

  • To add a custom shortcut, click the input field and press the key sequence you want to assign to the action. As an example, f3 has been set here for Show Shortcuts along with f1:
    Example of two custom shortcuts

  • Shortcuts can be any combination of modifiers (ctrl, shift or alt) and up to one non-modifier key, e.g. ctrl+shift+f1.
    Example of adding a shortcut

  • If you try to add a shortcut that is already in use, you will get a warning message:
    Conflicting shortcuts window

  • If you click Cancel, the existing assignment stays the same. Otherwise, the conflicting shortcut is unset.
    Example of a result of resolving shortcuts conflict

  • If you want to reset all the shortcuts to default, you can do so by clicking the Restore Defaults button at the top of the shortcut settings.
    “Shortcuts” tab with highlighted “Restore defaults” button
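
The composition rule from the steps above (any combination of ctrl, shift or alt plus at most one non-modifier key) can be sketched as a small validator. The function names and the "+"-separated text format are assumptions for illustration. Rejecting repeated keys is also an assumption, and the sketch does not check for keys reserved by the browser or operating system.

```python
from typing import List

MODIFIERS = {"ctrl", "shift", "alt"}


def parse_shortcut(text: str) -> List[str]:
    """Split a shortcut such as 'ctrl+shift+f1' into lowercase key names."""
    return [part.strip().lower() for part in text.split("+") if part.strip()]


def is_valid_shortcut(text: str) -> bool:
    """Check: any modifiers, no repeats, and at most one non-modifier key."""
    keys = parse_shortcut(text)
    # Assumption: an empty shortcut or a repeated key is rejected.
    if not keys or len(keys) != len(set(keys)):
        return False
    non_modifiers = [key for key in keys if key not in MODIFIERS]
    return len(non_modifiers) <= 1
```

For example, `is_valid_shortcut("ctrl+shift+f1")` and `is_valid_shortcut("f3")` are both `True`, while `is_valid_shortcut("ctrl+a+b")` is `False` because it has two non-modifier keys.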