Automatic annotation

Automatic annotation of tasks

Automatic annotation in CVAT is a tool that you can use to automatically pre-annotate your data with pre-trained models.

CVAT can use models from the following sources:

Pre-installed models.
Models integrated from Hugging Face and Roboflow.
Self-hosted models deployed with Nuclio.

The following table describes the available options:

	Self-hosted	Cloud
Price	Free	See Pricing
Models	You have to add models	You can use pre-installed models
Hugging Face & Roboflow integration	Not supported	Supported

See:

Running Automatic annotation
Labels matching
Models
Adding models from Hugging Face and Roboflow

Running Automatic annotation

To start automatic annotation, do the following:

On the top menu, click Tasks.
Find the task you want to annotate and click Action > Automatic annotation.
In the Automatic annotation dialog, from the drop-down list, select a model.
Match the labels of the model and the task.
(Optional) In case you need the model to return masks as polygons, switch toggle Return masks as polygons.
(Optional) In case you need to remove all previous annotations, switch toggle Clean old annotations.
Click Annotate.

CVAT will show the progress of annotation on the progress bar.

Progress bar

You can stop the automatic annotation at any moment by clicking cancel.

Labels matching

Each model is trained on a dataset and supports only the dataset’s labels.

For example:

DL model has the label car.
Your task (or project) has the label vehicle.

To annotate, you need to match these two labels to give CVAT a hint that, in this case, car = vehicle.

If you have a label that is not on the list of DL labels, you will not be able to match them.

For this reason, supported DL models are suitable only for certain labels.

To check the list of labels for each model, see Models papers and official documentation.

Models

Automatic annotation uses pre-installed and added models.

For self-hosted solutions, you need to install Automatic Annotation first and add models.

List of pre-installed models:

Model	Description
Attributed face detection	Three OpenVINO models work together: Face Detection 0205: face detector based on MobileNetV2 as a backbone with a FCOS head for indoor and outdoor scenes shot by a front-facing camera. Emotions recognition retail 0003: fully convolutional network for recognition of five emotions (‘neutral’, ‘happy’, ‘sad’, ‘surprise’, ‘anger’). Age gender recognition retail 0013: fully convolutional network for simultaneous Age/Gender recognition. The network can recognize the age of people in the [18 - 75] years old range; it is not applicable for children since their faces were not in the training set.
RetinaNet R101	RetinaNet is a one-stage object detection model that utilizes a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. For more information, see: Site: RetinaNET
Text detection	Text detector based on PixelLink architecture with MobileNetV2, depth_multiplier=1.4 as a backbone for indoor/outdoor scenes. For more information, see: Site: OpenVINO Text detection 004
YOLO v3	YOLO v3 is a family of object detection architectures and models pre-trained on the COCO dataset. For more information, see: Site: YOLO v3
YOLO v7	YOLOv7 is an advanced object detection model that outperforms other detectors in terms of both speed and accuracy. It can process frames at a rate ranging from 5 to 160 frames per second (FPS) and achieves the highest accuracy with 56.8% average precision (AP) among real-time object detectors running at 30 FPS or higher on the V100 graphics processing unit (GPU). For more information, see: GitHub: YOLO v7 Paper: YOLO v7

Adding models from Hugging Face and Roboflow

In case you did not find the model you need, you can add a model of your choice from Hugging Face or Roboflow.

Note, that you cannot add models from Hugging Face and Roboflow to self-hosted CVAT.

For more information, see Streamline annotation by integrating Hugging Face and Roboflow models.

This video demonstrates the process: