Published on
Monday, May 8, 2023

Review YOLO-NAS - Search for a better YOLO

    Viet Anh

Deci AI has used its Neural Architecture Search engine, AutoNAC, to create a new model called YOLO-NAS. According to Deci, this model surpasses other SOTA YOLOs in both speed and accuracy, including YOLOv5, YOLOv6, YOLOv7, and the recently launched YOLOv8. This post reviews this exciting new model.

Efficiency Frontier plot for object detection on the COCO2017 dataset (validation) comparing YOLO-NAS vs. other YOLO architectures.

YOLO-NAS is approximately 0.5 mAP points more accurate and 10-20% faster than equivalent versions of YOLOv8 and YOLOv7.

Advancement 1: Architecture Search for YOLO

AutoNAC™, Deci's proprietary Neural Architecture Search technology, was responsible for generating the YOLO-NAS model. This framework is designed to optimize the inference speed and accuracy of deep neural networks considering the given task and hardware constraints. By using AutoNAC™, Deci was able to create a new YOLO architecture that is more accurate and faster than SOTA YOLO models, including YOLOv8.
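AutoNAC itself is proprietary, so its search procedure cannot be shown here. The toy sketch below only illustrates the general idea of trading accuracy against latency under a hardware budget: it computes an efficiency frontier over a handful of candidates and picks the most accurate one that fits a latency limit. The `Candidate` type, the architecture names, and all numbers are made up for illustration.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    latency_ms: float  # hypothetical latency on the target hardware
    map_score: float   # hypothetical validation mAP


def efficiency_frontier(candidates: list) -> list:
    """Keep candidates that no other candidate beats on both latency and mAP."""
    frontier = []
    for c in candidates:
        dominated = any(
            o.latency_ms <= c.latency_ms
            and o.map_score >= c.map_score
            and (o.latency_ms < c.latency_ms or o.map_score > c.map_score)
            for o in candidates
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.latency_ms)


def best_under_budget(candidates: list, max_latency_ms: float) -> Optional[Candidate]:
    """Most accurate candidate whose latency fits the budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    return max(feasible, key=lambda c: c.map_score) if feasible else None


# Made-up candidates, not real benchmark results.
candidates = [
    Candidate("arch_a", 3.0, 45.0),
    Candidate("arch_b", 4.0, 44.0),  # slower and less accurate than arch_a
    Candidate("arch_c", 5.0, 50.0),
    Candidate("arch_d", 8.0, 52.0),
    Candidate("arch_e", 9.0, 51.0),  # slower and less accurate than arch_d
]

frontier = efficiency_frontier(candidates)
choice = best_under_budget(candidates, max_latency_ms=6.0)
print([c.name for c in frontier], choice.name)
```

A real NAS engine searches a far larger space and measures latency on the actual target device, but the selection criterion has this same shape: accuracy is maximized subject to a speed constraint.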

Advancement 2: Ready for Post-Training Optimization

VGG is a classic CNN architecture known for its simplicity and effectiveness. However, it is not very efficient in terms of speed and memory. RepVGG was proposed to solve this problem: it keeps a simple VGG-style design while being more efficient in both respects. With re-parameterization, RepVGG is trained with a multi-branch architecture and then converted to a single-branch architecture for faster inference.


Sketch of RepVGG architecture - image from the original paper.
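To make the re-parameterization idea concrete, here is a minimal NumPy sketch. It assumes a single channel and no batch normalization (a real RepVGG block also folds BatchNorm into the kernels), and the helper `conv2d_same` is a hypothetical naive convolution written for this example. It shows that a 3x3 branch, a 1x1 branch, and an identity branch can be merged into one 3x3 kernel that produces the same output.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive single-channel 2D cross-correlation, stride 1, zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6))   # input feature map
k3 = rng.normal(size=(3, 3))  # 3x3 branch
k1 = rng.normal(size=(1, 1))  # 1x1 branch

# Training-time block: three parallel branches summed together.
multi_branch = conv2d_same(x, k3) + conv2d_same(x, k1) + x

# Re-parameterization: the 1x1 kernel and the identity both act only on the
# centre pixel, so they can be added to the centre of the 3x3 kernel.
fused = k3.copy()
fused[1, 1] += k1[0, 0] + 1.0

# Inference-time block: a single 3x3 convolution.
single_branch = conv2d_same(x, fused)

assert np.allclose(multi_branch, single_branch)
```

Because convolution is linear, the sum of the branch outputs equals the output of the summed kernels, which is why the conversion changes the structure without changing the result.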

By using RepVGG, the architecture of YOLO-NAS can be optimized after training by re-parameterization and is also compatible with Post-training Quantization. This is a very important feature for production deployment. The model can be trained with full precision and then optimized for inference speed and memory usage.

Advancement 3: Quantization-Aware Training

Post-training Quantization enables users to create a highly efficient quantized integer model for inference. However, even with careful post-training calibration, model accuracy may degrade to an unacceptable extent. When this occurs, post-training calibration alone is insufficient for generating a quantized integer model. Instead, the model must be trained in a manner that accounts for the quantization effect. This is where Quantization-Aware Training comes in: it models the quantization effect during training.

Steps in Quantization-Aware Training. Source:
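One common way to "model the quantization effect" is fake quantization: values are rounded to the integer grid and immediately converted back to floating point, so the network sees the rounding error during the forward pass. The sketch below assumes symmetric, per-tensor INT8 quantization. The function name `fake_quantize` and the sample weights are illustrative and are not taken from the YOLO-NAS code.

```python
import numpy as np


def fake_quantize(x: np.ndarray, num_bits: int = 8):
    """Quantize to a signed integer grid and dequantize back to float."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)  # integer-valued grid
    return q * scale, scale


# Illustrative weights, not taken from a real model.
weights = np.array([-1.0, -0.5, 0.0, 0.003, 0.25, 1.0])
w_q, scale = fake_quantize(weights)
error = np.abs(weights - w_q)

print("scale:", scale)
print("quantized weights:", w_q)
print("max rounding error:", error.max())
```

In Quantization-Aware Training, an operation like this is typically inserted in the forward pass while gradients are passed through the rounding step unchanged, so the weights learn to tolerate the rounding error.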

YOLO-NAS uses quantization-aware blocks and selective quantization in its architecture. The design includes adaptive quantization, which skips quantization in certain layers depending on the balance between accuracy loss and latency/throughput improvement. When converted to its INT8 quantized version, YOLO-NAS experiences a smaller accuracy drop than other models, losing only 0.51, 0.65, and 0.45 mAP points for its S, M, and L variants, respectively. In contrast, other models lose 1-2 mAP points during quantization.

Advancement 4: Training Strategy

The training process of YOLO-NAS was enhanced by several techniques, such as pseudo-labeled data, Knowledge Distillation, and Distribution Focal Loss (DFL). With these techniques, Deci created SOTA pre-trained models using the Objects365 dataset in 25-40 epochs, depending on the model variant.

Distribution Focal Loss treats box regression as a classification task by discretizing box predictions into a finite set of values. It then predicts a probability distribution over these values, which is transformed into the final prediction via a weighted sum.
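The following NumPy sketch shows both halves of this idea for a single box side: decoding a distance as the weighted sum over discrete bins, and the DFL term as defined in the Generalized Focal Loss paper (cross-entropy on the two bins that surround the continuous target). The bin count, the logits, and the function names are illustrative assumptions, and the target is assumed to lie strictly below the last bin.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - np.max(z))
    return e / e.sum()


def decode_distance(logits: np.ndarray) -> float:
    """Final prediction: expected value of the distribution over bins 0..n-1."""
    probs = softmax(logits)
    bins = np.arange(len(logits))
    return float(np.sum(probs * bins))


def distribution_focal_loss(logits: np.ndarray, target: float) -> float:
    """Cross-entropy on the two bins nearest to the continuous target.

    Assumes 0 <= target < len(logits) - 1 so both neighbours exist.
    """
    probs = softmax(logits)
    left = int(np.floor(target))
    right = left + 1
    w_left = right - target   # closer to the left bin -> larger left weight
    w_right = target - left
    return float(-(w_left * np.log(probs[left]) + w_right * np.log(probs[right])))


flat = np.zeros(4)                            # uniform over 4 bins
peaked = np.array([-10.0, 5.0, 5.0, -10.0])   # mass on bins 1 and 2

print(decode_distance(flat), decode_distance(peaked))
print(distribution_focal_loss(flat, 1.5), distribution_focal_loss(peaked, 1.5))
```

Both distributions decode to roughly the same distance, but the peaked one gets a lower loss for a target of 1.5, which is the behaviour DFL is meant to encourage: probability mass concentrated near the true value.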

Knowledge Distillation is a technique that transfers knowledge from a large model (the teacher) to a smaller model (the student). With this method, lightweight models can achieve better performance by learning from the teacher's output probability distributions instead of from the dataset only. In the case of YOLO-NAS, the student model learns from both the classification and DFL predictions of the teacher model.

Knowledge Distillation Mechanism
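As a rough illustration of learning from a teacher's probability distribution, the sketch below implements the classic temperature-softened distillation loss (KL divergence between teacher and student outputs). This is the generic formulation, not necessarily the exact loss used to train YOLO-NAS. The temperature value and the logits are made up.

```python
import numpy as np


def softmax(z, temperature: float = 1.0) -> np.ndarray:
    z = np.asarray(z, dtype=float) / temperature
    e = np.exp(z - np.max(z))
    return e / e.sum()


def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(kl * temperature ** 2)


# Made-up logits for a 3-class problem.
teacher = [4.0, 1.0, 0.0]
student_close = [3.5, 1.2, 0.1]  # roughly agrees with the teacher
student_far = [0.0, 1.0, 4.0]    # disagrees with the teacher

print(distillation_loss(student_close, teacher))
print(distillation_loss(student_far, teacher))
```

In practice this term is usually added to the ordinary supervised loss, so the student learns from both the labels and the teacher.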


Deci AI released the super-gradients library, an easy way to use YOLO-NAS. You can try it on your own data and see the results.

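import super_gradients

# Load YOLO-NAS L with COCO pre-trained weights and move it to the GPU (requires CUDA)
yolo_nas = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()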

Some results from YOLO-NAS L:


Football image from Canva


Street from Unsplash

Training your own models

YOLO-NAS comes with source code and documentation for fine-tuning the model and for training it from scratch. The training code supports both Quantization-Aware Training and Post-training Quantization.

License and Commercial Use

The source code for YOLO-NAS is available under the Apache 2.0 license and is integrated into the super-gradients library. However, the pre-trained weights are available for non-commercial use only. Read more at YOLO-NAS WEIGHTS LICENSE. Therefore, if you plan to use this model in a commercial project, check the license first; you may need to retrain the model from scratch.


YOLO-NAS is a new state-of-the-art object detection model that is faster and more accurate than previous YOLO models. It was developed "with production use in mind": it is ready for post-training optimization and quantization, and it supports inference engines like NVIDIA TensorRT. Try this model in your projects and see how it performs!