Inference Benchmarks

This page contains performance benchmarks for various models running with Inference on different hardware platforms.

NVIDIA L4 GPU

Object Detection

| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 103.5 | 299.5 | 2.9x |
| rfdetr-small | 512x512 | 57.4 | 253.4 | 4.4x |
| rfdetr-medium | 576x576 | 62.8 | 201.8 | 3.2x |
| rfdetr-large | 704x704 | 36.9 | 160.5 | 4.3x |
| rfdetr-xlarge | 700x700 | 18.1 | 96.1 | 5.3x |
| rfdetr-2xlarge | 880x880 | 17.4 | 74.1 | 4.3x |
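The "Improvement" column above is simply the ratio of TRT throughput to ONNX throughput, and throughput can be inverted to get per-image latency. A minimal sketch using the rfdetr-nano row from the table (the helper function names are illustrative, not part of the Inference API):

```python
# Sketch: convert the throughput figures above into per-image latency
# and recompute the speedup column. Numbers are taken from the
# NVIDIA L4 object detection table (rfdetr-nano row).

def latency_ms(inferences_per_sec: float) -> float:
    """Per-image latency in milliseconds for a given throughput."""
    return 1000.0 / inferences_per_sec

def speedup(onnx_ips: float, trt_ips: float) -> float:
    """TRT-over-ONNX improvement factor, as reported in the tables."""
    return trt_ips / onnx_ips

# rfdetr-nano on the L4: 103.5 inf/s (ONNX) vs 299.5 inf/s (TRT)
print(f"ONNX latency: {latency_ms(103.5):.2f} ms")    # ~9.66 ms
print(f"TRT latency:  {latency_ms(299.5):.2f} ms")    # ~3.34 ms
print(f"Improvement:  {speedup(103.5, 299.5):.1f}x")  # 2.9x
```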

Segmentation

| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-seg-nano | 312x312 | 51.9 | 105.3 | 2.0x |
| rfdetr-seg-small | 384x384 | 57.5 | 126.7 | 2.2x |
| rfdetr-seg-medium | 432x432 | 39.7 | 99.7 | 2.5x |
| rfdetr-seg-large | 504x504 | 32.8 | 93.2 | 2.8x |
| rfdetr-seg-xlarge | 624x624 | 17 | 68.8 | 4.0x |
| rfdetr-seg-2xlarge | 768x768 | 10.7 | 59 | 5.5x |

Classification

| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| ResNet50 | 224x224 | 358.6 | 600.8 | 1.7x |
| ViT | 224x224 | 238 | 306.5 | 1.3x |

Jetson Orin NX

Object Detection

| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 21.2 | 78.5 | 3.7x |
| rfdetr-small | 512x512 | 13.9 | 52.5 | 3.8x |
| rfdetr-medium | 576x576 | 11 | 44 | 4.0x |
| yolov8n-640 | 640x640 | 35.6 | 89 | 2.5x |
| yolov8s-640 | 640x640 | 26.3 | 69.5 | 2.6x |
| yolov8m-640 | 640x640 | 13.5 | 44.5 | 3.3x |
| yolov8l-640 | 640x640 | 9 | 32.5 | 3.6x |
| yolov8x-640 | 640x640 | 6.4 | 22 | 3.4x |

Benchmark Methodology

All benchmarks were conducted using the Roboflow Inference CLI. Each run used a single image (batch size = 1) over 500 iterations.

  • Inference/sec (ONNX) -- Standard ONNX runtime:

    inference benchmark python-package-speed -m [model]
    
  • Inference/sec (TRT) -- TensorRT-optimized adapters (supported in inference 1.0 and later):

    USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m [model]
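As a concrete illustration, the two commands can be run back-to-back on the same model to reproduce one row of the tables above. The `rfdetr-nano` model identifier below is an assumption for the sake of the example; substitute whichever model identifier your Inference installation accepts:

```shell
# Hypothetical example run (model identifier assumed; substitute your own).

# 1) Baseline: standard ONNX runtime
inference benchmark python-package-speed -m rfdetr-nano

# 2) Same model with the TensorRT-optimized adapters enabled
USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m rfdetr-nano
```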