# Inference Benchmarks
This page contains performance benchmarks for various models running with Inference on different hardware platforms.
## NVIDIA L4 GPU

### Object Detection
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 103.5 | 299.5 | 2.9x |
| rfdetr-small | 512x512 | 57.4 | 253.4 | 4.4x |
| rfdetr-medium | 576x576 | 62.8 | 201.8 | 3.2x |
| rfdetr-large | 704x704 | 36.9 | 160.5 | 4.3x |
| rfdetr-xlarge | 700x700 | 18.1 | 96.1 | 5.3x |
| rfdetr-2xlarge | 880x880 | 17.4 | 74.1 | 4.3x |
### Segmentation
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-seg-nano | 312x312 | 51.9 | 105.3 | 2.0x |
| rfdetr-seg-small | 384x384 | 57.5 | 126.7 | 2.2x |
| rfdetr-seg-medium | 432x432 | 39.7 | 99.7 | 2.5x |
| rfdetr-seg-large | 504x504 | 32.8 | 93.2 | 2.8x |
| rfdetr-seg-xlarge | 624x624 | 17.0 | 68.8 | 4.0x |
| rfdetr-seg-2xlarge | 768x768 | 10.7 | 59.0 | 5.5x |
### Classification
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| ResNet50 | 224x224 | 358.6 | 600.8 | 1.7x |
| ViT | 224x224 | 238.0 | 306.5 | 1.3x |
## Jetson Orin NX

### Object Detection
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 21.2 | 78.5 | 3.7x |
| rfdetr-small | 512x512 | 13.9 | 52.5 | 3.8x |
| rfdetr-medium | 576x576 | 11.0 | 44.0 | 4.0x |
| yolov8n-640 | 640x640 | 35.6 | 89.0 | 2.5x |
| yolov8s-640 | 640x640 | 26.3 | 69.5 | 2.6x |
| yolov8m-640 | 640x640 | 13.5 | 44.5 | 3.3x |
| yolov8l-640 | 640x640 | 9.0 | 32.5 | 3.6x |
| yolov8x-640 | 640x640 | 6.4 | 22.0 | 3.4x |
## Benchmark Methodology
All benchmarks were conducted using the Roboflow Inference CLI. Each run used a single image (batch size = 1) over 500 iterations.
Inference/sec (ONNX) -- standard ONNX Runtime:

```
inference benchmark python-package-speed -m [model]
```

Inference/sec (TRT) -- TensorRT-optimized adapters (supported in `inference` 1.0 and later):

```
USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m [model]
```
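To reproduce a measurement for a specific model from the tables above, substitute its name for `[model]`, e.g. `USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m yolov8n-640`. If you prefer to time the Python package directly, the sketch below mirrors the stated methodology (single image, batch size = 1, 500 timed iterations). It is a minimal sketch, assuming the `inference` package's `get_model`/`infer` interface and a synthetic input image; depending on the model, a Roboflow API key may be required (e.g. via the `ROBOFLOW_API_KEY` environment variable), and the warm-up loop is an addition, not part of the stated methodology.

```python
import time

import numpy as np
from inference import get_model

MODEL_ID = "yolov8n-640"  # any model name from the tables above
ITERATIONS = 500

# get_model resolves and loads the model locally; some models require a
# Roboflow API key (ROBOFLOW_API_KEY env var or the api_key argument).
model = get_model(model_id=MODEL_ID)

# Single synthetic 640x640 RGB frame -> batch size = 1, as in the tables.
image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

# Warm-up (an assumption beyond the stated methodology) so that one-time
# initialization and graph compilation are excluded from the timed loop.
for _ in range(10):
    model.infer(image)

start = time.perf_counter()
for _ in range(ITERATIONS):
    model.infer(image)
elapsed = time.perf_counter() - start

print(f"{ITERATIONS / elapsed:.1f} inferences/sec")
```

Numbers from this sketch may differ slightly from the CLI's, since the CLI may apply its own warm-up and measurement accounting.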