# Inference Benchmarks
This page contains performance benchmarks for various models running with Inference on different hardware platforms.
## NVIDIA L4 GPU

### Object Detection
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 103.5 | 299.5 | 2.9x |
| rfdetr-small | 512x512 | 57.4 | 253.4 | 4.4x |
| rfdetr-medium | 576x576 | 62.8 | 201.8 | 3.2x |
| rfdetr-large | 704x704 | 36.9 | 160.5 | 4.3x |
| rfdetr-xlarge | 700x700 | 18.1 | 96.1 | 5.3x |
| rfdetr-2xlarge | 880x880 | 17.4 | 74.1 | 4.3x |
### Segmentation
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-seg-nano | 312x312 | 51.9 | 105.3 | 2.0x |
| rfdetr-seg-small | 384x384 | 57.5 | 126.7 | 2.2x |
| rfdetr-seg-medium | 432x432 | 39.7 | 99.7 | 2.5x |
| rfdetr-seg-large | 504x504 | 32.8 | 93.2 | 2.8x |
| rfdetr-seg-xlarge | 624x624 | 17.0 | 68.8 | 4.0x |
| rfdetr-seg-2xlarge | 768x768 | 10.7 | 59.0 | 5.5x |
### Classification
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| ResNet50 | 224x224 | 358.6 | 600.8 | 1.7x |
| ViT | 224x224 | 238.0 | 306.5 | 1.3x |
## Jetson Orin NX

### Object Detection
| Model Type | Size | Inference/sec (ONNX) | Inference/sec (TRT) | Improvement |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 21.2 | 78.5 | 3.7x |
| rfdetr-small | 512x512 | 13.9 | 52.5 | 3.8x |
| rfdetr-medium | 576x576 | 11.0 | 44.0 | 4.0x |
| yolov8n-640 | 640x640 | 35.6 | 89.0 | 2.5x |
| yolov8s-640 | 640x640 | 26.3 | 69.5 | 2.6x |
| yolov8m-640 | 640x640 | 13.5 | 44.5 | 3.3x |
| yolov8l-640 | 640x640 | 9.0 | 32.5 | 3.6x |
| yolov8x-640 | 640x640 | 6.4 | 22.0 | 3.4x |
## Benchmark Methodology
All benchmarks were conducted using the Roboflow Inference CLI. Each run used a single image (batch size = 1) over 500 iterations.
Inference/sec (ONNX) -- standard ONNX Runtime:

```
inference benchmark python-package-speed -m [model]
```

Inference/sec (TRT) -- TensorRT-optimized adapters (supported in `inference` 1.0 and later):

```
USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m [model]
```
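To reproduce a measurement for a specific model from the tables above, substitute its name for `[model]`, e.g. `USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m yolov8n-640`. If you prefer to time the Python package directly, the sketch below mirrors the stated methodology (single image, batch size = 1, 500 timed iterations). It is a minimal sketch, assuming the `inference` package's `get_model`/`infer` interface and a synthetic input image; depending on the model, a Roboflow API key may be required (e.g. via the `ROBOFLOW_API_KEY` environment variable), and the warm-up loop is an addition, not part of the stated methodology.

```python
import time

import numpy as np
from inference import get_model

MODEL_ID = "yolov8n-640"  # any model name from the tables above
ITERATIONS = 500

# get_model resolves and loads the model locally; some models require a
# Roboflow API key (ROBOFLOW_API_KEY env var or the api_key argument).
model = get_model(model_id=MODEL_ID)

# Single synthetic 640x640 RGB frame -> batch size = 1, as in the tables.
image = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)

# Warm-up (an assumption beyond the stated methodology) so that one-time
# initialization and graph compilation are excluded from the timed loop.
for _ in range(10):
    model.infer(image)

start = time.perf_counter()
for _ in range(ITERATIONS):
    model.infer(image)
elapsed = time.perf_counter() - start

print(f"{ITERATIONS / elapsed:.1f} inferences/sec")
```

Numbers from this sketch may differ slightly from the CLI's, since the CLI may apply its own warm-up and measurement accounting.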