Supported Models
Roboflow Inference supports a wide range of models for computer vision tasks. You can run fine-tuned models trained on your own data, pre-trained models from Roboflow Universe, and foundation models for zero-shot tasks.
Quick Start
Load a model by its ID with `get_model`, then call `infer` on an image (here passed as a URL):
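```python
from inference import get_model
# Load a pre-trained model by its alias.
model = get_model(model_id="rfdetr-small")
# Run the model on an image referenced by URL.
results = model.infer("https://media.roboflow.com/inference/people-walking.jpg")
```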
Inference provides convenient aliases for common pre-trained models. See the Pre-Trained Aliases page for a full list of available model IDs.
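As a rough sketch of what post-processing the quick-start results might look like, the helper below counts detections per class. It assumes a result has already been converted to a plain dictionary with a `predictions` list whose entries carry `class` and `confidence` keys, as in Roboflow's JSON response format. The `count_by_class` helper and the `sample` data are made up for illustration and are not part of the Inference API.

```python
from collections import Counter


def count_by_class(response: dict, min_confidence: float = 0.5) -> Counter:
    """Count predictions per class label, skipping low-confidence ones."""
    counts = Counter()
    for prediction in response.get("predictions", []):
        if prediction.get("confidence", 0.0) >= min_confidence:
            counts[prediction["class"]] += 1
    return counts


# Hand-written sample shaped like a detection response (illustrative values only).
sample = {
    "predictions": [
        {"class": "person", "confidence": 0.91},
        {"class": "person", "confidence": 0.42},
        {"class": "bicycle", "confidence": 0.77},
    ]
}

print(count_by_class(sample))
```

With a live model, the dictionary might come from something like `results[0].dict(by_alias=True, exclude_none=True)`, though the exact response type depends on the model and the Inference version.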
Model Categories
Fine-Tuned Models
Fine-tuned models are trained on your own data using Roboflow or external training pipelines. Inference supports the following fine-tuned model architectures:
| Architecture | Tasks |
|---|---|
| RF-DETR | Object Detection |
| YOLO26 | Object Detection, Instance Segmentation, Keypoint Detection |
| YOLOv11 | Object Detection, Instance Segmentation, Keypoint Detection |
| YOLOv10 | Object Detection |
| YOLOv9 | Object Detection |
| YOLOv8 | Object Detection, Classification, Instance Segmentation, Keypoint Detection |
| YOLOv7 | Instance Segmentation |
| YOLOv5 | Object Detection, Classification, Instance Segmentation |
| YOLO-NAS | Object Detection, Keypoint Detection |
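Fine-tuned models are typically addressed by a model ID of the form `project/version` and loaded with an API key. The sketch below shows one way this could be wired up. The helper names `fine_tuned_model_id` and `load_fine_tuned`, the placeholder project `my-project`, and the choice to read the key from the `ROBOFLOW_API_KEY` environment variable are assumptions for illustration.

```python
import os


def fine_tuned_model_id(project: str, version: int) -> str:
    """Build a model ID in the `project/version` form."""
    if not project or "/" in project:
        raise ValueError("project must be a non-empty slug without '/'")
    if version < 1:
        raise ValueError("version must be a positive integer")
    return f"{project}/{version}"


def load_fine_tuned(project: str, version: int):
    """Load a fine-tuned model; needs the `inference` package and an API key."""
    # Imported lazily so the ID helper above works without `inference` installed.
    from inference import get_model

    # Assumption: the key is stored in the ROBOFLOW_API_KEY environment variable.
    api_key = os.environ["ROBOFLOW_API_KEY"]
    return get_model(model_id=fine_tuned_model_id(project, version), api_key=api_key)


# "my-project" is a placeholder project slug, not a real model.
print(fine_tuned_model_id("my-project", 3))  # my-project/3
```

Keeping the ID construction separate from the loading call makes it easy to validate IDs in tests or configuration checks without network access.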
Foundation Models
Foundation models are pre-trained on large datasets and work out of the box, with no additional training, across a variety of tasks. See the Foundation Models overview for more details.
| Model | Use Case |
|---|---|
| CLIP | Image classification, embedding similarity |
| Grounding DINO | Zero-shot object detection |
| SAM 3 | Open-vocabulary segmentation |
| SAM 2 | Interactive image segmentation |
| SAM | Image segmentation |
| Florence-2 | Object detection, captioning, OCR, and more |
| YOLO-World | Zero-shot object detection |
| PaliGemma | VQA, object detection, segmentation |
| Qwen 3.5-VL | Multimodal understanding, VQA |
| SmolVLM2 | VQA, document OCR |
| Moondream2 | Image captioning, VQA |
| DocTR | OCR |
| TrOCR | OCR (single-line text) |
| GLM-OCR | OCR |
| OWLv2 | Few-shot object detection |
| Perception Encoder | Image/text embeddings |
| Depth Estimation | Depth maps |
| Gaze Detection | Gaze estimation |
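Several models in the table above, such as CLIP and Perception Encoder, produce embeddings that are compared by similarity. The following is a minimal sketch of that comparison step only, using cosine similarity over plain Python lists. The vectors are toy three-dimensional values made up for illustration (real embeddings have hundreds of dimensions), and `cosine_similarity` and `best_label` are hypothetical helpers, not Inference functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def best_label(image_embedding: list[float],
               label_embeddings: dict[str, list[float]]) -> str:
    """Pick the label whose text embedding is closest to the image embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(image_embedding, label_embeddings[label]),
    )


# Toy embeddings standing in for real model outputs.
image_vec = [0.9, 0.1, 0.0]
labels = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}

print(best_label(image_vec, labels))  # a photo of a dog
```

Ranking text prompts by similarity to an image embedding in this way is the basic idea behind zero-shot classification with an embedding model.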
Universe Models
Roboflow Universe hosts over 50,000 pre-trained models shared by the community. You can use any of these models with Inference.
Pre-Trained Model Aliases
Inference provides convenient IDs for common pre-trained models that do not require an API key. See the full list on the Pre-Trained Aliases page.