Inference Server
The inference Python package is the core library that powers Roboflow's computer vision deployment stack. It provides model loading, pre- and post-processing, GPU/CPU optimization, and Workflows execution, all callable directly from Python.
The Inference Server wraps this package and exposes it over HTTP; it is distributed as a Docker image with all dependencies installed. You can also use inference directly in your own scripts and applications.
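If you prefer the HTTP interface, install the CLI, start the server with inference server start, and send requests with the inference_sdk client. The sketch below assumes a locally running server on its default port (9001) and uses a placeholder model ID and image path:

from inference_sdk import InferenceHTTPClient

# Point the client at a locally running Inference Server
# (assumes the default port, 9001).
client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY",
)

# Run a model against a local image; "my-project/1" is a placeholder model ID.
result = client.infer("people-walking.jpg", model_id="my-project/1")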
Multi-Backend Support
Inference 1.0 supports multiple inference runtime backends: ONNX, TensorRT, Hugging Face, and PyTorch. It automatically selects the fastest available backend for your hardware. For example, if you have an NVIDIA GPU or are running on a Jetson device, and a TensorRT engine is available for the model on your platform, Inference uses TensorRT by default.
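Because selection is automatic, the loading call is identical on every platform; a minimal illustration:

from inference import get_model

# The same call works on CPU-only machines, NVIDIA GPUs, and Jetson devices.
# Inference picks the fastest available backend (e.g. TensorRT where a
# prebuilt engine exists, ONNX otherwise) with no code changes.
model = get_model(model_id="rfdetr-small")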
Installation
pip install inference
For GPU support:
pip install inference-gpu
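To verify the install, you can query the installed version with the standard library (a generic sanity check; use the distribution name "inference-gpu" if you installed the GPU package):

import importlib.metadata

# Prints the installed package version, confirming the install succeeded.
print(importlib.metadata.version("inference"))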
Quick Example
get_model() loads a model by its ID, downloads the weights, and returns a model object you can call .infer() on.
from inference import get_model
model = get_model(model_id="rfdetr-small")
results = model.infer("https://media.roboflow.com/inference/people-walking.jpg")
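infer() accepts a single image or a batch, so it returns a list of response objects, one per input. As a sketch of reading the output (the field names below match the object-detection response shape; treat them as assumptions for other model types):

from inference import get_model

model = get_model(model_id="rfdetr-small")
results = model.infer("https://media.roboflow.com/inference/people-walking.jpg")

# Each response holds a list of predictions carrying a class label, a
# confidence score, and a center-based bounding box (x, y, width, height).
for prediction in results[0].predictions:
    print(prediction.class_name, prediction.confidence)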
To use models that require an API key, set the ROBOFLOW_API_KEY environment variable or pass it directly:
model = get_model(model_id="my-project/1", api_key="ROBOFLOW_API_KEY")
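For example, in your shell (placeholder value shown):

# Set once in your shell; get_model() reads it automatically,
# so the api_key argument can be omitted.
export ROBOFLOW_API_KEY=<your-api-key>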
Inference Pipeline
InferencePipeline provides a streaming interface for running inference on video sources: webcams, RTSP streams, video files, and more. See the Inference Pipeline page for details.
from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes
pipeline = InferencePipeline.init(
    model_id="rfdetr-large",
    video_reference="https://storage.googleapis.com/com-roboflow-marketing/inference/people-walking.mp4",
    on_prediction=render_boxes,
    api_key="ROBOFLOW_API_KEY",
)
pipeline.start()
pipeline.join()
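render_boxes is a built-in sink that draws predictions onto each frame, but on_prediction accepts any callable with the same shape. As a sketch, a custom sink receives the prediction dict and the frame it was computed on (my_sink is a hypothetical name):

from inference import InferencePipeline
from inference.core.interfaces.camera.entities import VideoFrame

# A custom sink: called once per processed frame with the model's
# predictions and the frame they were computed on.
def my_sink(predictions: dict, video_frame: VideoFrame) -> None:
    print(f"{len(predictions.get('predictions', []))} detections in this frame")

pipeline = InferencePipeline.init(
    model_id="rfdetr-large",
    video_reference=0,  # 0 selects the default webcam
    on_prediction=my_sink,
    api_key="ROBOFLOW_API_KEY",
)
pipeline.start()
pipeline.join()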