Inference Server

The inference Python package is the core library that powers Roboflow's computer vision deployment stack. It provides model loading, pre/post-processing, GPU/CPU optimization, and Workflows execution, callable directly from Python.

The Inference Server wraps this package and exposes it over HTTP (distributed as a Docker image with all dependencies installed), but you can also use inference directly in your own scripts and applications.

Multi-Backend Support

Inference 1.0 supports multiple inference runtime backends: ONNX, TensorRT, Hugging Face, and PyTorch. It automatically selects the fastest backend available for your hardware. For example, if you have an NVIDIA GPU or are running on a Jetson device, and a TensorRT engine is available for the model on your platform, Inference uses TensorRT by default.
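The selection rule described above can be pictured as a priority walk: try backends from fastest to slowest and take the first one that is both usable on this machine and has a model package published for it. The sketch below is purely illustrative. The function `select_backend`, the backend name strings, and the `BACKEND_PRIORITY` ordering are hypothetical and are not Inference's internal API or its real priority table.

```python
from typing import Optional

# Hypothetical priority order, fastest first. This ordering is an
# illustrative assumption, not Inference's actual selection table.
BACKEND_PRIORITY = ["tensorrt", "onnx", "torch", "huggingface"]


def select_backend(
    available_backends: set[str],
    model_packages: set[str],
) -> Optional[str]:
    """Return the highest-priority backend that is both usable on this
    machine and has a model package for it, or None if none qualifies."""
    for backend in BACKEND_PRIORITY:
        if backend in available_backends and backend in model_packages:
            return backend
    return None


# NVIDIA machine where a TensorRT engine exists for the model:
print(select_backend({"tensorrt", "onnx", "torch"}, {"tensorrt", "onnx"}))
# CPU-only machine: TensorRT is not usable, so ONNX wins.
print(select_backend({"onnx", "torch"}, {"tensorrt", "onnx"}))
```

The point of the design is that the same `model_id` can run anywhere: the fastest option is chosen when it exists, and the selection falls back gracefully when it does not.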

Installation

pip install inference

For GPU support:

pip install inference-gpu

Quick Example

get_model() loads a model by its ID, downloads the weights, and returns a model object you can call .infer() on.

from inference import get_model

model = get_model(model_id="rfdetr-small")
results = model.infer("https://media.roboflow.com/inference/people-walking.jpg")

To use models that require an API key, set the ROBOFLOW_API_KEY environment variable or pass it directly:

model = get_model(model_id="my-project/1", api_key="ROBOFLOW_API_KEY")  # replace "ROBOFLOW_API_KEY" with your actual key
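To avoid hardcoding a secret in source files, one option is to resolve the key yourself before calling `get_model()`. The sketch below assumes an explicitly passed key should take precedence over the `ROBOFLOW_API_KEY` environment variable; that precedence is an assumption of this example, and `resolve_api_key` is a hypothetical helper, not part of the inference package.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer an explicitly passed key, otherwise fall
    back to the ROBOFLOW_API_KEY environment variable. Returns None when
    neither is set (acceptable for models that need no key)."""
    if api_key:
        return api_key
    return os.environ.get("ROBOFLOW_API_KEY")
```

Usage would then look like `model = get_model(model_id="my-project/1", api_key=resolve_api_key())`, so the key lives only in the environment.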

Inference Pipeline

InferencePipeline provides a streaming interface for running inference on video sources such as webcams, RTSP streams, video files, and more. See the Inference Pipeline page for details.

from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="rfdetr-large",
    video_reference="https://storage.googleapis.com/com-roboflow-marketing/inference/people-walking.mp4",
    on_prediction=render_boxes,
    api_key="ROBOFLOW_API_KEY",  # replace with your actual Roboflow API key
)

pipeline.start()
pipeline.join()
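`render_boxes` is one ready-made `on_prediction` sink, but any callable that accepts the per-frame predictions and the frame can be passed instead. The sketch below counts detected classes per frame. It assumes `predictions` is a dict in Roboflow's JSON response layout, with a `"predictions"` list whose items carry `"class"` and `"confidence"` keys. The frame argument is typed as `Any` so the sketch runs without inference installed (in the library it is a `VideoFrame` object). The 0.5 confidence cutoff and the name `count_classes_sink` are hypothetical.

```python
from collections import Counter
from typing import Any


def count_classes_sink(predictions: dict, video_frame: Any) -> Counter:
    """Hypothetical on_prediction sink: count detections per class for one
    frame, ignoring detections below a 0.5 confidence cutoff."""
    counts = Counter(
        p["class"]
        for p in predictions.get("predictions", [])
        if p.get("confidence", 0.0) >= 0.5  # illustrative threshold
    )
    print(dict(counts))
    return counts
```

To try it, pass `on_prediction=count_classes_sink` to `InferencePipeline.init` in place of `render_boxes`.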

Further Reading