Self-Hosted Deployment
You can deploy models and Workflows on your own hardware with Roboflow Inference, our edge inference server. This is ideal if you need your models to run in real time on the edge.
You will need an internet connection to set up a model on your hardware. You will also need a persistent internet connection if you are deploying models on devices managed by Device Manager.
Installation
Choose the installation method that matches your platform:
The preferred way to use Inference is via Docker. This works on Linux, macOS, Jetson, and other Docker-capable devices.
Install Docker (and, if you have a CUDA-enabled GPU, the NVIDIA Container Toolkit for GPU acceleration). Then install inference-cli and start the server:
pip install inference-cli && inference server start
This automatically chooses and configures the optimal container for your machine.
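Before sending inference requests, you can sanity-check that the server is reachable. A minimal sketch using only the Python standard library, assuming the default port 9001; it only checks that something answers HTTP at that address, not that it is an Inference server:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def server_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at base_url."""
    try:
        urlopen(base_url, timeout=timeout)
        return True
    except HTTPError:
        return True   # got an HTTP error status, so something is listening
    except URLError:
        return False  # connection refused or unreachable: server not running


if __name__ == "__main__":
    print("server reachable:", server_is_up("http://localhost:9001"))
```

If this returns False, confirm the container started and that nothing else is bound to port 9001.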
The --dev flag starts a companion Jupyter notebook server with a quickstart guide on localhost:9002:
inference server start --dev
Device-Specific Documentation
Special installation notes and performance tips for each device are available in the Installation guide.
Using Your Server
Once you have the Inference server running, you can access it via its API or using the Python Inference SDK.
Install the Python Inference SDK:
pip install inference-sdk
Run an example model comparison Workflow on an Inference Server running on your local machine:
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # use local inference server
    # api_key="<YOUR API KEY>"  # optional: needed to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "rfdetr-small",
        "model2": "rfdetr-medium"
    }
)

print(result)
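If you prefer not to install the SDK, the same Workflow run can be expressed as a plain HTTP request to the server. A sketch using only the standard library; the `/infer/workflows/{workspace}/{workflow_id}` path and the payload field names here are assumptions inferred from the SDK call above, so verify them against your server's API docs before relying on them:

```python
import json
from urllib.request import Request

# Assumed endpoint layout and payload shape -- verify against your server's API docs.
BASE_URL = "http://localhost:9001"
workspace, workflow = "roboflow-docs", "model-comparison"

payload = {
    # "api_key": "<YOUR API KEY>",  # only needed for private data and models
    "inputs": {
        "image": {
            "type": "url",  # assumed input encoding for a remote image
            "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg",
        },
        "model1": "rfdetr-small",
        "model2": "rfdetr-medium",
    },
}

req = Request(
    f"{BASE_URL}/infer/workflows/{workspace}/{workflow}",  # assumed path
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request once the server is running.
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` requires the Inference server from the Installation section to be running locally.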