Install Inference Server

Choose the installation method that matches your platform:

The preferred way to use Inference is via Docker (see Why Docker). This works on Linux, macOS, Jetson, and other Docker-capable devices.

Install Docker, plus the NVIDIA Container Toolkit if you have a CUDA-enabled GPU and want GPU acceleration. Then install and run the inference-cli:

pip install inference-cli && inference server start

This automatically chooses and configures the optimal container for your machine.
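Once started, the server listens on port 9001 by default. A quick way to confirm it is reachable is a minimal sketch like the one below (the helper names are illustrative, not part of the CLI or SDK):

```python
import urllib.error
import urllib.request

def health_url(host="localhost", port=9001):
    """Root URL of a local Inference Server (9001 is the CLI's default port)."""
    return f"http://{host}:{port}"

def is_server_up(url, timeout=2.0):
    """Return True if anything answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, unreachable, or timed out

print("inference server running:", is_server_up(health_url(), timeout=1.0))
```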

The --dev flag starts a companion Jupyter notebook server with a quickstart guide on localhost:9002:

inference server start --dev

Device-Specific Documentation

Special installation notes and performance tips are available for each supported device; see the device-specific documentation pages for details.

Using Your New Server

Once the Inference Server is running, you can access it through its HTTP API or with the Python Inference SDK.
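If you would rather call the HTTP API directly instead of using the SDK, a request for a workflow run might be assembled as sketched below. Note that the endpoint path and the `inputs` payload shape here are assumptions inferred from the SDK's parameters, not confirmed by this guide; check the server's API reference before relying on them.

```python
import json

def build_workflow_request(base_url, workspace_name, workflow_id,
                           images=None, parameters=None, api_key=None):
    """Assemble a (url, json_body) pair for a workflow run.

    ASSUMPTION: the endpoint path and body layout are illustrative guesses
    based on the SDK's run_workflow() arguments.
    """
    url = f"{base_url}/infer/workflows/{workspace_name}/{workflow_id}"
    body = {"inputs": {**(images or {}), **(parameters or {})}}
    if api_key:
        body["api_key"] = api_key
    return url, json.dumps(body)

url, body = build_workflow_request(
    "http://localhost:9001", "roboflow-docs", "model-comparison",
    images={"image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"},
    parameters={"model1": "rfdetr-small", "model2": "rfdetr-medium"},
)
```

The body could then be POSTed with any HTTP client; the SDK example below does the same thing with less ceremony.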

Install the Python Inference SDK:

pip install inference-sdk

Run an example model comparison Workflow on an Inference Server running on your local machine:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001", # use local inference server
    # api_key="<YOUR API KEY>" # optional to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "rfdetr-small",
        "model2": "rfdetr-medium"
    }
)

print(result)
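The structure of `result` depends on the outputs the workflow is configured to return. As an illustrative sketch only, with hypothetical sample data standing in for real server output, you might iterate over per-image outputs like this:

```python
# HYPOTHETICAL sample shaped like a typical workflow response: a list with
# one entry per input image, each mapping output names to prediction lists.
# Field names below are assumptions for illustration, not guaranteed output.
sample_result = [
    {
        "model1_predictions": [
            {"class": "person", "confidence": 0.92},
            {"class": "person", "confidence": 0.81},
        ],
        "model2_predictions": [
            {"class": "person", "confidence": 0.95},
        ],
    }
]

# Summarize each model's detections for the first (only) image.
for output_name, predictions in sample_result[0].items():
    classes = [p["class"] for p in predictions]
    print(f"{output_name}: {len(predictions)} detection(s) -> {classes}")
```

Inspect `print(result)` for your own workflow to see the actual keys before writing parsing code against them.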