Install Inference Server
Choose the installation method that matches your platform:
The preferred way to use Inference is via Docker (see Why Docker). This works on Linux, macOS, Jetson, and other Docker-capable devices.
Install Docker (and NVIDIA Container Toolkit for GPU acceleration if you have a CUDA-enabled GPU). Then install and run the inference-cli:
pip install inference-cli && inference server start
This automatically chooses and configures the optimal container for your machine.
The --dev flag starts a companion Jupyter notebook server with a quickstart guide on localhost:9002:
inference server start --dev
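Once the container is up, you can confirm the server is reachable before moving on. A minimal sketch of such a check, assuming the server's default API port of 9001 (the `server_is_up` helper is illustrative, not part of the Inference SDK):

```python
# Quick reachability probe for a local Inference Server.
# Assumes the default port 9001; adjust the URL if you changed it.
from urllib.request import urlopen
from urllib.error import URLError


def server_is_up(url: str = "http://localhost:9001", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            # Any non-server-error HTTP response means the port is live.
            return 200 <= resp.status < 500
    except (URLError, OSError):
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```

If the probe returns False, check that Docker is running and that `inference server start` completed without errors.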
Device-Specific Documentation
Installation notes and performance tips for specific devices are available:
Using Your New Server
Once you have the Inference Server running, you can access it via its API or using the Python Inference SDK.
Install the Python Inference SDK:
pip install inference-sdk
Run an example model comparison Workflow on an Inference Server running on your local machine:
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # use local inference server
    # api_key="<YOUR API KEY>"  # optional, to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "rfdetr-small",
        "model2": "rfdetr-medium"
    }
)

print(result)
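The returned `result` is JSON-like data (nested lists and dictionaries) whose keys depend on the outputs configured in the Workflow, so it helps to inspect its shape before picking fields out of it. A small sketch of a defensive inspector, using a hypothetical payload for illustration (the key names below are assumptions, not guaranteed output of this Workflow):

```python
# Sketch: inspect a Workflow result without assuming its exact schema.
# Workflow responses are JSON-like; keys vary with the Workflow's outputs.
def summarize(result):
    """Recursively reduce a result to its structure: key names and leaf types."""
    if isinstance(result, list):
        return [summarize(item) for item in result]
    if isinstance(result, dict):
        return sorted(result.keys())
    return type(result).__name__


# Hypothetical payload shaped like a two-model comparison response:
example = [{"model1_predictions": [], "model2_predictions": []}]
print(summarize(example))  # → [['model1_predictions', 'model2_predictions']]
```

Once you can see which keys the Workflow actually returns, index into `result` directly for the fields you need.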