Python Inference SDK
The inference-sdk Python package provides client and utility implementations for interfacing with Roboflow Inference. You can use it to develop with Roboflow Inference whether you are using the Serverless / Hosted API, Dedicated Deployments, self-hosting, or deploying to an edge device.
The InferenceHTTPClient enables you to interact with an Inference Server over HTTP -- hosted either by Roboflow or on your own hardware.
pip install inference-sdk
Quickstart
You can run inference on images from URLs, file paths, PIL images, and NumPy arrays.
from inference_sdk import InferenceHTTPClient
import os
image_url = "https://media.roboflow.com/inference/soccer.jpg"
client = InferenceHTTPClient(
api_url="https://serverless.roboflow.com",
api_key=os.environ["API_KEY"],
)
results = client.infer(image_url, model_id="soccer-players-5fuqs/1")
print(results)
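The accepted input types listed above can be mixed freely. Below is a short sketch constructing two of them; the actual `client.infer` call is shown only in comments, since it requires a reachable server:

```python
import numpy as np

# A URL string, as used in the quickstart above
image_url = "https://media.roboflow.com/inference/soccer.jpg"

# A NumPy array (H x W x 3, uint8) can be passed directly as well;
# file paths and PIL images work the same way
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Any of these (or a list mixing them) could then be sent, e.g.:
# results = client.infer(frame, model_id="soccer-players-5fuqs/1")
# results = client.infer([image_url, frame], model_id="soccer-players-5fuqs/1")
print(frame.shape)  # (480, 640, 3)
```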
On the first request, the model weights are downloaded to and set up on the inference server. This request may take some time depending on your network connection and the size of the model. Once the model has been downloaded, subsequent requests are much faster.
Self-Hosted Server
You can also self-host the Inference server and then change api_url in the InferenceHTTPClient:
client = InferenceHTTPClient(
api_url="http://localhost:9001",
api_key=os.environ["API_KEY"],
)
AsyncIO Client
import asyncio
from inference_sdk import InferenceHTTPClient
CLIENT = InferenceHTTPClient(
api_url="http://localhost:9001",
api_key="ROBOFLOW_API_KEY"
)
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
async def main():
    return await CLIENT.infer_async(image_url, model_id="soccer-players-5fuqs/1")

result = asyncio.run(main())
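Because infer_async is a coroutine, several requests can be awaited concurrently with asyncio.gather. Here is a minimal sketch of that pattern with a stand-in coroutine in place of the real client call (fake_infer is a hypothetical placeholder, not part of the SDK):

```python
import asyncio

async def fake_infer(image_url: str) -> dict:
    # Stand-in for CLIENT.infer_async(image_url, model_id=...)
    await asyncio.sleep(0.01)  # simulate network latency
    return {"image_url": image_url, "predictions": []}

async def main() -> list:
    urls = [f"https://example.com/img_{i}.jpg" for i in range(3)]
    # gather runs all coroutines concurrently and preserves input order
    return await asyncio.gather(*(fake_infer(u) for u in urls))

results = asyncio.run(main())
print(len(results))  # 3
```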
Parallel / Batch Inference
You may want to predict against multiple images in a single call. There are two parameters of InferenceConfiguration that specify batching and parallelism options:
max_concurrent_requests - max number of concurrent requests that can be started
max_batch_size - max number of elements that can be injected into a single request
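To build intuition for max_batch_size, here is a rough sketch (not the SDK's actual internals) of how a list of inputs might be split into request payloads:

```python
def chunk(items, max_batch_size):
    """Split a list of inputs into batches of at most max_batch_size elements."""
    return [items[i:i + max_batch_size] for i in range(0, len(items), max_batch_size)]

# Five images with max_batch_size=2 would go out as three requests
batches = chunk(["img1", "img2", "img3", "img4", "img5"], max_batch_size=2)
print(batches)  # [['img1', 'img2'], ['img3', 'img4'], ['img5']]
```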
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
CLIENT = InferenceHTTPClient(
api_url="http://localhost:9001",
api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
Methods that support batching / parallelism:
infer(...) and infer_async(...)
ocr_image(...) and ocr_image_async(...) (enforcing max_batch_size=1)
detect_gazes(...) and detect_gazes_async(...)
get_clip_image_embeddings(...) and get_clip_image_embeddings_async(...)
Inference Against Stream
You can infer against video or a directory of images:
from inference_sdk import InferenceHTTPClient
CLIENT = InferenceHTTPClient(
api_url="http://localhost:9001",
api_key="ROBOFLOW_API_KEY"
)
for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - sequential number of the frame
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass
for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the image
    # prediction - prediction from the model
    pass
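For intuition, the directory variant behaves roughly like the sketch below: walk the directory, run the model on each image file, and yield one tuple per file. dummy_predict is a hypothetical stand-in for the model call, and the real method also decodes each image into an array:

```python
import os
import tempfile

def dummy_predict(path):
    # Stand-in for the per-image model call
    return {"source": path, "predictions": []}

def iter_images(directory):
    """Yield (file_path, prediction) for each image file in a directory."""
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            path = os.path.join(directory, name)
            yield path, dummy_predict(path)

# Demo with a temporary directory holding two empty "images" and one non-image
with tempfile.TemporaryDirectory() as d:
    for name in ("a.jpg", "b.png", "notes.txt"):
        open(os.path.join(d, name), "w").close()
    results = list(iter_images(d))

print(len(results))  # 2
```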
What is Returned as Prediction?
The client returns plain Python dictionaries that are the responses from the model-serving API. The client modifies them in only two cases: the visualization key, which holds the server-generated prediction visualization (it can be transcoded to a format of your choice), and client-side re-scaling of predictions back to the original image size.
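Because predictions are plain dictionaries, you can work with them using ordinary Python. Below is a hypothetical object-detection response in the typical shape returned for detection models (exact keys vary by task type), with a simple confidence filter over it:

```python
# Hypothetical response dict; the shape shown is typical for object detection
result = {
    "time": 0.05,
    "image": {"width": 1280, "height": 720},
    "predictions": [
        {"x": 320.5, "y": 180.0, "width": 64.0, "height": 128.0,
         "confidence": 0.91, "class": "player"},
        {"x": 700.0, "y": 400.0, "width": 40.0, "height": 40.0,
         "confidence": 0.42, "class": "ball"},
    ],
}

# Standard list comprehensions work directly on the response
confident = [p for p in result["predictions"] if p["confidence"] >= 0.5]
print([p["class"] for p in confident])  # ['player']
```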