Serverless Hosted API

Models deployed to Roboflow have a REST API available through which you can run inference on images. This deployment method is ideal for environments where you have a persistent internet connection on your deployment device.

By far the easiest way to get started is with Roboflow's managed services. You can jump straight to building without having to set up any infrastructure.

You can use the Serverless Hosted API without provisioning or maintaining any servers.

Inference Server

Our Serverless Hosted API is powered by the Inference Server. This means you can easily switch between the Serverless Hosted API and a self-hosted Inference Server, as shown below:

from inference_sdk import InferenceHTTPClient

CLIENT = InferenceHTTPClient(
    # api_url="http://localhost:9001",  # self-hosted Inference Server
    api_url="https://serverless.roboflow.com",  # Serverless Hosted API
    api_key="API_KEY"  # optional; required to access your private models and data
)

result = CLIENT.infer("image.jpg", model_id="model-id/1")
print(result)
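Once you have a result, you typically want to turn the predictions into usable bounding boxes. The sketch below parses a detection-style response locally; the "predictions" shape mirrors what Roboflow's detection endpoints typically return (center x/y plus width/height, confidence, and class), but treat the exact fields as an assumption and check your model's actual response.

```python
# A sample response in the shape Roboflow detection endpoints typically
# return; the exact fields are an assumption -- verify against your model.
sample_result = {
    "predictions": [
        {"x": 320.0, "y": 240.0, "width": 100.0, "height": 80.0,
         "confidence": 0.92, "class": "helmet"},
    ]
}

def to_corner_boxes(result, min_confidence=0.5):
    """Convert center-based predictions to (x0, y0, x1, y1, label) tuples."""
    boxes = []
    for p in result.get("predictions", []):
        if p["confidence"] < min_confidence:
            continue
        x0 = p["x"] - p["width"] / 2
        y0 = p["y"] - p["height"] / 2
        boxes.append((x0, y0, x0 + p["width"], y0 + p["height"], p["class"]))
    return boxes

print(to_corner_boxes(sample_result))
```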

Serverless vs Dedicated

The Serverless Hosted API supports running Workflows on pre-trained and fine-tuned models, chaining models, basic logic, visualizations, and external integrations.

It supports cloud-hosted VLMs like ChatGPT and Anthropic Claude, but does not support running heavy models like Florence-2 or SAM 2. It also does not support streaming video.

The Serverless Hosted API scales down to zero when you're not using it and scales up automatically under load, with cold starts of only a couple of seconds. You pay per model inference with no minimums, and Roboflow's free tier credits may be used.

For heavier workloads, consider Dedicated Deployments which provide single-tenant virtual machines with optional GPU support.

Limits

Our Serverless Hosted API supports file uploads up to 20MB, so you may hit this limit with high-resolution images. If you do run into an issue, please reach out to your enterprise support contact or post a message to the forum.
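A simple pre-flight check can catch oversized uploads before you make a request. The sketch below assumes the image is sent base64-encoded (which inflates the payload by roughly a third); whether the 20MB limit applies to the raw or encoded bytes is an assumption, so adjust to match your observed behavior.

```python
import base64

# Assumed limit of 20 binary megabytes on the uploaded payload.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024

def fits_upload_limit(data: bytes) -> bool:
    """Check the base64-encoded length of the image bytes against the
    limit, since base64 encoding inflates the size by about 33%."""
    encoded_len = len(base64.b64encode(data))
    return encoded_len <= MAX_UPLOAD_BYTES
```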

Large Images

If a request is too large, we recommend downsizing any attached images before uploading. This usually will not hurt accuracy: once received, images are downsized on our servers anyway to the input size the model architecture accepts.

Some of our SDKs, like the Python SDK, automatically downsize images to the model architecture's input size before they are sent to the API.
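If your SDK does not downsize for you, computing the target dimensions is straightforward. The helper below scales the longest side down to the model's input size while preserving aspect ratio; the 640-pixel value is an assumed example, so use your model's actual input resolution.

```python
def downscale_dims(width: int, height: int, max_side: int = 640):
    """Return dimensions whose longest side is at most max_side,
    preserving aspect ratio. Images already small enough are unchanged."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height
    return round(width * scale), round(height * scale)
```

You would then resize the image to these dimensions with your imaging library of choice (e.g. Pillow) before attaching it to the request.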