Deploying Inference to Cloud

You can deploy Roboflow Inference containers to virtual machines in the cloud. These VMs are preconfigured to run CPU- or GPU-based Inference servers, so you don't have to deal with OS setup, GPU drivers, or Docker installation.

The Inference CLI currently supports deploying Roboflow Inference container images to a virtual machine running on Google Cloud (GCP) or Amazon Web Services (AWS).

The Roboflow Inference CLI assumes the corresponding cloud CLI is already configured for the project you want to deploy the virtual machine into. See the setup instructions for the Google Cloud gcloud CLI or the AWS aws CLI.
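Before deploying, it can help to confirm that the relevant cloud CLI is on your PATH and pointed at the right account. The sketch below is one way to do that. The helper name require_cli is hypothetical, and gcloud config get-value project and aws sts get-caller-identity are standard commands of those cloud CLIs, not part of the Inference CLI.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cli() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Usage (run the line that matches your provider):
# require_cli gcloud && gcloud config get-value project   # shows the active GCP project
# require_cli aws && aws sts get-caller-identity          # shows the active AWS identity
```

If either check fails, set up the cloud CLI first; the deploy command relies on it.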

Roboflow Inference cloud deploy is powered by the popular SkyPilot project.

Warning

Make sure cloud-deploy extras are installed: pip install "inference-cli[cloud-deploy]"

Note

For details on the command, run inference cloud --help. Help is also available for each sub-command, for example inference cloud deploy --help.

inference cloud deploy

Deploy GPU or CPU inference to AWS or GCP:

# Deploy the Roboflow Inference GPU container into a GPU-enabled VM in AWS
inference cloud deploy --provider aws --compute-type gpu
# Deploy the Roboflow Inference CPU container into a CPU-only VM in GCP
inference cloud deploy --provider gcp --compute-type cpu

Note the "cluster name" printed after the deployment completes. This handle is the <deployment_handle> used in the commands below. The deploy command also prints helpful debug and cost information about your VM.

Deploying Inference into a cloud VM will also print out an endpoint of the form http://1.2.3.4:9001; you can now run inferences against this endpoint.
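A freshly deployed VM may take a moment before the server answers. As a rough sketch, the helper below polls the endpoint with curl until it gets any HTTP response. The function name wait_for_endpoint and its defaults (30 attempts, 10 seconds apart) are assumptions for illustration, and the check makes no assumption about which routes the server exposes.

```shell
# Hypothetical helper: poll a URL until the server returns any HTTP response.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 10).
wait_for_endpoint() {
    url=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Without --fail, curl exits 0 for any HTTP response, including 4xx/5xx.
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            echo "reachable after $i attempt(s): $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no response from $url after $attempts attempts" >&2
    return 1
}

# Usage: replace 1.2.3.4 with the address printed by the deploy command.
# wait_for_endpoint http://1.2.3.4:9001
```

If the endpoint never becomes reachable, check that port 9001 is allowed by your cloud project's firewall rules.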

Warning

Port 9001 is automatically opened -- check with your security admin if this is acceptable for your cloud/project.

inference cloud status

To check the status of your deployment:

inference cloud status

Stop and start deployments

# Start a previously stopped VM
inference cloud start <deployment_handle>
# Stop the VM; you only pay for disk storage while the VM is stopped
inference cloud stop <deployment_handle>

inference cloud undeploy

To delete (undeploy) your deployment:

inference cloud undeploy <deployment_handle>

SSH into the cloud deployment

ssh <deployment_handle>

The required SSH key is automatically added to your ~/.ssh/config -- you don't need to configure this manually.

Cloud Deploy Customization

Roboflow Inference cloud deploy creates VMs based on internally tested templates. For advanced use cases, you can supply your own SkyPilot YAML template:

inference cloud deploy --custom /path/to/sky-template.yaml

Download the standard template and modify it:

# This command will print out the standard gcp/cpu sky template
inference cloud deploy --dry-run --provider gcp --compute-type cpu
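One possible customization workflow is to save the printed template to a file, edit it, and pass it back with --custom. The sketch below assumes the --dry-run output on standard output is the template itself. The helper name save_sky_template and the file name my-sky-template.yaml are hypothetical.

```shell
# Hypothetical helper: save the standard template for a provider/compute type.
# Assumes the --dry-run output on stdout is the template itself; inspect the
# file before using it, since any extra log lines would make the YAML invalid.
save_sky_template() {
    provider=$1
    compute_type=$2
    out_file=$3
    inference cloud deploy --dry-run --provider "$provider" --compute-type "$compute_type" > "$out_file"
}

# Usage:
# save_sky_template gcp cpu my-sky-template.yaml
# "${EDITOR:-vi}" my-sky-template.yaml
# inference cloud deploy --custom my-sky-template.yaml
```

Keeping the edited template in version control makes it easier to reproduce the same VM configuration later.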

Roboflow Inference deploy currently supports AWS and GCP. Please open an issue on the Inference GitHub repository if you would like to see other cloud providers supported.