Install on NVIDIA Jetson
Overview
Jetson is NVIDIA's line of compact, power-efficient modules designed to run AI and deep learning workloads at the edge. They combine a GPU, CPU, and neural accelerators on a single board, making them ideal for robotics, drones, smart cameras, and other embedded applications where you need real-time computer vision or inference without a cloud connection. For more details, see NVIDIA's official Jetson overview.
Prerequisites
Disk Space: Allocate at least 10 GB of free space for the Roboflow Jetson image (8.14 GB).
JetPack Version: The device must be running a supported JetPack release (4.5, 4.6, 5.x, or 6.x).
Recommended Hardware: For best performance while running Inference, we recommend an NVIDIA Orin NX 16 GB or above.
Docker & NVIDIA Container Toolkit:
- Requires Docker + NVIDIA runtime so containers can access the GPU.
- Docker install: https://docs.docker.com/engine/install/ubuntu/
- NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
We have specialized containers built with support for hardware acceleration on JetPack (L4T). To automatically detect your JetPack version and start the right container with sensible default settings, run:
pip install inference-cli
inference server start
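Once the server reports that it has started, you can sanity-check it from the same device. This is a minimal sketch assuming the default port 9001 and that curl is installed:

```shell
# The server listens on port 9001 by default; a request to the root
# endpoint confirms the container is up and reachable.
curl http://localhost:9001
```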
Manually Starting the Container
If you want more control over the container settings, you can also start it manually. Jetson devices running NVIDIA JetPack come pre-configured with the NVIDIA Container Runtime and will be hardware accelerated out of the box:
sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    roboflow/roboflow-inference-server-jetson-6.2.0:latest
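If the server does not respond after starting it this way, standard Docker commands will show whether the container came up cleanly (nothing here is Jetson-specific):

```shell
# Confirm the container is running:
sudo docker ps --filter name=inference-server

# Inspect startup logs for errors (e.g. a missing NVIDIA runtime):
sudo docker logs inference-server
```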
TensorRT
You can optionally enable TensorRT, NVIDIA's model optimization runtime. TensorRT greatly increases your models' speed at the expense of a heavy compilation and optimization step (sometimes 15+ minutes) the first time you load each model.
Enable TensorRT by adding TensorrtExecutionProvider to the ONNXRUNTIME_EXECUTION_PROVIDERS environment variable.
sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-6.2.0:latest
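The variable holds a bracketed, comma-separated list in priority order: ONNX Runtime tries each provider in turn, so keeping CUDAExecutionProvider and CPUExecutionProvider after TensorrtExecutionProvider leaves a working fallback if TensorRT is unavailable. A minimal sketch of building the value in a launch script (the PROVIDERS variable name is just for illustration):

```shell
# Execution providers in order of preference: TensorRT first,
# then CUDA, then a plain CPU fallback.
PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"
echo "$PROVIDERS"
```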
Docker Compose
If you are using Docker Compose for your application, the equivalent YAML is:
version: "3.9"
services:
  inference-server:
    container_name: inference-server
    image: roboflow/roboflow-inference-server-jetson-6.2.0:latest
    read_only: true
    ports:
      - "9001:9001"
    volumes:
      - "${HOME}/.inference/cache:/tmp:rw"
    runtime: nvidia
    # Optionally: uncomment the following lines to enable TensorRT:
    # environment:
    #   ONNXRUNTIME_EXECUTION_PROVIDERS: "[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"
    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
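With that file saved as docker-compose.yml, the usual Compose workflow applies (shown with sudo to match the docker run examples above):

```shell
# Start the service in the background:
sudo docker compose up -d

# Stop and remove it when finished:
sudo docker compose down
```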
Using Your New Server
See Using Your New Server for next steps.
Enterprise Considerations
A Helm Chart is available for enterprise cloud deployments. Enterprise networking solutions to support deployment in OT networks are also available upon request.
Roboflow also offers customized support and installation packages and a pre-configured Jetson-based edge device suitable for rapid prototyping. Contact our sales team if you're part of a large organization and interested in learning more.