Environment Variables

Inference behavior can be controlled by a set of environment variables. All environment variables are listed in `inference/core/env.py`.
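
These variables can be exported in the shell or set programmatically. A minimal sketch of the latter, assuming (as is typical for module-level settings like those in `inference/core/env.py`) that values are read when `inference` is first imported, so they must be set beforehand; the specific values below are illustrative:

```python
import os

# Set overrides BEFORE importing `inference`, since settings modules
# usually read os.environ at import time (assumption about load order).
os.environ["DISABLE_SAM2_LOGITS_CACHE"] = "True"   # skip logits caching
os.environ["ROBOFLOW_API_REQUEST_TIMEOUT"] = "30"  # seconds, as integer string

# import inference  # the overrides above now take effect
```

Note that environment variable values are always strings; boolean and numeric settings are parsed by the library after being read.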

Below is a list of environment variables that require more in-depth explanation.

| Environment Variable | Description | Default |
|---|---|---|
| `ONNXRUNTIME_EXECUTION_PROVIDERS` | List of execution providers in priority order. A warning message is displayed if a provider is not supported on the user's platform. | See `env.py` |
| `SAM2_MAX_EMBEDDING_CACHE_SIZE` | Number of SAM2 embeddings held in GPU memory. Each embedding takes 16777216 bytes (16 MiB). | `100` |
| `SAM2_MAX_LOGITS_CACHE_SIZE` | Number of SAM2 logits held in CPU memory. Each logits entry takes 262144 bytes (256 KiB). | `1000` |
| `DISABLE_SAM2_LOGITS_CACHE` | If `True`, disables SAM2 logits caching. Useful for debugging or minimizing memory usage, but may reduce performance for repeated similar requests. | `False` |
| `ENABLE_WORKFLOWS_PROFILING` | If `True`, allows the server to output Workflows profiler traces to the client. When running with `InferencePipeline`, it enables profiling. | `False` |
| `WORKFLOWS_PROFILER_BUFFER_SIZE` | Size of the profiler buffer (number of consecutive Workflows Execution Engine `run(...)` invocations to trace). | `64` |
| `RUNS_ON_JETSON` | Boolean flag indicating whether Inference runs on a Jetson device. Set to `True` in all Docker builds for the Jetson architecture. | `False` |
| `WORKFLOWS_DEFINITION_CACHE_EXPIRY` | Number of seconds to cache Workflows definitions returned by the `get_workflow_specification(...)` function. | `900` (15 minutes) |
| `DOCKER_SOCKET_PATH` | Path to the local socket mounted into the container. If provided, enables polling Docker container stats from the Docker daemon socket. | Not set |
| `ENABLE_PROMETHEUS` | Boolean flag to enable the Prometheus `/metrics` endpoint. | `True` for Docker images |
| `ENABLE_STREAM_API` | Flag to enable the Stream Management API in the inference server. | `False` |
| `STREAM_API_PRELOADED_PROCESSES` | Number of idle processes warmed up and ready to serve as workers for `InferencePipeline`. Helps speed up worker process start on GPU. | `0` |
| `TRANSIENT_ROBOFLOW_API_ERRORS` | Comma-separated list of HTTP status codes from the Roboflow API that should be retried (GET endpoints only). | `None` |
| `RETRY_CONNECTION_ERRORS_TO_ROBOFLOW_API` | Flag deciding whether connection errors to the Roboflow API should be retried (GET endpoints only). | `False` |
| `ROBOFLOW_API_REQUEST_TIMEOUT` | Timeout (in seconds, as an integer) for requests to the Roboflow API. | `None` |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRIES` | Number of times transient Roboflow API errors will be retried (GET endpoints only). | `3` |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRY_INTERVAL` | Delay between retries of Roboflow API requests (GET endpoints only). | `3` |
| `METRICS_ENABLED` | Flag to control Roboflow Model Monitoring. | `True` |
| `MODEL_VALIDATION_DISABLED` | Flag that can make model loading faster by skipping trial inference. | `False` |
| `DISABLE_VERSION_CHECK` | Flag to disable the Inference version check in a background thread. | `False` |
| `USE_INFERENCE_MODELS` | Flag to select the inference-models backend. | `False` |
| `MAX_INFERENCE_MODELS_CACHE_SIZE_MB` | When set to a value greater than `0`, enables the inference-models cache watchdog, which runs scheduled verification cycles of the disk space occupied by models and prunes the oldest/biggest model artifacts to prevent running out of space. Only applicable when `USE_INFERENCE_MODELS=True`. | `-1` |
| `INFERENCE_MODELS_CACHE_WATCHDOG_INTERVAL_MINUTES` | Frequency of inference-models cache watchdog cycles. Minimum is 15 minutes. | `60` |
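
The SAM2 cache settings translate directly into memory budgets: with the documented per-entry sizes (16777216 bytes per embedding in GPU memory, 262144 bytes per logits entry in CPU memory), the footprint of full caches is a simple product. The helper below is a hypothetical sketch, not part of Inference, for sizing these variables against available memory:

```python
# Per-entry sizes documented for the SAM2 caches.
EMBEDDING_BYTES = 16_777_216  # 16 MiB per embedding (GPU memory)
LOGITS_BYTES = 262_144        # 256 KiB per logits entry (CPU memory)


def sam2_cache_footprint_mib(
    embedding_cache_size: int, logits_cache_size: int
) -> tuple[float, float]:
    """Return (GPU MiB, CPU MiB) consumed when both caches are full."""
    gpu_mib = embedding_cache_size * EMBEDDING_BYTES / 2**20
    cpu_mib = logits_cache_size * LOGITS_BYTES / 2**20
    return gpu_mib, cpu_mib


# Defaults: SAM2_MAX_EMBEDDING_CACHE_SIZE=100, SAM2_MAX_LOGITS_CACHE_SIZE=1000
print(sam2_cache_footprint_mib(100, 1000))  # → (1600.0, 250.0)
```

At the defaults, a full embedding cache occupies 1600 MiB of GPU memory, which is worth accounting for when the GPU also hosts the model weights.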