Environment Variables

Inference behavior can be controlled by a set of environment variables. All environment variables are listed in `inference/core/env.py`.
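
These variables can be exported in the shell or set programmatically. A minimal sketch of the latter, assuming (as is typical for module-level settings like those in `inference/core/env.py`) that values are read when `inference` is first imported, so they must be set beforehand; the specific values below are illustrative:

```python
import os

# Set overrides BEFORE importing `inference`, since settings modules
# usually read os.environ at import time (assumption about load order).
os.environ["DISABLE_SAM2_LOGITS_CACHE"] = "True"   # skip logits caching
os.environ["ROBOFLOW_API_REQUEST_TIMEOUT"] = "30"  # seconds, as integer string

# import inference  # the overrides above now take effect
```

Note that environment variable values are always strings; boolean and numeric settings are parsed by the library after being read.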

Below is a list of environment variables that require more in-depth explanation.

| Environment Variable | Description | Default |
|---|---|---|
| `ONNXRUNTIME_EXECUTION_PROVIDERS` | List of execution providers in priority order. A warning message is displayed if a provider is not supported on the user's platform. | See `env.py` |
| `SAM2_MAX_EMBEDDING_CACHE_SIZE` | Number of SAM2 embeddings held in GPU memory. Each embedding takes 16777216 bytes (16 MiB). | `100` |
| `SAM2_MAX_LOGITS_CACHE_SIZE` | Number of SAM2 logits held in CPU memory. Each logits entry takes 262144 bytes (256 KiB). | `1000` |
| `DISABLE_SAM2_LOGITS_CACHE` | If `True`, disables SAM2 logits caching. Useful for debugging or minimizing memory usage, but may reduce performance for repeated similar requests. | `False` |
| `ENABLE_WORKFLOWS_PROFILING` | If `True`, allows the server to output Workflows profiler traces to the client. When running with `InferencePipeline`, it enables profiling. | `False` |
| `WORKFLOWS_PROFILER_BUFFER_SIZE` | Size of the profiler buffer (number of consecutive Workflows Execution Engine `run(...)` invocations to trace). | `64` |
| `RUNS_ON_JETSON` | Boolean flag indicating whether Inference runs on a Jetson device. Set to `True` in all Docker builds for the Jetson architecture. | `False` |
| `WORKFLOWS_DEFINITION_CACHE_EXPIRY` | Number of seconds to cache Workflows definitions returned by the `get_workflow_specification(...)` function. | `900` (15 minutes) |
| `DOCKER_SOCKET_PATH` | Path to the local socket mounted into the container. If provided, enables polling Docker container stats from the Docker daemon socket. | Not set |
| `ENABLE_PROMETHEUS` | Boolean flag to enable the Prometheus `/metrics` endpoint. | `True` for Docker images |
| `ENABLE_STREAM_API` | Flag to enable the Stream Management API in the inference server. | `False` |
| `STREAM_API_PRELOADED_PROCESSES` | Number of idle processes warmed up and ready to serve as workers for `InferencePipeline`. Helps speed up worker process start on GPU. | `0` |
| `TRANSIENT_ROBOFLOW_API_ERRORS` | Comma-separated list of HTTP status codes from the Roboflow API that should be retried (GET endpoints only). | `None` |
| `RETRY_CONNECTION_ERRORS_TO_ROBOFLOW_API` | Flag deciding whether connection errors to the Roboflow API should be retried (GET endpoints only). | `False` |
| `ROBOFLOW_API_REQUEST_TIMEOUT` | Timeout (in seconds, as an integer) for requests to the Roboflow API. | `None` |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRIES` | Number of times transient Roboflow API errors will be retried (GET endpoints only). | `3` |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRY_INTERVAL` | Delay between retries of Roboflow API requests (GET endpoints only). | `3` |
| `METRICS_ENABLED` | Flag to control Roboflow Model Monitoring. | `True` |
| `MODEL_VALIDATION_DISABLED` | Flag that can make model loading faster by skipping trial inference. | `False` |
| `DISABLE_VERSION_CHECK` | Flag to disable the Inference version check in a background thread. | `False` |
| `USE_INFERENCE_MODELS` | Flag to select the inference-models backend. | `False` |
| `MAX_INFERENCE_MODELS_CACHE_SIZE_MB` | When set to a value greater than `0`, enables the inference-models cache watchdog, which runs scheduled verification cycles of the disk space occupied by models and prunes the oldest/biggest model artifacts to prevent running out of space. Only applicable when `USE_INFERENCE_MODELS=True`. | `-1` |
| `INFERENCE_MODELS_CACHE_WATCHDOG_INTERVAL_MINUTES` | Frequency of inference-models cache watchdog cycles. Minimum is 15 minutes. | `60` |
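
The SAM2 cache settings translate directly into memory budgets: with the documented per-entry sizes (16777216 bytes per embedding in GPU memory, 262144 bytes per logits entry in CPU memory), the footprint of full caches is a simple product. The helper below is a hypothetical sketch, not part of Inference, for sizing these variables against available memory:

```python
# Per-entry sizes documented for the SAM2 caches.
EMBEDDING_BYTES = 16_777_216  # 16 MiB per embedding (GPU memory)
LOGITS_BYTES = 262_144        # 256 KiB per logits entry (CPU memory)


def sam2_cache_footprint_mib(
    embedding_cache_size: int, logits_cache_size: int
) -> tuple[float, float]:
    """Return (GPU MiB, CPU MiB) consumed when both caches are full."""
    gpu_mib = embedding_cache_size * EMBEDDING_BYTES / 2**20
    cpu_mib = logits_cache_size * LOGITS_BYTES / 2**20
    return gpu_mib, cpu_mib


# Defaults: SAM2_MAX_EMBEDDING_CACHE_SIZE=100, SAM2_MAX_LOGITS_CACHE_SIZE=1000
print(sam2_cache_footprint_mib(100, 1000))  # → (1600.0, 250.0)
```

At the defaults, a full embedding cache occupies 1600 MiB of GPU memory, which is worth accounting for when the GPU also hosts the model weights.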