# Environment Variables
Inference behavior can be controlled by a set of environment variables. All environment variables are listed in `inference/core/env.py`.
Below is a list of the environment variables that require more in-depth explanation.
| Environmental Variable | Description | Default |
|---|---|---|
| `ONNXRUNTIME_EXECUTION_PROVIDERS` | List of execution providers in priority order. A warning is displayed if a provider is not supported on the user's platform. | See `env.py` |
| `SAM2_MAX_EMBEDDING_CACHE_SIZE` | The number of SAM2 embeddings held in memory (GPU memory). Each embedding occupies 16777216 bytes (16 MiB). | 100 |
| `SAM2_MAX_LOGITS_CACHE_SIZE` | The number of SAM2 logits held in memory (CPU memory). Each logit occupies 262144 bytes (256 KiB). | 1000 |
| `DISABLE_SAM2_LOGITS_CACHE` | If True, disables SAM2 logits caching. Useful for debugging or minimizing memory usage, but may reduce performance for repeated similar requests. | False |
| `ENABLE_WORKFLOWS_PROFILING` | If True, allows the server to output Workflows profiler traces to the client. When running with InferencePipeline, it enables profiling. | False |
| `WORKFLOWS_PROFILER_BUFFER_SIZE` | Size of the profiler buffer (number of consecutive Workflows Execution Engine `run(...)` invocations to trace). | 64 |
| `RUNS_ON_JETSON` | Boolean flag indicating whether Inference runs on a Jetson device. Set to True in all Docker builds for the Jetson architecture. | False |
| `WORKFLOWS_DEFINITION_CACHE_EXPIRY` | Number of seconds to cache Workflows definitions retrieved by the `get_workflow_specification(...)` function. | 900 (15 minutes) |
| `DOCKER_SOCKET_PATH` | Path to the local socket mounted into the container. If provided, enables polling Docker container stats from the Docker daemon socket. | Not set |
| `ENABLE_PROMETHEUS` | Boolean flag to enable the Prometheus `/metrics` endpoint. | True for Docker images |
| `ENABLE_STREAM_API` | Flag to enable the Stream Management API in the inference server. | False |
| `STREAM_API_PRELOADED_PROCESSES` | Number of idle processes kept warm, ready to become InferencePipeline workers. Helps speed up worker process start-up on GPU machines. | 0 |
| `TRANSIENT_ROBOFLOW_API_ERRORS` | Comma-separated list of HTTP status codes from the Roboflow API that should be retried (GET endpoints only). | None |
| `RETRY_CONNECTION_ERRORS_TO_ROBOFLOW_API` | Flag deciding whether connection errors to the Roboflow API should be retried (GET endpoints only). | False |
| `ROBOFLOW_API_REQUEST_TIMEOUT` | Timeout (in seconds, as an integer) for requests to the Roboflow API. | None |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRIES` | Number of times transient Roboflow API errors will be retried (GET endpoints only). | 3 |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRY_INTERVAL` | Delay interval between retries of Roboflow API requests (GET endpoints only). | 3 |
| `METRICS_ENABLED` | Flag to control Roboflow Model Monitoring. | True |
| `MODEL_VALIDATION_DISABLED` | Flag that can make model loading faster by skipping trial inference. | False |
| `DISABLE_VERSION_CHECK` | Flag to disable the Inference version check running in a background thread. | False |
| `USE_INFERENCE_MODELS` | Flag to select the inference-models backend. | False |
| `MAX_INFERENCE_MODELS_CACHE_SIZE_MB` | When set to a value > 0, enables the inference-models cache watchdog, which runs scheduled verification cycles over the disk space occupied by models and prunes the oldest/largest model artifacts to prevent running out of space. Only applicable when `USE_INFERENCE_MODELS=True`. | -1 |
| `INFERENCE_MODELS_CACHE_WATCHDOG_INTERVAL_MINUTES` | Frequency (in minutes) of inference-models cache watchdog cycles. Minimum is 15 minutes. | 60 |
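As an illustration, environment variables from the table above can be passed to a containerized server with `docker run -e`. This is a sketch, not a definitive invocation: the image tag and the list syntax for `ONNXRUNTIME_EXECUTION_PROVIDERS` are assumptions, so check the deployment docs for your platform before copying it.

```shell
# Hypothetical launch overriding a few defaults from the table above.
# Image tag and provider-list syntax are illustrative assumptions.
docker run --rm -p 9001:9001 \
  -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider,CPUExecutionProvider]" \
  -e SAM2_MAX_EMBEDDING_CACHE_SIZE=25 \
  -e DISABLE_SAM2_LOGITS_CACHE=True \
  -e ENABLE_STREAM_API=True \
  roboflow/roboflow-inference-server-gpu:latest
```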
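Because the SAM2 caches have fixed per-item sizes, the table's numbers translate directly into a worst-case memory budget. The sketch below is a back-of-envelope calculation using only the byte sizes and defaults stated above; the function name is made up for illustration.

```python
EMBEDDING_BYTES = 16_777_216  # 16 MiB per SAM2 embedding (GPU memory)
LOGIT_BYTES = 262_144         # 256 KiB per SAM2 logit (CPU memory)


def sam2_cache_budget_mib(max_embeddings: int = 100, max_logits: int = 1000):
    """Return (GPU MiB, CPU MiB) consumed when both caches are full."""
    gpu_mib = max_embeddings * EMBEDDING_BYTES / 2**20
    cpu_mib = max_logits * LOGIT_BYTES / 2**20
    return gpu_mib, cpu_mib


# With the defaults, a full embedding cache uses 1600 MiB of GPU memory
# and a full logits cache uses 250 MiB of CPU memory.
print(sam2_cache_budget_mib())  # (1600.0, 250.0)
```

Lowering `SAM2_MAX_EMBEDDING_CACHE_SIZE` is the main lever on GPU memory; the logits cache is comparatively cheap and can instead be switched off entirely with `DISABLE_SAM2_LOGITS_CACHE`.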