Workflows Benchmarks

This page compares direct model inference latency versus Workflow-wrapped inference latency for popular detection models. The goal is to quantify the overhead introduced by the Workflows execution engine.

Self-Hosted Results

All times are in milliseconds. Each model was warmed up before timing, then measured over 10 iterations.

| Model | Avg Direct (ms) | Stddev Direct | Avg Workflow (ms) | Stddev Workflow |
|---|---|---|---|---|
| rfdetr-nano | 23.86 | 0.35 | 29.99 | 2.04 |
| rfdetr-small | 29.50 | 0.20 | 31.36 | 0.36 |
| rfdetr-medium | 33.97 | 0.54 | 34.78 | 0.23 |
| rfdetr-large | 40.96 | 0.38 | 46.00 | 1.75 |
| rfdetr-xlarge | 73.37 | 0.26 | 74.41 | 0.13 |
| yolo26n-640 | 7.68 | 0.04 | 9.61 | 0.43 |
| yolo26s-640 | 10.10 | 0.08 | 12.48 | 1.04 |
| yolo26m-640 | 16.43 | 0.05 | 18.10 | 0.58 |
| yolo26l-640 | 18.50 | 0.09 | 20.01 | 0.38 |
| yolo26x-640 | 30.46 | 0.14 | 33.51 | 0.17 |

Key Takeaways

  • Workflow overhead is minimal: typically 1-5 ms on top of direct inference, depending on the model.
  • For larger models (e.g. rfdetr-xlarge), the workflow overhead becomes negligible relative to model inference time (~1.4%).
  • For smaller, faster models (e.g. yolo26n-640), the overhead is proportionally larger (~25%) but still under 2 ms in absolute terms.
  • Workflow overhead is CPU-bound (graph scheduling, input preparation, output routing), while model inference itself typically runs on the GPU (when available). As a result, the overhead stays relatively constant regardless of GPU speed.
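The overhead percentages quoted above follow directly from the table. A minimal sketch of that arithmetic (the `results` values are copied from the table; the dictionary itself is just an illustrative structure, not part of any API):

```python
# (avg_direct_ms, avg_workflow_ms) taken from the self-hosted results table
results = {
    "rfdetr-xlarge": (73.37, 74.41),
    "yolo26n-640": (7.68, 9.61),
}

for name, (direct, workflow) in results.items():
    overhead_ms = workflow - direct
    overhead_pct = 100.0 * overhead_ms / direct
    print(f"{name}: +{overhead_ms:.2f} ms ({overhead_pct:.1f}%)")
    # rfdetr-xlarge: +1.04 ms (1.4%)
    # yolo26n-640:  +1.93 ms (25.1%)
```

The absolute overhead is similar for both models (1-2 ms); only its share of total latency changes, which is why faster models show a larger relative overhead.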

Methodology

  • GPU-accelerated inference (results will vary by hardware).
  • Warmup: Each model and workflow engine is warmed up with one inference call before timing begins.
  • Iterations: 10 timed iterations per method, per model.
  • Direct inference: Uses get_model() and calls model.infer() directly.
  • Workflow inference: Wraps the same model in a minimal single-step workflow and runs it through the Execution Engine.
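The timing loop described above can be sketched as a small helper. This is a hypothetical harness, not the actual benchmark script; `run_once` stands in for either the direct call (`model.infer(image)`) or the workflow execution, and the warmup/iteration counts mirror the methodology:

```python
import statistics
import time

def benchmark(run_once, warmup=1, iterations=10):
    """Time run_once() over several iterations after a warmup pass.

    Returns (mean_ms, stdev_ms), matching the columns reported above.
    """
    for _ in range(warmup):
        run_once()  # discard untimed warmup call(s)

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(samples_ms), statistics.stdev(samples_ms)
```

For the direct path this would be called as `benchmark(lambda: model.infer(image))`, and for the workflow path with a callable that runs the single-step workflow through the Execution Engine.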

Cloud-Hosted Results

The following results were benchmarked on the hosted Serverless API. Some notes:

  • Compared to self-hosted results, a Serverless Workflow request must also fetch the Workflow schema in addition to the model weights, whereas direct Serverless model inference fetches only the weights.
  • Workflows add a roughly constant 10-50 ms of latency overhead.