# Workflows Benchmarks
This page compares direct model inference latency versus Workflow-wrapped inference latency for popular detection models. The goal is to quantify the overhead introduced by the Workflows execution engine.
## Self-Hosted Results
All times are in milliseconds. Each model was warmed up before timing, then measured over 10 iterations.
| Model | Avg Direct (ms) | Stddev Direct | Avg Workflow (ms) | Stddev Workflow |
|---|---|---|---|---|
| rfdetr-nano | 23.86 | 0.35 | 29.99 | 2.04 |
| rfdetr-small | 29.50 | 0.20 | 31.36 | 0.36 |
| rfdetr-medium | 33.97 | 0.54 | 34.78 | 0.23 |
| rfdetr-large | 40.96 | 0.38 | 46.00 | 1.75 |
| rfdetr-xlarge | 73.37 | 0.26 | 74.41 | 0.13 |
| yolo26n-640 | 7.68 | 0.04 | 9.61 | 0.43 |
| yolo26s-640 | 10.10 | 0.08 | 12.48 | 1.04 |
| yolo26m-640 | 16.43 | 0.05 | 18.10 | 0.58 |
| yolo26l-640 | 18.50 | 0.09 | 20.01 | 0.38 |
| yolo26x-640 | 30.46 | 0.14 | 33.51 | 0.17 |
## Key Takeaways
- Workflow overhead is minimal -- typically 1-5 ms on top of direct inference, depending on the model.
- For larger models (e.g. `rfdetr-xlarge`), the workflow overhead becomes negligible relative to model inference time (~1.4%).
- For smaller, faster models (e.g. `yolo26n-640`), the overhead is proportionally larger (~25%) but still under 2 ms in absolute terms.
- Workflow overhead is CPU-bound (graph scheduling, input preparation, output routing), while model inference itself typically runs on the GPU (when available). As a result, the overhead stays relatively constant regardless of GPU speed.
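The relative-overhead figures quoted above can be reproduced directly from the table. A minimal sketch (the two models chosen here are just the extremes from the table):

```python
# Benchmark means from the table above, in milliseconds.
timings = {
    "rfdetr-xlarge": {"direct": 73.37, "workflow": 74.41},
    "yolo26n-640": {"direct": 7.68, "workflow": 9.61},
}

# Map each model to (absolute overhead in ms, relative overhead in %).
overheads = {
    model: (
        t["workflow"] - t["direct"],
        100.0 * (t["workflow"] - t["direct"]) / t["direct"],
    )
    for model, t in timings.items()
}

for model, (ms, pct) in overheads.items():
    print(f"{model}: +{ms:.2f} ms ({pct:.1f}%)")
# rfdetr-xlarge: +1.04 ms (1.4%)
# yolo26n-640: +1.93 ms (25.1%)
```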
## Methodology
- Hardware: GPU-accelerated inference (results will vary by hardware).
- Warmup: Each model and workflow engine is warmed up with one inference call before timing begins.
- Iterations: 10 timed iterations per method, per model.
- Direct inference: Uses `get_model()` and calls `model.infer()` directly.
- Workflow inference: Wraps the same model in a minimal single-step workflow and runs it through the Execution Engine.
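The timing procedure described above can be sketched as a small harness. This is an illustrative sketch, not the benchmark code used here; the commented-out callables at the bottom (`model`, `image`, `engine`, `workflow`) are assumed names standing in for the actual direct-inference and Execution Engine calls:

```python
import statistics
import time


def benchmark(fn, warmup=1, iterations=10):
    """Time `fn` over `iterations` runs after `warmup` untimed calls.

    Returns (mean_ms, stddev_ms), matching the columns in the table above.
    """
    for _ in range(warmup):
        fn()  # warmup: load weights, compile kernels, fill caches
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)  # seconds -> ms
    return statistics.mean(samples), statistics.stdev(samples)


# Usage (hypothetical callables standing in for the real inference calls):
# direct_mean, direct_std = benchmark(lambda: model.infer(image))
# wf_mean, wf_std = benchmark(lambda: engine.run(workflow, {"image": image}))
```

Using `time.perf_counter()` rather than `time.time()` matters here: it is a monotonic, high-resolution clock, which keeps millisecond-scale measurements like these meaningful.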
## Cloud-Hosted Results
The following results were benchmarked on the hosted Serverless API. Some notes:
- Compared to the self-hosted setup, the Serverless server must also fetch the Workflow schema in addition to the model weights (direct Serverless model inference fetches only the weights).
- This adds a fixed 10-50 ms latency overhead for Workflows.