Workflow Profiling

This page covers profiling techniques and performance analysis for Workflow execution.

For benchmark data comparing direct inference vs. Workflow-wrapped inference, see the benchmarks page.
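Before diving into the considerations below, it helps to measure steady-state latency directly. The sketch below is a generic timing harness, not part of any Workflow API: `run_workflow` is a hypothetical stand-in you would replace with your actual Workflow invocation, and the warm-up loop absorbs one-time costs such as model loading.

```python
import time
from typing import Any


def run_workflow(image: Any) -> dict:
    """Hypothetical stand-in for a Workflow execution call.

    Replace this with the real invocation you want to profile.
    """
    time.sleep(0.01)  # simulate CPU scheduling + GPU inference
    return {"predictions": []}


def profile_workflow(image: Any, warmup: int = 2, runs: int = 10) -> dict:
    """Time repeated runs, discarding warm-up iterations.

    Warm-up runs absorb one-time costs (model loading, CUDA context
    creation) so the reported numbers reflect steady-state latency.
    """
    for _ in range(warmup):
        run_workflow(image)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_workflow(image)
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": sum(timings) / len(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


stats = profile_workflow(image=None)
print(f"mean={stats['mean_s'] * 1000:.1f} ms")
```

Running the same harness against a bare model call and a Workflow-wrapped call gives a rough estimate of the CPU-side overhead discussed below.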

When optimizing Workflow performance, consider:

  • GPU vs. CPU bound operations: Model inference typically runs on the GPU, while Workflow overhead (graph scheduling, input preparation, output routing) is CPU-bound. Because this overhead stays roughly constant regardless of GPU speed, it becomes a larger fraction of total latency as inference itself gets faster.
  • Parallel execution: The Execution Engine runs independent steps in parallel by default. Ensure your workflow graph takes advantage of this by structuring independent branches where possible.
  • Batch processing: Blocks that declare batch inputs via get_parameters_accepting_batches() receive whole batches in a single call rather than one element at a time, which can significantly improve throughput for GPU-accelerated operations.
  • Memory considerations: When processing video or large batches, be mindful of memory usage. The Execution Engine maintains indices and caches for all batch-oriented data points.
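To see why the batch-processing point matters, the generic sketch below (not the Workflow block API) compares per-element calls against a single vectorized call over the same data; the `infer_one` and `infer_batch` names are illustrative placeholders for a real inference call.

```python
import time

import numpy as np


def infer_one(image: np.ndarray) -> float:
    """Per-element call: fixed per-call overhead is paid for every item."""
    return float(image.mean())


def infer_batch(images: np.ndarray) -> np.ndarray:
    """Batched call: one invocation amortizes overhead across all items."""
    return images.mean(axis=(1, 2, 3))


images = np.random.rand(64, 3, 32, 32).astype(np.float32)

start = time.perf_counter()
one_by_one = [infer_one(img) for img in images]
t_single = time.perf_counter() - start

start = time.perf_counter()
batched = infer_batch(images)
t_batch = time.perf_counter() - start

print(f"per-item: {t_single * 1000:.2f} ms, batched: {t_batch * 1000:.2f} ms")
```

On GPU-backed inference the gap is usually much larger than in this CPU toy example, because kernel launch and data-transfer costs are amortized across the whole batch.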
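For the memory consideration above, one common pattern when processing video is to consume frames in fixed-size chunks so that peak memory is bounded by the chunk size rather than the video length. This is a minimal generic sketch of that pattern, not Execution Engine internals; the frame values are stand-ins for real image data.

```python
from typing import Iterable, Iterator, List


def chunked(frames: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield fixed-size chunks so only `size` frames are held at once."""
    chunk: List[int] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def process_video(frames: Iterable[int], batch_size: int = 8) -> int:
    """Process frames chunk by chunk, keeping only an aggregate result."""
    processed = 0
    for batch in chunked(frames, batch_size):
        # Run inference on `batch` here; release per-frame buffers
        # before pulling the next chunk.
        processed += len(batch)
    return processed


total = process_video(range(100), batch_size=8)
print(total)
```

The same idea applies to large image batches: streaming chunks through the workflow keeps memory flat, at the cost of slightly lower throughput than one giant batch.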