Workflow Profiling
This page covers profiling techniques and performance analysis for Workflow execution.
For benchmark data comparing direct inference vs. Workflow-wrapped inference, see the benchmarks page.
When optimizing Workflow performance, consider:
- GPU-bound vs. CPU-bound operations: Model inference typically runs on the GPU, while Workflow overhead (graph scheduling, input preparation, output routing) is CPU-bound. This overhead stays relatively constant regardless of GPU speed.
- Parallel execution: The Execution Engine runs independent steps in parallel by default. Ensure your workflow graph takes advantage of this by structuring independent branches where possible.
- Batch processing: Blocks that process batches at once (using `get_parameters_accepting_batches()`) can significantly improve throughput for GPU-accelerated operations.
- Memory considerations: When processing video or large batches, be mindful of memory usage. The Execution Engine maintains indices and caches for all batch-oriented data points.
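To locate CPU-bound Workflow overhead, Python's built-in `cProfile` can attribute time to scheduling and bookkeeping functions separately from inference calls. The sketch below is illustrative only: `workflow_overhead` and `run_model` are hypothetical stand-ins, not part of the Workflow API.

```python
import cProfile
import io
import pstats

def workflow_overhead():
    # Hypothetical stand-in for CPU-bound bookkeeping
    # (graph scheduling, input preparation, output routing).
    return sum(i * i for i in range(10_000))

def run_model(x):
    # Hypothetical stand-in for GPU inference.
    return x * 2

def run_workflow(x):
    # A wrapped call pays the CPU overhead on every invocation.
    workflow_overhead()
    return run_model(x)

profiler = cProfile.Profile()
profiler.enable()
for i in range(100):
    run_workflow(i)
profiler.disable()

# Print the top entries by cumulative time; in a real profile,
# overhead functions showing up high here signal CPU-bound cost.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Profiling a real Workflow the same way (wrapping the execution call in a `cProfile.Profile()` context) shows whether time goes to inference or to CPU-side orchestration.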
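The benefit of structuring independent branches can be seen with plain Python concurrency: two steps with no data dependency between them can overlap in wall-clock time. This is a generic sketch using `concurrent.futures`, not the Execution Engine's actual scheduler; `branch_a` and `branch_b` are hypothetical steps whose work is simulated with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def branch_a(image):
    time.sleep(0.2)  # simulated work for one independent step
    return "a"

def branch_b(image):
    time.sleep(0.2)  # simulated work for another independent step
    return "b"

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # Neither branch consumes the other's output, so both can be
    # submitted at once and run concurrently.
    future_a = pool.submit(branch_a, None)
    future_b = pool.submit(branch_b, None)
    results = [future_a.result(), future_b.result()]
parallel_time = time.perf_counter() - start

# Two 0.2 s branches overlap, so total wall time stays near 0.2 s
# rather than 0.4 s.
print(f"parallel wall time: {parallel_time:.2f}s")
```

If branch B instead took branch A's output as input, the two steps would form a chain and this overlap would be impossible, which is why keeping branches independent where possible matters.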
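The throughput argument for batch-accepting blocks comes down to amortizing per-call fixed costs (dispatch, data transfer, kernel launch) over many inputs. The sketch below models that fixed cost with a small `time.sleep`; `infer_single` and `infer_batch` are hypothetical illustrations, not Workflow block APIs.

```python
import time

FIXED_COST_S = 0.005  # simulated per-dispatch overhead

def infer_single(item):
    # Pays the fixed cost once per item.
    time.sleep(FIXED_COST_S)
    return item * 2

def infer_batch(items):
    # Pays the fixed cost once for the whole batch.
    time.sleep(FIXED_COST_S)
    return [item * 2 for item in items]

items = list(range(50))

start = time.perf_counter()
single_results = [infer_single(i) for i in items]
single_time = time.perf_counter() - start

start = time.perf_counter()
batch_results = infer_batch(items)
batch_time = time.perf_counter() - start

# Same outputs, but the batched path amortizes the fixed cost.
print(f"per-item: {single_time:.3f}s, batched: {batch_time:.3f}s")
```

The trade-off noted above still applies: larger batches improve throughput but raise peak memory, since the Execution Engine tracks indices and caches for every batch element in flight.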