Workflow Profiling

This page covers profiling techniques and performance analysis for Workflow execution.

For benchmark data comparing direct inference vs. Workflow-wrapped inference, see the benchmarks page.
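Before diving into the considerations below, it helps to measure steady-state latency directly. The sketch below is a generic timing harness, not part of any Workflow API: `run_workflow` is a hypothetical stand-in you would replace with your actual Workflow invocation, and the warm-up loop absorbs one-time costs such as model loading.

```python
import time
from typing import Any


def run_workflow(image: Any) -> dict:
    """Hypothetical stand-in for a Workflow execution call.

    Replace this with the real invocation you want to profile.
    """
    time.sleep(0.01)  # simulate CPU scheduling + GPU inference
    return {"predictions": []}


def profile_workflow(image: Any, warmup: int = 2, runs: int = 10) -> dict:
    """Time repeated runs, discarding warm-up iterations.

    Warm-up runs absorb one-time costs (model loading, CUDA context
    creation) so the reported numbers reflect steady-state latency.
    """
    for _ in range(warmup):
        run_workflow(image)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_workflow(image)
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": sum(timings) / len(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


stats = profile_workflow(image=None)
print(f"mean={stats['mean_s'] * 1000:.1f} ms")
```

Running the same harness against a bare model call and a Workflow-wrapped call gives a rough estimate of the CPU-side overhead discussed below.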

When optimizing Workflow performance, consider:

  • GPU vs. CPU bound operations: Model inference typically runs on the GPU, while Workflow overhead (graph scheduling, input preparation, output routing) is CPU-bound. Because this overhead stays roughly constant regardless of GPU speed, it becomes a larger fraction of total latency as inference itself gets faster.
  • Parallel execution: The Execution Engine runs independent steps in parallel by default. Ensure your workflow graph takes advantage of this by structuring independent branches where possible.
  • Batch processing: Blocks that declare batch inputs via get_parameters_accepting_batches() receive whole batches in a single call rather than one element at a time, which can significantly improve throughput for GPU-accelerated operations.
  • Memory considerations: When processing video or large batches, be mindful of memory usage. The Execution Engine maintains indices and caches for all batch-oriented data points.
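To see why the batch-processing point matters, the generic sketch below (not the Workflow block API) compares per-element calls against a single vectorized call over the same data; the `infer_one` and `infer_batch` names are illustrative placeholders for a real inference call.

```python
import time

import numpy as np


def infer_one(image: np.ndarray) -> float:
    """Per-element call: fixed per-call overhead is paid for every item."""
    return float(image.mean())


def infer_batch(images: np.ndarray) -> np.ndarray:
    """Batched call: one invocation amortizes overhead across all items."""
    return images.mean(axis=(1, 2, 3))


images = np.random.rand(64, 3, 32, 32).astype(np.float32)

start = time.perf_counter()
one_by_one = [infer_one(img) for img in images]
t_single = time.perf_counter() - start

start = time.perf_counter()
batched = infer_batch(images)
t_batch = time.perf_counter() - start

print(f"per-item: {t_single * 1000:.2f} ms, batched: {t_batch * 1000:.2f} ms")
```

On GPU-backed inference the gap is usually much larger than in this CPU toy example, because kernel launch and data-transfer costs are amortized across the whole batch.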
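For the memory consideration above, one common pattern when processing video is to consume frames in fixed-size chunks so that peak memory is bounded by the chunk size rather than the video length. This is a minimal generic sketch of that pattern, not Execution Engine internals; the frame values are stand-ins for real image data.

```python
from typing import Iterable, Iterator, List


def chunked(frames: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield fixed-size chunks so only `size` frames are held at once."""
    chunk: List[int] = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def process_video(frames: Iterable[int], batch_size: int = 8) -> int:
    """Process frames chunk by chunk, keeping only an aggregate result."""
    processed = 0
    for batch in chunked(frames, batch_size):
        # Run inference on `batch` here; release per-frame buffers
        # before pulling the next chunk.
        processed += len(batch)
    return processed


total = process_video(range(100), batch_size=8)
print(total)
```

The same idea applies to large image batches: streaming chunks through the workflow keeps memory flat, at the cost of slightly lower throughput than one giant batch.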