Compilation of Workflow Definition

Compilation is a process that takes a document written in a programming language, checks its correctness, and transforms it into a format that the execution environment can understand.

A similar process happens in the Workflows ecosystem whenever you want to run a Workflow Definition. The Workflows Compiler performs several steps to transform a JSON document into a computation graph, which is then executed by the Workflows Execution Engine.

Note

This document covers the design of Execution Engine v1 (which is the current stable version). Please review the information about versioning to understand the Execution Engine development cycle.

Stages of Compilation

Workflow compilation involves several stages:

  1. Loading available blocks: Gathering all the blocks that can be used in the workflow based on configuration of the execution environment.
  2. Compiling dynamic blocks: Turning dynamic block definitions into standard Workflow Blocks.
  3. Parsing the Workflow Definition: Reading and interpreting the JSON document that defines the workflow, detecting syntax errors.
  4. Building Workflow Execution Graph: Creating a graph that defines how data will flow through the workflow during execution and verifying Workflow integrity.
  5. Initializing Workflow steps from blocks: Setting up the individual workflow steps based on the available blocks, the step definitions, and the configuration of the execution environment.

Workflows Blocks Loading

As described in the blocks bundling guide, a group of Workflow blocks can be packaged into a workflow plugin. A plugin is essentially a standard Python library that, in its main module, exposes specific functions allowing Workflow Blocks to be dynamically loaded.

The Workflows Compiler and Execution Engine are designed to be independent of specific Workflow Blocks, and the Compiler has the ability to discover and load blocks from plugins.

Roboflow provides the roboflow_core plugin, which includes a set of basic Workflow Blocks that are always loaded by the Compiler, as both the Compiler and these blocks are bundled in the inference package.

For custom plugins, once they are installed in the Python environment, they need to be referenced using an environment variable called WORKFLOWS_PLUGINS. This variable should contain the names of the Python packages that contain the plugins, separated by commas.

export WORKFLOWS_PLUGINS="numpy_plugin,pandas_plugin"
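The discovery mechanism can be pictured as follows. This is a simplified sketch, not the actual inference implementation: the `discover_blocks` helper and the `load_blocks()` entry-point name are illustrative assumptions about how a Compiler might merge core blocks with plugin blocks named in `WORKFLOWS_PLUGINS`.

```python
# Sketch of plugin discovery driven by the WORKFLOWS_PLUGINS variable.
# The load_blocks() entry point is an assumed convention for this example.
import importlib
import os


def discover_blocks(core_blocks):
    """Collect core blocks plus blocks from every configured plugin."""
    blocks = list(core_blocks)
    plugin_names = os.environ.get("WORKFLOWS_PLUGINS", "")
    for name in filter(None, plugin_names.split(",")):
        module = importlib.import_module(name)
        # Each plugin's main module is expected to expose load_blocks().
        blocks.extend(module.load_blocks())
    return blocks
```

With the variable unset, only the core blocks (here, whatever the caller passes in) are returned.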

Compilation of Dynamic Blocks

Note

The topic of dynamic Python blocks is covered in a separate docs page. To understand the content of this section you only need to know that there is a way to define Workflow Blocks in-place in a Workflow Definition — specifying both block manifest and Python code in a JSON document.

The Workflows Compiler can transform Dynamic Python Blocks, defined directly in a Workflow Definition, into full-fledged Workflow Blocks at runtime. Once this process is complete, the dynamic blocks are added to the pool of available Workflow Blocks.
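The general mechanism can be illustrated with a toy version: user-supplied Python code is executed to materialize a function, which is then wrapped in a class generated at runtime. The real Compiler builds full Workflow Blocks with pydantic manifests; the `compile_dynamic_block` helper below is a hypothetical simplification.

```python
# Toy illustration of compiling an in-definition code snippet into a
# block class at runtime. Not the actual inference API.

def compile_dynamic_block(name, python_code, function_name="run"):
    namespace = {}
    exec(python_code, namespace)  # materialize the user-supplied function
    run_function = namespace[function_name]
    # Generate a class on the fly that wraps the compiled function.
    return type(name, (), {"run": staticmethod(run_function)})
```

A definition carrying `def run(x): return 2 * x` would yield a block whose `run(3)` returns `6`.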

Parsing Workflow Definition

Once all Workflow Blocks are loaded, the Compiler retrieves the manifest classes for each block. These manifests are pydantic data classes that define the structure of step entries in definitions. At this parsing stage, errors in Workflow Definitions are detected and reported, for example:

  • Usage of non-existent blocks
  • Invalid configuration of steps
  • Lack of required parameters for steps

Building Workflow Execution Graph

Building the Workflow Execution graph is the most critical stage of Workflow compilation.

Adding Vertices

First, each input, step, and output is added as a vertex in the graph, with each vertex given a special label for future identification.

Adding Edges

After placing the vertices, edges are created between them based on the selectors defined in the Workflow. The Compiler examines the block manifests to determine which properties can accept selectors and the expected "kind" of those selectors. This enables the Compiler to detect errors such as:

  • Providing an output kind from one step that doesn't match the expected input kind of the next step.
  • Referring to non-existent steps or inputs.
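Edge creation from selectors can be sketched as below. The selector strings follow the Workflows convention (`$inputs.name`, `$steps.name.property`), but the validation logic is heavily simplified and the function name is illustrative:

```python
# Simplified sketch: derive graph edges from selector strings in step
# properties, rejecting selectors that point to unknown vertices.

def build_edges(definition):
    known = {f"$inputs.{i['name']}" for i in definition["inputs"]}
    known |= {f"$steps.{s['name']}" for s in definition["steps"]}
    edges = []
    for step in definition["steps"]:
        for value in step.values():
            if isinstance(value, str) and value.startswith("$"):
                # "$steps.crop.crops" refers to the vertex "$steps.crop"
                source = ".".join(value.split(".")[:2])
                if source not in known:
                    raise ValueError(f"Selector {value} points to nothing")
                edges.append((source, f"$steps.{step['name']}"))
    return edges
```

A real implementation would additionally consult the manifests to check that the "kind" on each side of the edge matches.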

Structural Validation

Once the graph is constructed, the Compiler checks for structural issues like cycles to ensure the graph can be executed properly.
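Cycle detection on the execution graph can be done with a standard depth-first search over an adjacency list; the sketch below shows the classic three-color variant (the real Compiler's internals may differ):

```python
# DFS-based cycle check: a "gray" neighbour means we found a back edge.

def has_cycle(adjacency):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in adjacency}

    def visit(node):
        color[node] = GRAY
        for neighbour in adjacency.get(node, []):
            if color.get(neighbour, WHITE) == GRAY:
                return True  # back edge: the graph contains a cycle
            if color.get(neighbour, WHITE) == WHITE and visit(neighbour):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in adjacency)
```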

Data Lineage Verification

Finally, data lineage properties are populated from input nodes and carried through the graph. Lineage is a list of identifiers that track the creation and nesting of batches through the steps, determining:

  • The source path of data.
  • The dimensionality level of data.
  • The compatibility of different pieces of data that may be referenced by a step.

Each time a new nested batch is created by a step, a unique identifier is added to the lineage of the output. This allows the Compiler to track and verify if the inputs across steps are compatible.
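The idea can be illustrated with a toy model of lineage: a step that creates nested batches appends a fresh identifier, and a step consuming several inputs requires their lineages to match. Function names and structures here are illustrative, not the actual Engine internals.

```python
# Simplified lineage model: a lineage is a list of identifiers.
import itertools

_counter = itertools.count()


def extend_lineage(lineage):
    """A step that nests batches deepens the lineage by one identifier."""
    return lineage + [f"batch_{next(_counter)}"]


def check_compatible(*lineages):
    """Inputs referenced by one step must share the same lineage."""
    if any(lineage != lineages[0] for lineage in lineages):
        raise ValueError("Step fed with data of incompatible lineage")
```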

Note

The fundamental assumption of data lineage is that all batch-oriented inputs are granted the same lineage identifier; this implicitly requires all input batches to be fed with data that has corresponding data points at corresponding positions in the batches.

Batch-Orientation Compatibility

As outlined, Workflows define batch-oriented data and scalars. Since the default way for Workflow blocks to deal with batches is to consume them element by element, there is no real difference between batch-oriented data and scalars in that case. The Execution Engine simply unpacks individual elements from batches and passes them to each step.

The process becomes more complicated when a block accepts batch input. Such a block is required to declare each input that must be provided batch-wise. When a violation is detected (for instance, a scalar is provided for an input that requires batches, or vice versa), an error is raised.
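A sketch of that check, with illustrative names and structures (each block declares the set of inputs that must be batches, and the compiler compares it against how the inputs are actually wired):

```python
# Hypothetical batch-orientation check, not the actual inference code.

def validate_batch_inputs(batch_required, provided):
    """batch_required: names of inputs that must receive batches.
    provided: mapping of input name -> True if wired to batch data."""
    for name, is_batch in provided.items():
        if name in batch_required and not is_batch:
            raise ValueError(f"Input '{name}' requires a batch, got a scalar")
        if name not in batch_required and is_batch:
            raise ValueError(f"Input '{name}' expects a scalar, got a batch")
```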

Initializing Workflow Steps from Blocks

When a block requires initialization parameters:

  • The block must declare the parameters it needs (detailed in the blocks development guide).
  • The values for these parameters must be provided from the environment where the Workflow is being executed.

Initialization parameters can be passed to the Compiler in two ways:

  • Explicitly: You provide specific values (numbers, strings, objects, etc.).
  • Implicitly: Default values are defined within the Workflows plugin.

The structure of the keys in explicitly provided init parameters is: {plugin_name}.{init_parameter_name}.

How Parameters Are Resolved

When the Compiler looks for a block's required init parameter, it follows this process:

  1. Exact Match: It first checks the explicitly provided parameters for an exact match to {plugin_name}.{init_parameter_name}.
  2. Default Parameters: If no match is found, it checks the plugin's default parameters.
  3. General Match: Finally, it looks for a general match with just {init_parameter_name} in the explicitly provided parameters.

This mechanism allows flexibility, as some block parameters can have default values while others must be provided explicitly. Additionally, it lets certain parameters be shared across different plugins.
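The three-step lookup above can be sketched as a single function. The flat-dictionary structures assumed here (explicit values keyed as `{plugin_name}.{init_parameter_name}` or bare names, defaults keyed by bare names) are illustrative simplifications:

```python
# Sketch of init-parameter resolution: exact match, then plugin default,
# then general match, mirroring the order described above.

def resolve_init_parameter(plugin_name, parameter_name, explicit, defaults):
    exact_key = f"{plugin_name}.{parameter_name}"
    if exact_key in explicit:           # 1. exact match
        return explicit[exact_key]
    if parameter_name in defaults:      # 2. plugin's default parameters
        return defaults[parameter_name]
    if parameter_name in explicit:      # 3. general match
        return explicit[parameter_name]
    raise KeyError(f"Cannot resolve init parameter: {exact_key}")
```

Note that under this order a plugin default takes precedence over a generally-keyed explicit value; only an exact `{plugin_name}.{init_parameter_name}` key overrides the default.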