Video Processing with Workflows
We have begun our journey into video processing using Workflows. Over time, we have expanded the number of video-specific blocks — including object tracker blocks for ByteTrack, SORT, and OC-SORT — and continue to dedicate efforts toward improving their performance and robustness. The current state of this work is as follows:
- We have introduced the `WorkflowVideoMetadata` input to store metadata related to video frames, including declared FPS, measured FPS, timestamp, video source identifier, and file/stream flags. While this may not be the final approach for handling video metadata, it allows us to build stateful video-processing blocks at this stage. If your Workflow includes any blocks requiring input of kind `video_metadata`, you must define this input in your Workflow. The metadata functions as a batch-oriented parameter, treated by the Execution Engine in the same way as `WorkflowImage`.
- The `InferencePipeline` supports video processing with Workflows by automatically injecting `WorkflowVideoMetadata` into the `video_metadata` field. This allows you to seamlessly run your Workflow using the `InferencePipeline` within the `inference` Python package.
- We have initiated efforts to enable video processing management via the `inference` server API. This means that eventually, no custom scripts will be required to process video using Workflows and `InferencePipeline`. You will simply call an endpoint, specify the video source and the Workflow, and the server will handle the rest, allowing you to focus on consuming the results.
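To make the batch-oriented metadata concrete, the sketch below shows one plausible shape for the per-frame metadata described above. The exact field names used by the `inference` package may differ; this dict is purely illustrative, built from the fields listed in the first bullet (declared FPS, measured FPS, timestamp, source identifier, file/stream flag).

```python
# Illustrative shape of per-frame video metadata. Field names here are an
# assumption for demonstration; consult the WorkflowVideoMetadata definition
# in the inference package for the authoritative schema.
video_metadata = {
    "video_identifier": "rtsp://192.168.0.73:8554/live0.stream",  # source id
    "frame_number": 127,
    "frame_timestamp": "2024-05-01T12:00:00Z",
    "fps": 30,                        # FPS declared by the source
    "measured_fps": 28.4,             # FPS actually observed while processing
    "comes_from_video_file": False,   # stream vs. file flag
}
```

Like `WorkflowImage`, this metadata is batch-oriented: each frame in a batch carries its own entry, which is what allows stateful blocks (such as trackers) to reason about time and ordering.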
Video Management API
This is an experimental feature, and breaking changes may be introduced over time. There is a list of known issues. Please visit the page to raise new issues or comment on existing ones.
Using the API
```python
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://192.168.0.115:9001",
    api_key="<ROBOFLOW-API-KEY>",
)

# list active pipelines
client.list_inference_pipelines()

# start processing - single stream
client.start_inference_pipeline_with_workflow(
    video_reference=["rtsp://192.168.0.73:8554/live0.stream"],
    workspace_name="<YOUR-WORKSPACE>",
    workflow_id="<YOUR-WORKFLOW-ID>",
    results_buffer_size=5,
)

# start processing - one RTSP stream and one camera
client.start_inference_pipeline_with_workflow(
    video_reference=["rtsp://192.168.0.73:8554/live0.stream", 0],
    workspace_name="<YOUR-WORKSPACE>",
    workflow_id="<YOUR-WORKFLOW-ID>",
    batch_collection_timeout=0.05,
)

# get pipeline status
client.get_inference_pipeline_status(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# pause pipeline
client.pause_inference_pipeline(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# resume pipeline
client.resume_inference_pipeline(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# terminate pipeline
client.terminate_inference_pipeline(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# consume pipeline results
client.consume_inference_pipeline_result(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
    excluded_fields=["workflow_output_field_to_exclude"],
)
```
Previewing Video Output
The client presented above can also be used to preview Workflow outputs. Assuming that your Workflow runs an object detection model and renders its output using Workflows visualization blocks, registering the output image in the `preview` output field, you can use the following script to poll and display processed video frames:
```python
import cv2
from inference_sdk import InferenceHTTPClient
from inference.core.utils.image_utils import load_image

client = InferenceHTTPClient(
    api_url="http://127.0.0.1:9001",
    api_key="<YOUR-API-KEY>",
)

while True:
    result = client.consume_inference_pipeline_result(pipeline_id="<PIPELINE-ID>")
    if not result["outputs"] or not result["outputs"][0]:
        # empty response - no new frame has been processed yet
        continue
    source_result = result["outputs"][0]
    image, _ = load_image(source_result["preview"])
    cv2.imshow("frame", image)
    cv2.waitKey(1)