Video Processing with Workflows
We have begun our journey into video processing using Workflows. Over time, we have expanded the number of video-specific blocks — including object tracker blocks for ByteTrack, SORT, and OC-SORT — and continue to dedicate efforts toward improving their performance and robustness. The current state of this work is as follows:
- We have introduced the `WorkflowVideoMetadata` input to store metadata related to video frames, including declared FPS, measured FPS, timestamp, video source identifier, and file/stream flags. While this may not be the final approach for handling video metadata, it allows us to build stateful video-processing blocks at this stage. If your Workflow includes any blocks requiring input of kind `video_metadata`, you must define this input in your Workflow. The metadata functions as a batch-oriented parameter, treated by the Execution Engine in the same way as `WorkflowImage`.
- The `InferencePipeline` supports video processing with Workflows by automatically injecting `WorkflowVideoMetadata` into the `video_metadata` field. This allows you to seamlessly run your Workflow using the `InferencePipeline` within the `inference` Python package.
- We have initiated efforts to enable video processing management via the `inference` server API. This means that eventually, no custom scripts will be required to process video using Workflows and `InferencePipeline`. You will simply call an endpoint, specify the video source and the Workflow, and the server will handle the rest, allowing you to focus on consuming the results.
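To make the batch-oriented metadata concrete, the sketch below shows one plausible shape for the per-frame metadata described above. The exact field names used by the `inference` package may differ; this dict is purely illustrative, built from the fields listed in the first bullet (declared FPS, measured FPS, timestamp, source identifier, file/stream flag).

```python
# Illustrative shape of per-frame video metadata. Field names here are an
# assumption for demonstration; consult the WorkflowVideoMetadata definition
# in the inference package for the authoritative schema.
video_metadata = {
    "video_identifier": "rtsp://192.168.0.73:8554/live0.stream",  # source id
    "frame_number": 127,
    "frame_timestamp": "2024-05-01T12:00:00Z",
    "fps": 30,                        # FPS declared by the source
    "measured_fps": 28.4,             # FPS actually observed while processing
    "comes_from_video_file": False,   # stream vs. file flag
}
```

Like `WorkflowImage`, this metadata is batch-oriented: each frame in a batch carries its own entry, which is what allows stateful blocks (such as trackers) to reason about time and ordering.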
Video Management API
This is an experimental feature, and breaking changes may be introduced over time. There is a list of known issues. Please visit the page to raise new issues or comment on existing ones.
Using the API
```python
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://192.168.0.115:9001",
    api_key="<ROBOFLOW-API-KEY>",
)

# list active pipelines
client.list_inference_pipelines()

# start processing - single stream
client.start_inference_pipeline_with_workflow(
    video_reference=["rtsp://192.168.0.73:8554/live0.stream"],
    workspace_name="<YOUR-WORKSPACE>",
    workflow_id="<YOUR-WORKFLOW-ID>",
    results_buffer_size=5,
)

# start processing - one RTSP stream and one camera
client.start_inference_pipeline_with_workflow(
    video_reference=["rtsp://192.168.0.73:8554/live0.stream", 0],
    workspace_name="<YOUR-WORKSPACE>",
    workflow_id="<YOUR-WORKFLOW-ID>",
    batch_collection_timeout=0.05,
)

# get pipeline status
client.get_inference_pipeline_status(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# pause pipeline
client.pause_inference_pipeline(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# resume pipeline
client.resume_inference_pipeline(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# terminate pipeline
client.terminate_inference_pipeline(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
)

# consume pipeline results
client.consume_inference_pipeline_result(
    pipeline_id="182452f4-a2c1-4537-92e1-ec64d1e42de1",
    excluded_fields=["workflow_output_field_to_exclude"],
)
```
Previewing Video Output
The client presented above can also be used to preview Workflow outputs. Assuming that your Workflow runs an object detection model and renders its output using Workflows visualization blocks, registering the output image in the `preview` output field, you can use the following script to poll and display processed video frames:
```python
import cv2
from inference_sdk import InferenceHTTPClient
from inference.core.utils.image_utils import load_image

client = InferenceHTTPClient(
    api_url="http://127.0.0.1:9001",
    api_key="<YOUR-API-KEY>",
)

while True:
    result = client.consume_inference_pipeline_result(pipeline_id="<PIPELINE-ID>")
    if not result["outputs"] or not result["outputs"][0]:
        # empty response - no new frame has been processed yet
        continue
    source_result = result["outputs"][0]
    image, _ = load_image(source_result["preview"])
    cv2.imshow("frame", image)
    cv2.waitKey(1)