Data Science Now Supports Data Flow in ML Pipelines

The Data Flow Support feature in ML Pipelines lets users integrate Data Flow Applications as steps within a pipeline.

With this new functionality, users can orchestrate runs of Data Flow Applications (Apache Spark as a Service) alongside other steps in an ML Pipeline, streamlining large-scale data processing tasks.

When a pipeline containing a Data Flow step runs, it automatically creates and manages a new run of the Data Flow Application associated with that step. The Data Flow run is treated the same as any other step in the pipeline. When the Data Flow run completes successfully, the pipeline continues, starting later steps as part of its orchestration.
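The orchestration behavior described above can be pictured with a small conceptual model. The following Python sketch is not the Data Science or Data Flow API: the names Step and run_pipeline, the "dataflow" and "custom" kind labels, and the log messages are all hypothetical. It also assumes a simple linear sequence of steps and assumes the pipeline stops when a step fails, which this note does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step (not an OCI class)."""
    name: str
    kind: str                   # "custom" or "dataflow" (illustrative labels)
    action: Callable[[], bool]  # returns True if the step succeeds


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run steps in order and return a log of what happened.

    A "dataflow" step first records that a new run of its Data Flow
    Application was created, then is handled like any other step.
    Assumption: the pipeline stops at the first failed step.
    """
    log: List[str] = []
    for step in steps:
        if step.kind == "dataflow":
            log.append(f"{step.name}: created Data Flow run")
        succeeded = step.action()
        log.append(f"{step.name}: {'SUCCEEDED' if succeeded else 'FAILED'}")
        if not succeeded:
            break  # later steps are not started
    return log


if __name__ == "__main__":
    pipeline = [
        Step("prepare", "custom", lambda: True),
        Step("spark_etl", "dataflow", lambda: True),
        Step("train", "custom", lambda: True),
    ]
    for line in run_pipeline(pipeline):
        print(line)
```

In this model the only difference between step kinds is the extra bookkeeping for the Data Flow run, which mirrors the statement that the Data Flow run is treated the same as any other step.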

For more information, see the Data Science documentation on Data Flow in ML Pipelines.