A way to "execute workflow with all batches in the Dataset" when a new batch triggers the execution

What problem are you facing?

A common setup for end-to-end automation in Data Studio is: when a new batch is added to a multi-batch snapshot, the dataset refresh triggers an automation that executes a workflow using that same Dataset as its source.

Despite the workflow's source step being configured to use all batches, the automation executes the workflow using only the latest loaded batch. When run manually, the workflow behaves as expected and processes all batches.

What impact does this problem have on you/your business?

This is the designed behaviour (see the discussion in this thread), but it is initially unexpected for users. More importantly, achieving the desired behaviour requires adding extra complexity to the pipeline.

Impact: slows down development of automated data pipelines

Do you have any existing workarounds? If so, please describe those.

A couple of workarounds are described at a high level in this post. Essentially, they rely on the dataset refresh not triggering the main workflow directly, but instead triggering an intermediary process which in turn triggers the target workflow.
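The intermediary pattern can be sketched as follows. This is a minimal, hypothetical simulation, not the Data Studio API: all names (`Dataset`, `run_main_workflow`, the trigger handlers) are illustrative stand-ins for the automation actions involved.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    # A multi-batch snapshot: each batch is a list of rows.
    batches: list = field(default_factory=list)

def run_main_workflow(dataset: Dataset) -> list:
    # The main workflow's source step is configured to use all batches.
    return [row for batch in dataset.batches for row in batch]

def on_refresh_direct(dataset: Dataset, new_batch: list) -> list:
    # Current behaviour: the automation triggered by the refresh runs
    # the workflow against only the batch that caused the trigger.
    dataset.batches.append(new_batch)
    return run_main_workflow(Dataset(batches=[new_batch]))

def on_refresh_via_intermediary(dataset: Dataset, new_batch: list) -> list:
    # Workaround: the refresh triggers an intermediary step whose only
    # job is to re-trigger the target workflow, which then sees the
    # whole multi-batch dataset.
    dataset.batches.append(new_batch)
    return run_main_workflow(dataset)
```

With two prior batches loaded, the direct trigger processes only the new batch, while the intermediary route processes all three.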

Do you have any suggestions to solve the problem? Feel free to add images if this helps.

We are also tracking this via internal ID 259583.

A suggestion is to add a "use all batches" execution option or parameter to the automation's "run published workflow" action (off by default), which can be enabled when the dataset's source is multi-batch.

Gathering interest