Looping workflows

Danny Roden (Administrator)
edited December 3 in Reusable Workflows

About the requirement

Recently I’ve been asked a couple of times whether it’s possible to cycle through a workflow for every value within a given column. This mainly comes up to drive two types of use case:

  1. Dynamic data export – effectively taking an input dataset and splitting it out into a separate file for each subset of data (as determined by a value in a chosen column, e.g. create a separate export of sales data for each region; see the sketch just after this list)
  2. Context-specific validation results – this use case is about evaluating subsets of data within a broader dataset against a series of consistent rules (e.g. so you can break down the DQ results by country, region, team, business unit or any other value).
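
For readers who like to think in code, here’s a minimal Python sketch of what use case 1 amounts to conceptually. The file name ('sales.csv') and column name ('region') are made up for illustration, and of course Data Studio does all of this visually rather than in code:

    import csv
    from collections import defaultdict

    # Group rows by the value in the chosen column ('region' here).
    rows_by_region = defaultdict(list)

    with open("sales.csv", newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            rows_by_region[row["region"]].append(row)

    # Write one export file per distinct region value.
    for region, rows in rows_by_region.items():
        with open(f"sales_{region}.csv", "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)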

In both these use cases (and in others), the main prerequisite for the approach below to work is that you have the ‘loop’ value defined consistently within a single column (e.g. supplier, country, team, etc.).

Assuming you’ve got your data in a ready state, or you’re comfortable getting it there using Data Studio, you’re ready to begin this tutorial.

Note

Before we begin, I want to note that this is a reasonably advanced use of the product and touches upon several areas and concepts that may be new to you, including automations, custom events, workflow parameters and multi-batch snapshots. However, don’t let that put you off: this is simply an opportunity to learn more about the power of Aperture Data Studio and further your skills as a workflow designer.

Scenario context

For this walkthrough I’m looking at the ‘customer’ training dataset (you may be familiar with this from training/eLearning courses). What I’d like to do is evaluate each subset of my data against a consistent set of DQ rules (in a Validate step), with my results split out based on the discount_code for the customer:

So the target results I’m seeking to create will look like this:

And crucially, I don’t want to have to manually split my workflow into however many separate streams there are discount codes (there could be loads, and I may want to re-run this in the future when further discount code values have been supplied). So I need to create an automation that’ll dynamically do this for me.

Solution Overview

The solution I’m walking through in this post essentially spans three workflows:

And I’ve included a download link below so you can load this into your Data Studio environment to have a play with for yourself:

Note: for optimal setup, I’d recommend ‘synchronizing’ this .dmxd into a new Space for you to explore (as that retains the 'published' status of the workflows, which is crucial for some of the automations).

Workflow 1

The first workflow selects the distinct values from the source that you want to ‘loop’ over and saves these to a snapshot (which is treated as ‘memory’ for this automation):

The snapshot containing the ‘loop values’ needs to be set up as a ‘single batch’ snapshot to ensure that its contents can be updated (which we’ll do later on).
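
If it helps to see the idea in code, here’s what Workflow 1 does conceptually, sketched in Python. Treating the snapshot as a plain file is just an analogy, and 'customers.csv' is a made-up source name (discount_code comes from the scenario above):

    import csv

    # Workflow 1, as an analogy in code: pull the distinct 'loop' values
    # from the source and persist them as the automation's working memory.
    with open("customers.csv", newline="") as src:
        reader = csv.DictReader(src)
        loop_values = sorted({row["discount_code"] for row in reader})

    # The 'single batch' snapshot behaves like a file we can overwrite later.
    with open("loop_values.csv", "w", newline="") as snap:
        writer = csv.writer(snap)
        writer.writerow(["discount_code"])
        writer.writerows([v] for v in loop_values)

    # fire_event("Execute Looping Workflows")  # the Fire Event step hands off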

And right at the end of this first workflow, a ‘Fire Event’ step is used to trigger the next one. Here I’ve used a custom event called ‘Execute Looping Workflows’.

The events themselves are defined within the ‘Automations’ area of Data Studio. This event is a simple one and only needs a name:

Events can then be used to trigger Automations, and I’ve used this event to trigger an automation (with the same name) that runs Workflow 2:
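
As a mental model only (this isn’t how Data Studio is implemented; both events and automations are configured visually in the Automations area), you can think of events and automations as a simple name-to-handler registry:

    # A toy model of a 'Fire Event' step triggering automations.
    automations = {}

    def on_event(name, handler):
        """Register an automation to run when the named event fires."""
        automations.setdefault(name, []).append(handler)

    def fire_event(name, event_data=None):
        """What a 'Fire Event' step does: run every listening automation."""
        for handler in automations.get(name, []):
            handler(event_data)

    on_event("Execute Looping Workflows",
             lambda data: print("automation: run Workflow 2", data))
    fire_event("Execute Looping Workflows")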

Workflow 2

Moving on to Workflow 2: this essentially selects the top value from the loop values list and triggers the execution of Workflow 3. It then removes this value from the working memory ahead of the next run (so the next value can be selected).

As you can see above, a Split step is used to separate the top row of the ‘loop values’ (i.e. where current row = 1), and a Fire Event step is then used to trigger another custom event (see below) that executes Workflow 3. The top loop value is passed through as event data to populate the workflow parameter used when executing Workflow 3.

The remaining values (i.e. without the one used for the above execution) are then written back to overwrite the contents of the ‘loop values’ snapshot (thus reducing the row count by 1).

The custom event in this situation is configured to require a parameter to be defined (as this is what will be passed through to the final workflow as event data).

The associated automation that this event triggers then runs Workflow 3, providing the value from the specified column at run-time:
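
In the same file-as-snapshot analogy as before, Workflow 2 does roughly the following. The fire_event call is commented out as a placeholder, and 'Run Workflow 3' is a made-up event name standing in for the custom event described above:

    import csv

    # Workflow 2 as an analogy in code: take the top loop value, trigger
    # Workflow 3 with it, then shrink the working memory by one row.
    with open("loop_values.csv", newline="") as snap:
        reader = csv.reader(snap)
        header = next(reader)
        values = [row[0] for row in reader]

    if values:
        current, remaining = values[0], values[1:]

        # The Split step's top row becomes event data, which populates
        # Workflow 3's workflow parameter.
        # fire_event("Run Workflow 3", {"discount_code": current})

        # Overwrite the single-batch snapshot so the next run sees one
        # fewer value.
        with open("loop_values.csv", "w", newline="") as snap:
            writer = csv.writer(snap)
            writer.writerow(header)
            writer.writerows([v] for v in remaining)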

Workflow 3

This final piece of the puzzle is the looping workflow to be re-executed over and over until no loop values remain.

This workflow simply requires a workflow parameter to be set when it is executed (provided by Workflow 2 for each iteration). The parameter is used both to define the filter at the beginning and to explicitly populate a label in the resulting snapshot (it could equally be used to populate a file name if an Export step is used).

The workflow in this specific example runs some DQ validation against the selected subset of the data (e.g. clients sharing a common discount code) and then saves the results to a multi-batch snapshot which essentially unions the results together (rather than overwriting the contents each time).

Finally, at the end of the workflow, a Fire Event step is used to trigger the ‘Execute Looping Workflows’ event to run Workflow 2 again (just as we did from Workflow 1), and thus the workflows re-run until all values have been exhausted.
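
And Workflow 3 in the same spirit. The rule here (checking that an 'email' field is populated) is purely a stand-in for whatever your Validate step actually checks, and all file and column names are illustrative:

    import csv
    import os

    def run_workflow_3(discount_code):
        """Workflow 3 as an analogy in code (all names illustrative)."""
        with open("customers.csv", newline="") as src:
            subset = [row for row in csv.DictReader(src)
                      if row["discount_code"] == discount_code]
        if not subset:
            return

        # Stand-in for the Validate step: evaluate a simple rule per row
        # and label every result with the current loop value.
        results = [{**row,
                    "label": discount_code,
                    "email_populated": bool(row.get("email"))}
                   for row in subset]

        # Multi-batch snapshot analogy: append (union) rather than overwrite.
        write_header = not os.path.exists("dq_results.csv")
        with open("dq_results.csv", "a", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=list(results[0].keys()))
            if write_header:
                writer.writeheader()
            writer.writerows(results)

        # fire_event("Execute Looping Workflows")  # loop back to Workflow 2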

Putting it all together

So now that you understand how it all works: the looping is instigated simply by running Workflow 1 (everything else is then fully automated, regardless of how many loop values there are):

You’ll see this nicely in the Jobs screen:

And after all values have been exhausted and the workflows have stopped running, you’ll end up with a Snapshot containing the full set of results:
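
Stripped of the event plumbing, the whole chain reduces to a loop that is guaranteed to terminate, because each iteration removes one value from the working memory. A toy sketch (run_workflow_3 is just a placeholder here):

    def run_workflow_3(value):
        print(f"validating subset where discount_code = {value}")

    def run_all(loop_values):
        memory = list(loop_values)   # Workflow 1 seeds the snapshot
        while memory:                # Workflow 2 only fires if values remain
            run_workflow_3(memory.pop(0))
        # memory is empty: no further events fire and the looping stops

    run_all(["A", "B", "C"])  # e.g. three discount codes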

Further thoughts

If your use case is looping through a workflow to dynamically create N export files based upon a value in a given column (e.g. split my product data lines by division), you can use the workflow parameter to append the ‘division’ to the name of the export file and otherwise follow this approach exactly as described.
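
In the code analogy, that’s simply the loop value interpolated into the export file name (names hypothetical):

    division = "Hardware"  # supplied per iteration as the workflow parameter
    export_name = f"product_lines_{division}.csv"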

Please note that this method may not be appropriate for datasets where you have hundreds of values to loop over. However, I hope this helps provide some efficiencies for scenarios where you have a few dozen to cycle through, and that it gives you a chance to learn a little more about the product.

If you found this useful, have other uses for this technique or any further ideas on this I’d love to hear about them in the comments below.

Comments

  • Thanks Danny. Very interesting, I will give it a try (note that Workflow 3 failed due to offensive words and valid characters functions not existing). However, seems like a lot of configuration and work-around for what would be a great built-in function/feature of Aperture.

  • Danny Roden
    Danny Roden Administrator

@James T Sidebotham thanks for the comment and the feedback. I've just updated the post with an updated .dmxd which removes the references to the functions, which in my scenario were housed in another Space.

    Please let me know how you get on!

    Note: if you're interested in learning more about those functions which I have removed, they are also available on the Community site within the Functions Library (Invalid Characters for Name & Offensive Words checker)