♻ Reusable workflow example – calculate word frequency

Danny Roden, Administrator
edited August 10 in Resources

Today I’d like to share a reusable workflow that I built a little while ago. It’s available as a downloadable .dmx file (see below) for you to explore and use as you wish, and it also demonstrates one of the key themes of Data Studio: reusability.

If you find that there are duplicate tasks being executed on your instance of Data Studio, I’d encourage you to step back and see if there’s an opportunity to simplify (and standardise) by using one of the following approaches:

  • Creating a view against a dataset (and sharing it) to standardise data prep tasks, including:
    • Column renaming
    • Data anonymisation
    • Grouping and aggregate calculations
    • Custom sort logic, sampling and filters
    • Basic transformations
  • Saving (and sharing) your transformation function for reuse (effectively ‘mastering’ it as a user-defined function for common reference)
  • Templating broader processing logic with reusable workflows, a really powerful feature of Data Studio that unlocks a world of options for governing how data processing tasks are executed consistently across the enterprise.

For the purposes of this post, I’m going to focus on a relatively simple example, but note that there are additional features which can help make light work of common processing tasks. If you’re looking to explore efficiencies in your Data Studio usage, please do get in touch with us here on Community.

About reusable workflows

The first thing to note is that reusable workflows are built against a schema of data (i.e. a ‘base’ table of data which is effectively the ‘master’ reference). If you’re building a new reusable workflow, I’d encourage you to prep this ‘base schema’ first, including only the column(s) you’ll need and using a sensible naming convention, along with some suitable test data to allow you to build and test the logic.

For this example, I’m building a workflow that will analyse the word frequency within a single column. I’m therefore going to build the workflow against a source with only a single column of input data (as I’m not going to be using anything else). I’m also expecting this workflow to be executed against all manner of different fields, so I’ve given the ‘base schema’ a suitably generic name.

Within the reusable workflow, if you want to apply the processing to a new dataset, you’ll need to check the ‘can supply source when executed’ option.

Next, you’ll want to create the workflow to do what it needs to (using your sample data in the ‘base schema’). For any outputs you want to expose, you’ll need to use the ‘output’ step so that the reusable workflow can pass the data on to subsequent steps when it is referenced.

Finally, you’ll want to ensure the workflow has the ‘can be used in other workflows’ option checked to allow this to be referenced in other workflows (like a new ‘step’).

Once you’ve got this set up, you’ll be able to reference your workflow in another one within the same space for testing purposes.

If you’re happy with this and want to share it more widely, you can publish the workflow and share it with other spaces so that this processing can be consistently referenced and used by your colleagues.

At this point, I’d encourage you to document the process so that other users can easily understand what it does, how it works and who created it, and can access meaningful release notes as new versions are published.

And if you want to have a look at the workflow I’ve used for this post, you can download it below and explore it for yourself in your instance of Data Studio (provided you are running at least version 2.11.9; if not, you can get the latest release here).

This workflow is intended for analysing data with no more than 20 words per cell (e.g. contact/account/product data, rather than lengthy free-text fields). It was originally designed to help identify standard ‘business terms’ in company/account fields to aid matching (e.g. terms like ‘plc’, ‘limited’, ‘llc’ etc.), but it has also been used to support analysis and transformation of product name, job title and other fields.
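
For readers who would like to see the underlying idea outside of Data Studio, here is a minimal Python sketch of a single-column word frequency count. It is not the workflow itself and doesn’t reflect how the .dmx file is built internally. The function name, the sample company names, and the choices to lowercase values and split on whitespace are all assumptions made purely for illustration.

```python
from collections import Counter


def word_frequency(values):
    """Count how often each word appears across a single column of values.

    Assumptions for illustration only (not taken from the Data Studio workflow):
    words are separated by whitespace and compared case-insensitively.
    """
    counts = Counter()
    for value in values:
        if value is None:
            continue  # skip empty cells
        counts.update(str(value).lower().split())
    return counts


# Hypothetical sample data standing in for a company/account name column.
sample = [
    "Acme Trading Limited",
    "Globex plc",
    "Initech LLC",
    "Acme Holdings Limited",
]

freq = word_frequency(sample)

# List the most frequent words first, e.g. to spot common business terms.
for word, count in freq.most_common():
    print(word, count)
```

Sorting by frequency is what makes terms like ‘limited’ or ‘plc’ stand out, which is the same kind of output you’d look for when using the workflow to spot business terms ahead of matching.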

If this workflow is of use to you, please share your use case below and, as always, drop a comment if you have any questions or ideas.


