Getting started with Regular Expressions in Data Studio
A regular expression (from here on we’ll call them “regex” for short) is a well-known and powerful syntax for describing a search pattern in a simple string of text.
You can think of regex as a way of defining advanced wildcards: In Windows file explorer you might search for
*.txt to return all txt files, or in SQL you might search
WHERE SALARY LIKE '200%' to find salary values starting with “200”. Regex lets you define much more complicated searches using a similar approach.
Data Studio has four functions that work specifically with regex, as well as global library for storing and managing them, which means you can easily harness the power of regex for your project.
Before we look at an example where regex is used in Data Studio, here’s a couple of tips:
- Use an online regex tester to check that your expression does what you expect it to do with a range of different values. There are lots of great regex testers and debuggers out there, my favourite is Regex101
- Regex is powerful but can also be complicated. Any regex search could also be built using Data Studio’s standard comparison functions Equals, Starts With, Contains, and so on. Regex can be a more elegant and rapid way to define a search but it isn’t always the right approach!
- In the examples below, the regex is defined in the function itself. That’s OK, but if you plan to re-use an expression, or to need to maintain it over time, we recommend defining it as a constant value in the Glossary (you’ll find this in the Glossary, where there are already lots of useful regular expressions).
There are several ways to use regex in Data Studio. Let’s take a look at the most common scenario with a quick worked example: Using Compare to check whether values in a column of data match our regex. This is the most common way to use a regex in Data Studio.
In this example, we have a column of IDs:
We want to check which values conform to our expected format, which in this case is:
- The first character must be “A” or “B” (uppercase)
- The second and third characters must be “10” or “11”
- The fourth character must be a dash (hyphen)
- The fifth character must be “E” (uppercase) or a numeric value between 1 and 5
- The sixth character must be any numeric value between 0 and 9
This set of criteria is complex, but can be defined by the single regex expression:
Applying this regex in the Compare function of a workflow Transform step would be done by pasting the regex into the Compare function like this:
And the result of our new "Check Id Format" column on a sample of data would be: