Obfuscate/unobfuscate

It might be useful to have an 'obfuscation' node that runs text fields through an obfuscation algorithm (for display purposes, e.g. in reporting). It might also be useful to reverse this with an 'unobfuscate' node, which applies the same algorithm in reverse so the original data can be examined for DQ issues.
Comments
@Nigel Light This will depend on what kind of obfuscation you are looking for, and how you envisioned this in your workflow.
You can create a branch of the dataset that contains the text fields you want to obfuscate. That gives you one workflow path where a transformation obfuscates the data and another path where the data stays unobfuscated.
For example:
Assume these are the failing rows I need to report and analyse for remediation. I want to obfuscate the Name since it is PII.
I create a branch and apply a format pattern transformation on the Name column.
I then feed the dataset with the obfuscated Name to a Report with Masked Name.
I leave the other branch of the dataset unobfuscated, feeding to a Report with Unmasked Name.
[Screenshot: Report with Masked Name]
[Screenshot: Report with Unmasked Name]
Not sure if this works for you?
A question on obfuscating values came up again recently, and I thought I would take the opportunity to discuss a few techniques available in Data Studio.
There are many different ways to obfuscate data, each with its own pros and cons depending on what you want to achieve. For example, here are a few things to consider:
- Should the obfuscation be reversible (I can use a function applied on the obfuscated value to get back to the original)?
- How secure does the obfuscation need to be (how difficult to reverse)?
- Does uniqueness of values need to be preserved (is there a 1:1 mapping between original values and obfuscated values)?
- Do formats / character types need to be maintained in the obfuscation function (e.g. letter → letter, digit → digit)?
- Do value lengths need to be maintained?
The example in Sueann's answer uses the Format Pattern function. Another option would be the From Text To Hex function, which converts string input into hexadecimal values.
Looking at how these two functions behave we could summarize as follows:
| Feature / Property | Format Pattern | From Text To Hex |
| --- | --- | --- |
| Reversible | ❌ No | ✅ Yes (via "From Hex to Text") |
| Security Level | ⚠️ Moderate (guessable with rare formats) | ❌ Low (easy to decode) |
| Preserves Uniqueness (1:1 mapping) | ❌ No | ✅ Yes |
| Preserves Character Types | ✅ Yes (e.g., letters → `A`, digits → `9`) | ❌ No (all becomes hex digits) |
| Preserves Length | ✅ Yes | ❌ No (hex output is longer) |
| Best For | Format-preserving masking; UI display; pattern analysis | Reversible obfuscation; linking records; lightweight reversible encoding |
| Example Output | "AB12CD" → "AA99AA" | "AB12CD" → "414231324344" |
Both of these are fine as lightweight obfuscation methods, but neither is suitable for protecting sensitive information.
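To make the comparison concrete, here is a minimal Python sketch of the two behaviours. It is not Data Studio's implementation: `format_pattern`, `text_to_hex` and `hex_to_text` are hypothetical helper names, and mapping every letter to `A` regardless of case is an assumption based only on the example output above.

```python
def format_pattern(value: str) -> str:
    """Assumed behaviour: every letter becomes 'A', every digit becomes '9',
    and any other character is kept as-is."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def text_to_hex(value: str) -> str:
    """Encode the text as UTF-8 and write each byte as two hex digits."""
    return value.encode("utf-8").hex().upper()


def hex_to_text(value: str) -> str:
    """Reverse of text_to_hex: decode the hex digits back to the original text."""
    return bytes.fromhex(value).decode("utf-8")


print(format_pattern("AB12CD"))     # AA99AA
print(text_to_hex("AB12CD"))        # 414231324344
print(hex_to_text("414231324344"))  # AB12CD

# Different inputs collapse to the same pattern, so uniqueness is lost.
print(format_pattern("XY34ZQ") == format_pattern("AB12CD"))  # True
```

The last line shows why the pattern approach cannot be reversed or used to link records, while the hex round trip shows why hex encoding offers almost no protection.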
A much more heavyweight obfuscation can be achieved using the Hash Code function, which generates hashed values using a range of common algorithms.
Hashing with something like SHA3-512 is good at creating non-reversible values that will be resistant to known cryptographic attacks (as of now). But you might consider it a poor choice for these reasons:
| Concern | Explanation |
| --- | --- |
| 🔒 Irreversible | You can't get the original data back. This is a feature for security, but a limitation if you need reversible obfuscation. |
| 🧩 No format preservation | The output is a fixed-length hexadecimal string, which doesn't resemble the original data. |
| 🧠 No uniqueness guarantee across datasets | If a salt or key is used, it must be the same everywhere; otherwise the same input will produce different hashes in different contexts. |
| 🧾 No structure retention | You lose all formatting (e.g., email, phone number, etc.). |
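The fixed-length and salt-consistency points can be illustrated with Python's standard `hashlib`. This is a sketch under assumptions, not the Hash Code function itself: `hash_code` is a hypothetical helper, and prepending the salt to the value is just one simple salting convention chosen for illustration.

```python
import hashlib


def hash_code(value: str, salt: bytes = b"") -> str:
    """Return the SHA3-512 digest of salt + value as a hex string.
    Prepending the salt is an illustrative convention, not Data Studio's."""
    return hashlib.sha3_512(salt + value.encode("utf-8")).hexdigest()


short = hash_code("AB12CD")
long_ = hash_code("a.much.longer.value@example.com")

# Output length is fixed (512 bits = 128 hex characters) whatever the input.
print(len(short), len(long_))  # 128 128

# Same input and same salt always give the same hash ...
print(hash_code("AB12CD", b"salt-1") == hash_code("AB12CD", b"salt-1"))  # True

# ... but a different salt gives a different hash, so values no longer match up.
print(hash_code("AB12CD", b"salt-1") == hash_code("AB12CD", b"salt-2"))  # False
```

If hashed values need to be compared or joined across datasets, the same salt (or none) has to be used in every place the hash is generated.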
Some other obfuscation approaches available in Data Studio:
- Nulling out selected values or characters, or simply hiding columns in a View
  - e.g. hide or null out premises and street address columns, leaving just city, locality, postal code.
- Partial masking
  - e.g. using Regular expression replace to replace certain elements, such as every character of an email's local part except the first:
    - Search value: (?<=.).(?=[^@]*?@)
    - Replace value: *
- Deterministic tokenization: use regex functions, replace, or lookup functions to replace certain values with others from a defined list
- Use the SDK to create your own steps that obfuscate values just as you want.
- Try the new GenAI function generator to see what it is able to suggest.
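As a rough illustration of the partial masking and deterministic tokenization ideas, here is a short Python sketch. The search pattern is the one quoted above, run through Python's `re` module rather than Data Studio; the `mask_email` and `tokenize` helpers, the `TOKENS` lookup table and the fallback token are all hypothetical.

```python
import re

# Search value from the list above: any character that has a character before it
# and an '@' somewhere after it, with no other '@' in between.
EMAIL_MASK = re.compile(r"(?<=.).(?=[^@]*?@)")


def mask_email(value: str) -> str:
    """Replace every character of the local part except the first with '*'."""
    return EMAIL_MASK.sub("*", value)


# Hypothetical lookup list for deterministic tokenization.
TOKENS = {"Alice": "Person-001", "Bob": "Person-002"}


def tokenize(value: str) -> str:
    """Swap a known value for its token; unknown values get a generic token."""
    return TOKENS.get(value, "Person-UNKNOWN")


print(mask_email("john.doe@example.com"))  # j*******@example.com
print(tokenize("Alice"))                   # Person-001
print(tokenize("Alice") == tokenize("Alice"))  # True: same input, same token
```

Values without an `@` pass through the mask unchanged, so in practice the function would only be applied to columns known to hold email addresses.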