Experian Data Quality Community

Learn, collaborate and solve problems with like-minded data quality enthusiasts.

Suggest Validation rules - Validate step
Aperture Data Studio has a Validate workflow step that lets you define rules for how data in specific columns should be populated. You can then use the results to flag data quality problems or to track data quality improvements over time. These rules can be simple: should not be null, should be a…
📋 Index of reusable Functions! 📇
This post simply acts as an index for all the functions shared in this library. If you would like to receive a notification whenever a new function is added, please bookmark this post by clicking on the bookmark icon to the right of the title. Current functions available: 👪 Parse Full Name - A handy set of functions that…
ℹ How to track DQ issues with Issue lists
Issue lists allow you to capture problematic records, assign them to stakeholders and collaborate on a resolution. Records that have been fixed will automatically be resolved. Setting up your Issues list: Go to Issue lists in the left nav bar and create a new list to track related issues. Create a Workflow to write issues to the…
The chaining effect
What is the chaining effect? Find Duplicates in Aperture Data Studio works by identifying blocks or groups of potential duplicates, then comparing every possible record pair within a block against the rules to determine whether the records represent a single entity, which is identified by a cluster ID. Depending on how you have configured the rules, you…
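To make the idea concrete, here is a minimal Python sketch of block-then-compare clustering. It is not how Aperture Data Studio is implemented: the function names (`cluster`, `one_char_apart`), the postcode blocking key, the sample records and the assumption that matched pairs are merged transitively are all illustrative. It shows how two records that do not match each other directly can still end up sharing a cluster ID because each matches a third record.

```python
from collections import defaultdict
from itertools import combinations


def find(parent, i):
    """Return the root of record i, compressing the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster(records, blocking_key, is_match):
    """Assign a cluster ID to each record (hypothetical helper).

    Records are grouped into blocks, every pair inside a block is compared,
    and matching pairs are merged transitively (an assumption of this sketch).
    """
    parent = list(range(len(records)))
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)

    for members in blocks.values():
        for a, b in combinations(members, 2):
            if is_match(records[a], records[b]):
                ra, rb = find(parent, a), find(parent, b)
                if ra != rb:
                    # Keep the smaller index as the cluster ID.
                    parent[max(ra, rb)] = min(ra, rb)

    return [find(parent, i) for i in range(len(records))]


def one_char_apart(r1, r2):
    """Toy fuzzy rule: names of equal length differing in at most one character."""
    n1, n2 = r1["name"], r2["name"]
    return len(n1) == len(n2) and sum(x != y for x, y in zip(n1, n2)) <= 1


records = [
    {"name": "Jon Smith", "postcode": "AB1"},
    {"name": "Jon Smyth", "postcode": "AB1"},
    {"name": "Jan Smyth", "postcode": "AB1"},
    {"name": "Amy Jones", "postcode": "AB1"},
]

cluster_ids = cluster(records, lambda r: r["postcode"], one_char_apart)
print(cluster_ids)
```

In this toy data, "Jon Smith" and "Jan Smyth" differ by two characters, so the rule does not match them directly, yet both match "Jon Smyth" and are chained into the same cluster. A looser rule lengthens such chains; a stricter rule breaks them.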
Reviewing Find Duplicates Results
When you preview the Find Duplicates results, you will see the Cluster ID and Match status. Records with the same Cluster ID are identified as duplicates. The Cluster ID is influenced by the blocking keys that determine the record comparisons to be made. However, ultimately the duplicate records and match status is…
Building Rules
In order to start testing the Find Duplicates step, the Find Duplicates settings will also need to have a ruleset defined in addition to the blocking keys. When building rules, we will have to think about the following: How the Ruleset relates to Blocking Keys. Blocking keys identify potential matches. Rules determine if…
Exact match, fuzzy match and de-duplication with Find Duplicates
Hi everyone, I'm starting a series of articles all about matching and linking records to find duplicates in Aperture Data Studio, intended to encourage some learning and interaction. Start here: Why worry about duplicated data? Simple ways to identify and resolve duplicated data in Aperture Data Studio: Exact…
👋 Introduction to the Functions Library 📂
Intro This area of the Community hosts our library of re-usable functions for use in Aperture Data Studio. Whilst Aperture Data Studio comes with a wealth of native functions out-of-the-box, this area of Community has been established to provide a reference library of further re-usable functions that you can easily add to…
Can I integrate Aperture with Collibra?
Is it possible to integrate Aperture with Collibra, and then display the rules created in Aperture in a Collibra dashboard?
How do we expand on the provided base blocking keys and rule sets on Aperture Data Studio?
Hi team, I am asking this question on behalf of one of our Credit Services team members. They are looking to create blocking keys and rules that accommodate a text string (Drivers License) and date of birth, but are running into issues. The rules that they are currently using are in the attachments. Is there an easy…
Find Duplicates Language for Notepad++
If you're using Notepad++ as your text editor, it can be helpful to have an interpreter that highlights key words in Find Duplicates matching rules and offers auto-completion suggestions. Attached are two XML files that allow for this. Adding Language: To add the new language option, open Notepad++ and select Language…
Offensive Words 🤬
Summary: The functions contained within this package all relate to flagging and dealing with data containing offensive language. All of these functions use a domain of offensive words (contained within the .dmxd package), which holds a list of known offensive terms. Note: this data is open source and originates from:…
Proper Case Surname 📛
Summary: This package contains two functions that help with contact data: Proper Case Surname and Validate Surname Casing. Proper Case Surname: As the name suggests, this function produces a proper-cased surname for a given input, taking into account some key exceptions, including: Irish & Scottish names (e.g. O'Neil and…
Using a discriminant for Find Duplicates clustering
Find Duplicates will use fuzzy matching to link records. However, in some cases you may have a discriminant field that you wish to use to break clusters. In this post we cover how to make use of discriminants within a workflow. The simple scenario: In this scenario, we are processing transaction records and matching on name, mailing…

Events

Data Governance Masterclass
Join us in London on Thursday, 10th July, for an exclusive in-person Data Governance Masterclass featuring Nicola Askham – The Data Governance Coach. This event is designed for data leaders like you who are looking to enhance their governance strategies, gain insights from industry experts, and connect with peers through…
