Can I identify duplicate customer account numbers in a file?

Brian B · August 2023

I have a very large file, over 5 million records. Is there a way to drill down into the file to see whether it contains any duplicate customer account numbers?

Josh Boxer · August 2023

Hi Brian

There are a few ways to do this, but the most straightforward is a Profile, which returns the Uniqueness for each column. The Profile output is interactive, so you can drill down into the underlying Values wherever uniqueness is less than 100%, then sort by the 'Row count' column to see the most frequently duplicated values.

https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/create-a-single-customer-view-scv/discover-and-profile-data/

If this is a recurring issue that you want to identify in a Workflow, use a Group step. It counts the number of rows, so add your 'account number' column to count by that column, then filter for a count greater than 1 to find any duplicates.

https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/get-started/create-a-workflow/#group
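
If it helps to see the group-count-filter logic spelled out, here is a rough Python sketch of the same idea. It is not how Data Studio implements the Group step, just an illustration; the column name `account_number`, the helper name, and the sample rows are all made up for the example.

```python
import csv
import io
from collections import Counter


def duplicate_account_numbers(rows, column="account_number"):
    """Count rows per value in `column` and keep values seen more than once."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Tiny in-memory sample standing in for the real file (hypothetical data).
sample = io.StringIO(
    "account_number,customer\n"
    "A100,Smith\n"
    "A200,Jones\n"
    "A100,Smith\n"
)

duplicates = duplicate_account_numbers(csv.DictReader(sample))
print(duplicates)  # {'A100': 2}
```

For a real file you would pass `csv.DictReader(open(path, newline=""))` instead of the sample. The rows are read one at a time, so only the per-value counts are held in memory.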

Finally, a more advanced tool is Relationships, which can check that each account number always returns the same customer, with an interactive output showing any conflicting values.

https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/improve-data-quality/analyze-relationships/
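
The check that Relationships performs can be thought of along these lines. Again, this is only a Python sketch of the concept, not the product's implementation; the column names `account_number` and `customer`, the helper name, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def conflicting_accounts(rows, key="account_number", dependent="customer"):
    """Return keys that map to more than one distinct dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[dependent])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical rows: A100 appears with two different customer names.
rows = [
    {"account_number": "A100", "customer": "Smith"},
    {"account_number": "A200", "customer": "Jones"},
    {"account_number": "A100", "customer": "Smyth"},
    {"account_number": "A200", "customer": "Jones"},
]

conflicts = conflicting_accounts(rows)
print(conflicts)  # {'A100': ['Smith', 'Smyth']}
```

Note the difference from a plain duplicate count: A200 appears twice but always with the same customer, so it is a duplicate but not a conflict.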

Brian B · August 2023

Josh, this has been very helpful. I really appreciate the options offered. Thank you again!
