I have a very large file, over 5 million records. I wanted to see if there is a way to drill down into the file to check whether it contains any duplicate customer account numbers.
There are a few ways to do this, but the most straightforward is a Profile, which returns the Uniqueness for each column. The Profile output is interactive, allowing you to drill down into the underlying Values wherever the uniqueness is less than 100%. You can then sort by the 'Row count' column to see the most frequently duplicated values.
If this is a recurring issue that you want to identify in a Workflow, a Group step will count the number of rows: add your 'account number' column to count by, then filter on count > 1 to find any duplicates.
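If it helps to see the group-and-filter logic spelled out, here is a rough sketch in plain Python. It runs outside the tool and is not the Group step's own implementation. The column names (`account_number`, `customer_name`), the function name and the sample rows are all made up for illustration.

```python
import csv
import io
from collections import Counter


def find_duplicate_accounts(rows, key="account_number"):
    """Return {value: row_count} for every key value that appears more than once."""
    # Group by the key column and count the rows in each group.
    counts = Counter(row[key] for row in rows)
    # Keep only groups with count > 1, i.e. the duplicates.
    return {value: n for value, n in counts.items() if n > 1}


# Small in-memory sample standing in for the real file (hypothetical data).
sample = io.StringIO(
    "account_number,customer_name\n"
    "A100,Alice\n"
    "A200,Bob\n"
    "A100,Alice\n"
    "A300,Carol\n"
    "A200,Robert\n"
    "A100,Alice\n"
)

duplicates = find_duplicate_accounts(csv.DictReader(sample))

# List the most frequently duplicated values first.
for account, n in sorted(duplicates.items(), key=lambda kv: kv[1], reverse=True):
    print(account, n)
```

For a real file you would pass `csv.DictReader(open(path, newline=""))` instead of the sample. The reader streams rows one at a time, so only the per-account counts are held in memory.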
Finally, a more advanced tool is Relationships, which can check that each account number always returns the same customer, with an interactive output showing any conflicting values.
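The idea behind that kind of check can also be sketched in plain Python. This only illustrates the "same account number, same customer" rule and does not show how Relationships works internally. The column names, the function name and the sample rows are assumptions.

```python
from collections import defaultdict


def find_conflicts(rows, key="account_number", dependent="customer_name"):
    """Return {key value: sorted distinct dependent values} where more than one exists."""
    seen = defaultdict(set)
    for row in rows:
        # Collect every distinct customer seen for each account number.
        seen[row[key]].add(row[dependent])
    # An account number with more than one distinct customer is a conflict.
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical sample rows; A100 is duplicated but consistent, A200 conflicts.
rows = [
    {"account_number": "A100", "customer_name": "Alice"},
    {"account_number": "A200", "customer_name": "Bob"},
    {"account_number": "A100", "customer_name": "Alice"},
    {"account_number": "A200", "customer_name": "Robert"},
    {"account_number": "A300", "customer_name": "Carol"},
]

conflicts = find_conflicts(rows)
print(conflicts)
```

Note the difference from a plain duplicate count: an account number that repeats with the same customer every time is a duplicate but not a conflict, so it is not reported here.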
Josh, this has been very helpful. I really appreciate the options offered. Thank you again!