Select first three records in each cluster

Luke · August 2022

Hi

After running a Find Duplicates step at individual level, I can see that we have many contact names for some businesses, so I want to cap these at three contacts for each business address.

I'm thinking I should run another Find Duplicates step with the Household step setting to get clusters of businesses, but from there I'm not sure how to select just three from each cluster. In SQL Server I have used ROW_NUMBER() OVER (PARTITION BY ...) to achieve a similar goal. Does anyone know how I might do this in Aperture, please?

Thanks

Luke

Josh Boxer · August 2022

Hi Luke

You can add the row number using the Function 'Current Row' https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/get-started/create-functions/#dynamic-reference~native-functions

There is a recent post here that might be useful:

https://community.experianaperture.io/discussion/837/how-to-determine-nth-occurence-in-a-column

There are a couple of different approaches to append/calculate an 'Occurrence' column.

Once that column is calculated, you can filter to keep only the first N occurrences in each cluster (for example, Occurrence less than or equal to 3 if the count starts at 1).
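
To make the logic concrete, here is a rough sketch of the same idea in plain Python. It is not Aperture syntax, and the names (first_n_per_cluster, cluster_id, contact) and the sample rows are made up for illustration. It numbers each row within its cluster in input order, starting at 1, and keeps only rows whose occurrence is at most N, much like ROW_NUMBER() OVER (PARTITION BY ...) followed by a filter.

```python
from collections import defaultdict


def first_n_per_cluster(rows, cluster_key, n=3):
    """Keep the first n rows of each cluster, preserving input order.

    Adds a 1-based 'occurrence' value to every row that is kept.
    """
    seen = defaultdict(int)  # cluster id -> number of rows seen so far
    kept = []
    for row in rows:
        seen[row[cluster_key]] += 1
        occurrence = seen[row[cluster_key]]
        if occurrence <= n:
            kept.append({**row, "occurrence": occurrence})
    return kept


# Hypothetical sample data: cluster A has four contacts, cluster B has one.
contacts = [
    {"cluster_id": "A", "contact": "Ann"},
    {"cluster_id": "B", "contact": "Eve"},
    {"cluster_id": "A", "contact": "Bob"},
    {"cluster_id": "A", "contact": "Cat"},
    {"cluster_id": "A", "contact": "Dan"},
]

capped = first_n_per_cluster(contacts, "cluster_id", n=3)
```

Here the fourth contact in cluster A is dropped and everything else is kept. Which three records count as the "first" depends entirely on the row order going in, so sort beforehand if you want a particular three.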

Luke · August 2022

Hi Josh

With some ideas from the post I was able to do this...

Many Thanks

Luke
