Finding earliest date in a cluster
Hi
I have multiple rows for a customer and each row has a modified on date. I want to find the earliest date (most recent date) in the cluster.
Eg
Clusterid modified date
1 01-01-1999
1 02-01-2009
1 03-08-2009
1 04-09-2008
2 01-01-2020
2 01-01-2019
Therefore for cluster 1 give me the full row based on this modified date - 03-08-2009
For cluster 2, full row for the one with 01-01-2020
is there an easy way to flag the row with the most recent in the cluster and then just select that row for each cluster?
Please advise
thanks for your assistance
Carolyn
0
Answers
@Carolyn there are a couple of wasy to do this. One way is to perform a grouping on the records selecting the minimum value of the date as an aggregate, that will give you the oldest date, you can then join that back to the Find Duplicates results.
If you don't want to use grouping then you can use the Harmonize step configuration to select the record with oldest date as the survivor and then use Harmonize merge rules to choose the attributes from the related cluster records to derive the best blended record.
this takes me from this
to this
you can see for cluster1 that the earliest suggested date is now 01 May 1929 and for cluster2 it is 14 Sept 1935
Within the grouping system i am using minimum (as long as the date is defined as a date)