Assign latest changed id to all linked records
Hi,
I have a scenario where input file is providing changed address ids in from-address and to_address columns. Need to identify the links between the records using these 2 fields and assign latest id to all the linked records.
Input:
event_date,from_address_id,to_address_id
20210223,120,160
20210402,120,160
20180602,110,120
20170109,200,260
20170104,105,110
20210714,100,100
20151027,100,125
Output should have -
address_id = from_address_id/to_address_id.
event_date = latest event_date of the address_id
latest_address_id = latest to_address_id from set of address changes
Below should be output for sample input records given above.
address_id,event_date,latest_address_id
160,20210402,160
120,20210402,160
110,20180602,160
105,20170104,160
260,20170109,260
200,20170109,260
125,20151027,100
100,20210714,100
Can anyone help to derive the logic in Aperture.
Comments
-
You want to use the same source twice, manipulate the columns then Union to combine/stack the to_ and from_ columns
Then Lookup the event_date and from_address against the original data to return the last to_address:
[Edit] Realised you also want the latest date for each id, which you could do by adding a second Definition to the Lookup step. Might be able to do this in fewer steps, but hopefully this helps
1 -
Hi josh,
Thanks for the response.
Earlier, I can able to create records for all ids with latest event_date by using columns-to-rows and group by functions.
Now tried for latest_address_id for each address_id as you suggested above but i don't get the desired output.
After union , i tried with below lookup details,
I ended up with below result
Can you please let me know the output from lookup and merge and hide columns
0 -
OK I had another look at your output and what you are trying to achieve. (Unsure how you would get the example penultimate row but ignoring that) I think you should Group to find the latest date then lookup the to_ value that matches the from AND date. Rename columns and Union the outputs
0 -
To clarify the use case again, consider below records from the example
event_date, from_id → to_id
20210223,120 -> 160
20210402,120 -> 160
20180602,110 -> 120
20170104,105 -> 110In above example, each records is linked with previous record. Latest record is observed on 20210402 and to_id in that record is 160.
So 160 should be assigned to all the ids. I don't see this is covered up in the implementation you given. Can you please check into it.0 -
I can able to get the latest_id upto 2 levels of changes by multiple joins between from_id and to_ids. But there might be many links and difficult to assign latest id to all the linked Ids.
In above result, I achieved to assign 160 for 2 changes from latest event to old event (120 →160, 110 → 120). But to get for 105 (change from 105 → 110), again i need to traverse to 2nd record (id=120) and get the latest id).
Any suggestions.
0