Assign latest changed id to all linked records

Options
slngupta_1
slngupta_1 Experian Employee

Hi,

I have a scenario where input file is providing changed address ids in from-address and to_address columns. Need to identify the links between the records using these 2 fields and assign latest id to all the linked records.

Input:
event_date,from_address_id,to_address_id
20210223,120,160
20210402,120,160
20180602,110,120
20170109,200,260
20170104,105,110
20210714,100,100
20151027,100,125

Output should have -

address_id = from_address_id/to_address_id.

event_date = latest event_date of the address_id

latest_address_id = latest to_address_id from set of address changes

Below should be output for sample input records given above.

address_id,event_date,latest_address_id
160,20210402,160
120,20210402,160
110,20180602,160
105,20170104,160

260,20170109,260
200,20170109,260

125,20151027,100
100,20210714,100

Can anyone help to derive the logic in Aperture.

Comments

  • Josh Boxer
    Josh Boxer Administrator
    edited July 3
    Options

    You want to use the same source twice, manipulate the columns then Union to combine/stack the to_ and from_ columns

    Then Lookup the event_date and from_address against the original data to return the last to_address:

    [Edit] Realised you also want the latest date for each id, which you could do by adding a second Definition to the Lookup step. Might be able to do this in fewer steps, but hopefully this helps

  • slngupta_1
    slngupta_1 Experian Employee
    Options

    Hi josh,

    Thanks for the response.

    Earlier, I can able to create records for all ids with latest event_date by using columns-to-rows and group by functions.

    Now tried for latest_address_id for each address_id as you suggested above but i don't get the desired output.

    After union , i tried with below lookup details,

    I ended up with below result

    Can you please let me know the output from lookup and merge and hide columns

  • Josh Boxer
    Josh Boxer Administrator
    edited July 4
    Options

    OK I had another look at your output and what you are trying to achieve. (Unsure how you would get the example penultimate row but ignoring that) I think you should Group to find the latest date then lookup the to_ value that matches the from AND date. Rename columns and Union the outputs

  • slngupta_1
    slngupta_1 Experian Employee
    edited July 4
    Options

    To clarify the use case again, consider below records from the example

    event_date, from_id → to_id
    20210223,120 -> 160
    20210402,120 -> 160
    20180602,110 -> 120
    20170104,105 -> 110

    In above example, each records is linked with previous record. Latest record is observed on 20210402 and to_id in that record is 160.
    So 160 should be assigned to all the ids. I don't see this is covered up in the implementation you given. Can you please check into it.

  • slngupta_1
    slngupta_1 Experian Employee
    Options

    I can able to get the latest_id upto 2 levels of changes by multiple joins between from_id and to_ids. But there might be many links and difficult to assign latest id to all the linked Ids.

    In above result, I achieved to assign 160 for 2 changes from latest event to old event (120 →160, 110 → 120). But to get for 105 (change from 105 → 110), again i need to traverse to 2nd record (id=120) and get the latest id).

    Any suggestions.