How to determine Nth occurence in a column?

Richard Illyes
edited December 2023 in General

Hello.

Do you know of any function or combination of functions that can achieve a new column in the data as the "occurrence" column in the screenshot below?

I would like to know if the ID in the first column is the 1st, 2nd ... Nth occurrence in that column. Is there a way to achieve this?

Thanks in advance.

Richard


Best Answer

  • Ian Thornton
    Ian Thornton Experian Employee
    Answer ✓

    Hi Richard,

    I came up with a few solutions yesterday (one involving a Cartesian Join, another involving Group/Join) but also came up with something that, like yours, involves a Lookup but doesn't require Get Cell and preserves sort order.

    (See workflow screenshot below)

    Step 1 "Add Source Row Number" - add current row number to the source (call the column "source row")

    Step 2 - "Sort by Value" - sort this ascending by the value

    Step 3 - feed this into 1) the first input of a Lookup step 2) "Add Row Number" a transform step that adds the row number again ("sorted row number") and feed that into the second input of the Lookup step ("Value -> First Row"). Add a lookup definition that matches Value == Value and returns the first value of "sorted row number". Call this definition "first row"

    Step 4 - "Compute Occurrence" - a transform step to (like your solution) subtract "first row" from the current row number (this results in a zero-indexed nth occurence) and call this column "occurrence"

    Step 5 - "Restore Original Order" - restore the original sort order by sorting ascending on "source row"

    Step 6 - "Hide Columns and Adjust" - hide unnecessary columns and add 1 to "occurrence" to make it 1-indexed


    In your solution, if you lookup and return first value (which would be a row number) you don't need Get Cell to mark the change in value.

    Ian

Answers

  • Josh Boxer
    Josh Boxer Administrator

    Hi Richard

    There are two similar values that you could calculate more easily

    1. the total number of occurrences for each ID using a Group
    2. the First or Last occurrence of each ID using a Lookup step

    Assuming neither of those meet your needs I think it would be possible using multiple functions that utilize the Function Get cell however it would require the data to be pre-sorted (by ID in this case), would that be an issue for any subsequent operations?

  • Hi Josh,

    Thanks for your comment. The aim is to assign a new, unique sequential ID based on the original ID, so sorting won't be an issue.

    After posting the question we've been experimenting a bit with Get Cell as you suggested and what seems to work is the following:

    1. Sort
    2. Mark the first appearance of a value in a new column (with an X) and also add CurrentRow in another
    3. Do a lookup with this table on itself, by returning the Row number where we have the ID and the mark
    4. Subtract the (LookupResult-1) from the CurrentRow to get this sequential numbering