# How to determine Nth occurence in a column?

Learner

Hello.

Do you know of any function or combination of functions that can achieve a new column in the data as the "occurrence" column in the screenshot below?

I would like to know if the ID in the first column is the 1st, 2nd ... Nth occurrence in that column. Is there a way to achieve this?

Richard

• Experian Employee

Hi Richard,

I came up with a few solutions yesterday (one involving a Cartesian Join, another involving Group/Join) but also came up with something that, like yours, involves a Lookup but doesn't require Get Cell and preserves sort order.

(See workflow screenshot below)

Step 1 "Add Source Row Number" - add current row number to the source (call the column "source row")

Step 2 - "Sort by Value" - sort this ascending by the value

Step 3 - feed this into 1) the first input of a Lookup step 2) "Add Row Number" a transform step that adds the row number again ("sorted row number") and feed that into the second input of the Lookup step ("Value -> First Row"). Add a lookup definition that matches Value == Value and returns the first value of "sorted row number". Call this definition "first row"

Step 4 - "Compute Occurrence" - a transform step to (like your solution) subtract "first row" from the current row number (this results in a zero-indexed nth occurence) and call this column "occurrence"

Step 5 - "Restore Original Order" - restore the original sort order by sorting ascending on "source row"

Step 6 - "Hide Columns and Adjust" - hide unnecessary columns and add 1 to "occurrence" to make it 1-indexed

In your solution, if you lookup and return first value (which would be a row number) you don't need Get Cell to mark the change in value.

Ian

Hi Richard

There are two similar values that you could calculate more easily

1. the total number of occurrences for each ID using a Group
2. the First or Last occurrence of each ID using a Lookup step

Assuming neither of those meet your needs I think it would be possible using multiple functions that utilize the Function Get cell however it would require the data to be pre-sorted (by ID in this case), would that be an issue for any subsequent operations?

• Learner

Hi Josh,

Thanks for your comment. The aim is to assign a new, unique sequential ID based on the original ID, so sorting won't be an issue.

After posting the question we've been experimenting a bit with Get Cell as you suggested and what seems to work is the following:

1. Sort
2. Mark the first appearance of a value in a new column (with an X) and also add CurrentRow in another
3. Do a lookup with this table on itself, by returning the Row number where we have the ID and the mark
4. Subtract the (LookupResult-1) from the CurrentRow to get this sequential numbering