How to determine Nth occurence in a column?
Hello.
Do you know of any function or combination of functions that can achieve a new column in the data as the "occurrence" column in the screenshot below?
I would like to know if the ID in the first column is the 1st, 2nd ... Nth occurrence in that column. Is there a way to achieve this?
Thanks in advance.
Richard
Best Answer
-
Hi Richard,
I came up with a few solutions yesterday (one involving a Cartesian Join, another involving Group/Join) but also came up with something that, like yours, involves a Lookup but doesn't require Get Cell and preserves sort order.
(See workflow screenshot below)
Step 1 "Add Source Row Number" - add current row number to the source (call the column "source row")
Step 2 - "Sort by Value" - sort this ascending by the value
Step 3 - feed this into 1) the first input of a Lookup step 2) "Add Row Number" a transform step that adds the row number again ("sorted row number") and feed that into the second input of the Lookup step ("Value -> First Row"). Add a lookup definition that matches Value == Value and returns the first value of "sorted row number". Call this definition "first row"
Step 4 - "Compute Occurrence" - a transform step to (like your solution) subtract "first row" from the current row number (this results in a zero-indexed nth occurence) and call this column "occurrence"
Step 5 - "Restore Original Order" - restore the original sort order by sorting ascending on "source row"
Step 6 - "Hide Columns and Adjust" - hide unnecessary columns and add 1 to "occurrence" to make it 1-indexed
In your solution, if you lookup and return first value (which would be a row number) you don't need Get Cell to mark the change in value.
Ian
2
Answers
-
Hi Richard
There are two similar values that you could calculate more easily
- the total number of occurrences for each ID using a Group
- the First or Last occurrence of each ID using a Lookup step
Assuming neither of those meet your needs I think it would be possible using multiple functions that utilize the Function Get cell however it would require the data to be pre-sorted (by ID in this case), would that be an issue for any subsequent operations?
0 -
Hi Josh,
Thanks for your comment. The aim is to assign a new, unique sequential ID based on the original ID, so sorting won't be an issue.
After posting the question we've been experimenting a bit with Get Cell as you suggested and what seems to work is the following:
- Sort
- Mark the first appearance of a value in a new column (with an X) and also add CurrentRow in another
- Do a lookup with this table on itself, by returning the Row number where we have the ID and the mark
- Subtract the (LookupResult-1) from the CurrentRow to get this sequential numbering
2