Is there an easy way to do this partitioning?
Hi,
I have seen previous posts on partitioning, and use that technique regularly. However, my need this time is slightly different. I have groupings of duplicates, but I would like to count incrementally when I come across a new identifier. I actually have this working now, but it took some Aperture kung-fu, so wondering if anyone has a simple way to do this.
The column I am focusing on to create is ID_NUM, I only want it to increment within the GROUP when a new IDENTIFIER is found, otherwise keep the existing ID_NUM. (ID_TOT shows how many unique IDs there are in the group.)
Thanks
GROUP | IDENTIFIER | ID_TOT | ID_NUM |
---|---|---|---|
AAAA | 12345 | 4 | 1 |
AAAA | 99999 | 4 | 2 |
AAAA | 99999 | 4 | 2 |
AAAA | 54321 | 4 | 3 |
AAAA | 11111 | 4 | 4 |
AAAA | 11111 | 4 | 4 |
BBBB | 90210 | 3 | 1 |
BBBB | 90210 | 3 | 1 |
BBBB | 77777 | 3 | 2 |
BBBB | 13579 | 3 | 3 |
BBBB | 13579 | 3 | 3 |
Answers
-
Take a look at this discussion:
https://community.experianaperture.io/discussion/837/
If that does not help then might useful to share more detail on how you approached and where you'd like it to be simpler
0 -
Thanks Josh, I had not seen that discussion in my forum search. The two that I used for help were:
https://community.experianaperture.io/discussion/244/partition-values-dynamic-sql-partition-row-number#latest
The steps that I did were:
- Create a new column "Sort-row" from Row number (to be used by sort later to preserve the order)
- Group on: "Group", "Identifier" and aggregate on first populated value of "Sort-row"
- Sort the result (7 rows) by "Sort-row"
- Add new column "Row#2" from Row number
- Use a Lookup to get the row number where the partition (Group) starts
- Calculate the row increments (Part-row-incr) within the partition (Add(Subtract([c:ROW#2], [c:PART-ROW-START]),1))
- Do a Final lookup using the original data (11 rows) to the 7 rows from step 6. Lookup type=First, Return Column= "part-row-incr" where group=group and identifier=identifier
Looking at the thread you provided, I don't think I could do this much simpler. I did however have another method of creating a key of "Group+Identifier" and using Harmonize Duplicates rather than the grouping from step 2 above. Both work fine, I just don't know which method would be more efficient with 100,000s of duplicates and groupings - I haven't tested that yet. Any thoughts on that?
Thanks
0 -
Workflow if interested
0 -
@Josh Boxer - any thoughts on the Harmonize vs Grouping method?
Also, if Aperture had a step similar to this mockup, I would be very happy!! It would add in 2 columns for Count and Row based on the column names/requirements I gave above. (I am sure adding other functions as well, such as average, sum etc would be useful).
Thanks
0 -
Many moons ago i did write a partition custom step, so i think it should be possible to do within the SDK.
I'm not really sure of the status of this step and whether it was maintained for later versions though.
0 -
That's great Ian!! Seems like a native step would be used by others.
0