Is there an easy way to do this partitioning?

Hi,

I have seen previous posts on partitioning, and use that technique regularly. However, my need this time is slightly different. I have groupings of duplicates, but I would like to count incrementally when I come across a new identifier. I actually have this working now, but it took some Aperture kung-fu, so wondering if anyone has a simple way to do this.

The column I am focusing on to create is ID_NUM, I only want it to increment within the GROUP when a new IDENTIFIER is found, otherwise keep the existing ID_NUM. (ID_TOT shows how many unique IDs there are in the group.)

Thanks

GROUP

IDENTIFIER

ID_TOT

ID_NUM

AAAA

12345

4

1

AAAA

99999

4

2

AAAA

99999

4

2

AAAA

54321

4

3

AAAA

11111

4

4

AAAA

11111

4

4

BBBB

90210

3

1

BBBB

90210

3

1

BBBB

77777

3

2

BBBB

13579

3

3

BBBB

13579

3

3

Answers

  • Josh Boxer
    Josh Boxer Administrator
    edited May 27

    Take a look at this discussion:

    https://community.experianaperture.io/discussion/837/

    If that does not help then might useful to share more detail on how you approached and where you'd like it to be simpler

  • Thanks Josh, I had not seen that discussion in my forum search. The two that I used for help were:

    https://community.experianaperture.io/discussion/244/partition-values-dynamic-sql-partition-row-number#latest

    The steps that I did were:

    1. Create a new column "Sort-row" from Row number (to be used by sort later to preserve the order)
    2. Group on: "Group", "Identifier" and aggregate on first populated value of "Sort-row"
    3. Sort the result (7 rows) by "Sort-row"
    4. Add new column "Row#2" from Row number
    5. Use a Lookup to get the row number where the partition (Group) starts
    6. Calculate the row increments (Part-row-incr) within the partition (Add(Subtract([c:ROW#2], [c:PART-ROW-START]),1))
    7. Do a Final lookup using the original data (11 rows) to the 7 rows from step 6. Lookup type=First, Return Column= "part-row-incr" where group=group and identifier=identifier

    Looking at the thread you provided, I don't think I could do this much simpler. I did however have another method of creating a key of "Group+Identifier" and using Harmonize Duplicates rather than the grouping from step 2 above. Both work fine, I just don't know which method would be more efficient with 100,000s of duplicates and groupings - I haven't tested that yet. Any thoughts on that?

    Thanks

  • @Josh Boxer - any thoughts on the Harmonize vs Grouping method?

    Also, if Aperture had a step similar to this mockup, I would be very happy!! It would add in 2 columns for Count and Row based on the column names/requirements I gave above. (I am sure adding other functions as well, such as average, sum etc would be useful).

    Thanks

    image.png
  • Ian Buckle
    Ian Buckle Experian Contributor

    Many moons ago i did write a partition custom step, so i think it should be possible to do within the SDK.

    I'm not really sure of the status of this step and whether it was maintained for later versions though.

    image.png
  • That's great Ian!! Seems like a native step would be used by others.