Split a value into multiple rows

Luke
edited December 2023 in General

Hi

I have full addresses split by ; that I want to split out into new rows ready for the Address Validate step.

An example of the data layout ...

"Address1 Address2 Address3 Town County Postcode;Address1 Address2 Address3 Town County Postcode"

This should become two rows

"Address1 Address2 Address3 Town County Postcode"

"Address1 Address2 Address3 Town County Postcode"

Does anyone know if this is achievable in Aperture?

Thanks

Luke

Answers

  • Josh Boxer
    Josh Boxer Administrator
    edited December 2022

    Hi Luke

    Roughly how many addresses are there in that semi-colon separated list?

    Two ideas:

    1. Upload the data using semi-colon as the delimiter so that each address becomes a new column then in a Workflow use Columns to rows and select every column turning them into rows
    2. create a custom file parser https://docs.experianaperture.io/data-quality/aperture-data-studio-v2/extend-data-studio-functionality/use-the-sdk/#getstartedwiththesdk

    Hoping someone will reply with a better suggestion!

  • Danny Roden
    Danny Roden Administrator

    If you can split the values out into seperate columns, then you can use the 'columns to rows' step to split them out to individual rows. I've recently took an approach like this when creating a reusable workflow to generate a word frequency list https://community.experianaperture.io/discussion/859/word-frequency

    @Josh Boxer@Luke what are your thoughts on the idea of us tweaking the 'columns to rows' step to be more flexible to accommodate 'values to rows' (where values could either be presented in multiple columns OR in a list, where the user can define the delimiter)?

    @Sharat Kutty has also recently been looking to do something similar so it's something that's being discussed at the moment.


  • @Josh Boxer@Danny Roden Thanks for your suggestions, with this particular project I had around 10,000 records that expanded to around 14,000 records once the semi-colon delimited addresses were parsed. In the end I parsed the records into rows using SQL.

    @Danny Roden I think the values to rows idea with a user a defined delimiter would be perfect for this problem.

  • You could try to search for the first ; and use a substring function

    And, if there are more than 2, then repeat, starting from this position and search for the next one. Repeat as required

    Nige

  • Danny Roden
    Danny Roden Administrator

    Yea good idea @Nigel Light @Nigel Light . Also the 'split' function does a nice job if there's only a few to parse out. But when it's an unknown (or variable) number of inputs it can be a little challenging.

  • Thanks @Nigel Light @Danny Roden

    I'm not sure how to make use of 'split' and 'substring' functions, and then use the columns to rows to tool, when I don't know the number of separated addresses. Some only have one address, others have over 10.

  • Danny Roden
    Danny Roden Administrator

    For the time being I'd suggest profiling all the values in the column to understand the maximum list size. You can use this to inform the logic for how you then build out the required additional columns to split the list into seperate addresses:

    So the 'Split' function is probably the best current option available for the job (simply set the seperator and specify which element you want and do this N times for how ever many addresses there are in the most populated value in your list).


    There's also a custom step available which performs a quick 'text to columns' split similar to the tool in Excel (contact your account manager or techincal consultant if you'd like access to this).


    Cheers,

    Danny

  • @Danny Roden I have a similar task. Split values from a column to rows based on a delimiter with a variable number of values per column. I think the max being 25 … would the text to column step be the best option for this task. Dropping empty columns before pivoting to rows?