Repeating Characters 🔁

Danny Roden
Danny Roden Administrator
edited December 2023 in Functions library

Check for same character repeated

This function uses a regular expression to identify records where the entire value of a cell is made up of the same character repeated (e.g. "aaa" or "0000000"). The idea here is to detect cells which are clearly just entered with default values (either through automated processing, e.g. in a previous migration, or through 'keyboard bashing').
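
The post does not quote the regular expression itself, so the following is only a minimal Python sketch of the same idea, not the expression shipped in the .dmx file. The pattern `(.)\1+` and the helper name `is_single_repeated_char` are assumptions made for illustration; the sketch also assumes a value needs at least two characters to count as "repeated".

```python
import re

# Assumed pattern (not taken from the .dmx file): capture one character,
# then require that same character one or more further times.
SAME_CHAR = re.compile(r"(.)\1+", re.DOTALL)


def is_single_repeated_char(value: str) -> bool:
    """Return True when the entire value is one character repeated."""
    # fullmatch forces the pattern to cover the whole value, start to end.
    return SAME_CHAR.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["aaa", "0000000", "   ", "aab", "a", ""]:
        print(repr(sample), is_single_repeated_char(sample))
```

Values made up only of spaces are flagged too, which matches the space-character test value mentioned for the screenshot.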


See below for a screenshot with some example results displayed (note the length variable is not used in the calculation but presented to highlight some of the test values, particularly one which contains space characters):


Compatibility:

This function is compatible with all instances of Data Studio from v2.1.11 onwards.

[dl-button|Download|https://us.v-cdn.net/6031645/uploads/HWZVRS21O0NA/identify-repeating-characters.dmx]

Comments

  • Andreea C
    Andreea C Member
    edited September 2021

    Hi, how could we also identify records containing repeating characters? (in this case the repeating character would not be the entire value, but only a part of it)

    For example, if we're looking to identify characters repeating 3 times, how could we identify 'TEEEST', as well as 'EEE'?

    Thank you


    Actually I may have found a possible solution using Transformation and Parse by regular expression  (.)\1{2}
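
For anyone wanting to try the expression outside Data Studio, here is a small Python sketch of what `(.)\1{2}` does: it captures one character and then requires the same character twice more, anywhere in the value. The helper names `has_triple` and `first_triple` are made up for this example, and Python's regex engine may differ in small ways from the one behind 'Parse by regular expression'.

```python
import re

# One captured character followed by the same character twice more,
# i.e. three identical characters in a row.
TRIPLE = re.compile(r"(.)\1{2}")


def has_triple(value: str) -> bool:
    """Return True if any character appears three times in a row."""
    return TRIPLE.search(value) is not None


def first_triple(value: str):
    """Return the first run of three identical characters, or None."""
    match = TRIPLE.search(value)
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in ["TEEEST", "EEE", "TEEST"]:
        print(sample, has_triple(sample), first_triple(sample))
```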

  • Fatme Kungyova
    Fatme Kungyova Experian Employee
    edited April 2022

    Hello,

    @Henry Simms and I worked on creating a function, using a regular expression, for a project that involved finding words that contain repeating characters. Some of our records contained repeated groups of characters (for example "abcabcabc"), others contained the same character repeated ("aaaa"), and others had repeated characters in only part of a word ("aaaren").

    We used this function and the regular expression from the comment by @Andreea C for inspiration, but changed it so that repeated groups of characters can also be identified. The function also correctly identifies repeated characters when punctuation and whitespace occur between them (example: "Zab,ab,ab,abCD").

    The function can easily be modified to return whatever output is required, for example a message such as 'There are repeated characters in the string: {the repeated chars}' or a Boolean value True/False!

    This is how the function works:

    The first part of the function uses 'Parse by regular expression' with the regex (.*)\1{2}. All the elements in the list created by this step are then sorted and the first value is extracted, which retains the repeated characters and removes any punctuation that may be in the string.

    The regex can be modified to identify characters or groups of characters that are repeated 2, 3 or more times, by changing the number in the curly brackets from 2 to any number required. The number counts the repeats after the first occurrence, so with {2} a character or group must appear 3 times in a row, and if {2} is replaced with {3} it must appear 4 times in a row to be identified by the function.

    The second part of the function checks whether the first part returned a repeated character or group of characters. If it did, it concatenates the message with the repeated characters. Otherwise, it returns the input as given.

    This can be modified to return True or False or another message.
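
The two parts described above can be approximated in Python as follows. This is only a rough sketch and not a translation of the Data Studio function: the name `describe_repeats` and its `repeats` parameter are hypothetical, the pattern uses `(.+)` rather than `(.*)` so that empty matches are skipped, and punctuation is stripped with a second regex instead of the sort-and-take-first step.

```python
import re

MESSAGE = "There are repeated characters in the string: "


def describe_repeats(value: str, repeats: int = 2) -> str:
    """Report the first repeated group, or return the input unchanged.

    `repeats` plays the role of the number in the curly brackets: the
    group must be followed by that many further copies of itself.
    """
    # Part 1: find a character or group that is immediately repeated.
    pattern = re.compile(r"(.+)\1{%d}" % repeats)
    for match in pattern.finditer(value):
        # Drop punctuation and whitespace from the captured group.
        cleaned = re.sub(r"[\W_]+", "", match.group(1))
        if cleaned:
            # Part 2: concatenate the message with the repeated characters.
            return MESSAGE + cleaned
    # No repeated group found: return the input as given.
    return value


if __name__ == "__main__":
    for sample in ["abcabcabc", "aaaren", "Zab,ab,ab,abCD", "hello"]:
        print(repr(sample), "->", describe_repeats(sample))
```

Returning `True`/`False` instead would only mean replacing the two `return` statements.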

    Here is the DMX file for importing the function. It is compatible with all instances of Aperture Data Studio from v2 (2.6.3):


  • 2 questions on this function:

    Which option do we use to create the "Repeated Characters" to put an end to that string?

    Is the dmx file something that we can download above and upload to Aperture, so that this function is added with no need to create it manually?

  • Danny Roden
    Danny Roden Administrator

    @Juan_NFCU yes, a .dmx package is an export from Data Studio containing 1 or more objects (in this case a rule). You can easily import these into your instance of Data Studio from the dropdown menu containing your 'space' options:

    Select 'synchronise' and then follow the steps to upload the .dmx package, preview what it contains and proceed to add it to your environment.

    Once in, you can apply it immediately as well as edit/tweak it as you see fit (e.g. to fix it to repeated characters at the end of the string by adjusting the regular expression in your case).
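
As an illustration of that kind of adjustment, one possible way to restrict the match to the end of the string is to append an end anchor to the expression from the earlier comment, giving `(.)\1{2}$`. The Python sketch below assumes that variant; the helper name `ends_with_triple` is hypothetical and the exact anchor behaviour in Data Studio's regex engine may differ.

```python
import re

# Three identical characters in a row, anchored to the end of the value.
# Note: in Python, `$` also matches just before a single trailing newline.
TRAILING_TRIPLE = re.compile(r"(.)\1{2}$")


def ends_with_triple(value: str) -> bool:
    """Return True if the value ends with a character repeated three times."""
    return TRAILING_TRIPLE.search(value) is not None


if __name__ == "__main__":
    for sample in ["TESTTT", "TEEEST", "EEE"]:
        print(sample, ends_with_triple(sample))
```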

  • Juan_NFCU
    edited August 2022

    Thank you @Danny Roden. I created it based on the screenshots, but good to know that it's that simple to upload the .dmx file.