Replacing the string value with Null if the end of the string matches with the LOV

Mahulima
Mahulima Contributor
edited December 2023 in General

I have company names in a field I want to replace the end of the company name with Null if the name is ending with OY, OYJ, RY etc. For example if I have company name as Mahu COMPANY OY I want to replace it as Mahu Company

I have a long list so I have kept that in a lookup table now which function should I use to achieve this

In short want to check if the company Name contains any of the value present in the Look up table column value then that will be replaced with Null

Tagged:

Comments

  • Seamus Kyle
    Seamus Kyle Experian Contributor

    Hi Mahulima

    I think the function "Replace Matches" could do this for you. However you would need 2 columns in your lookup table. One would be "lookup value" and one would be "return value".

    So, in your example, you would have the value "Mahu Company OY" in the lookup value column and the value "Mahu Company" in the return value column.

    As long as you were able to create the lookup table like this, then "Replace Matches" should work.

    Seamus

  • Akshay Davis
    Akshay Davis Experian Super Contributor
    edited July 2021

    You can do this by supplying a look up table of regex terms for the terns to identify and null for the replacement value.


    Then create a function to replace the text, this is simply a replace matches function.

    Which should return the results you need.


  • Mahulima
    Mahulima Contributor

    @Seamus Kyle and @Akshay I am not aware of the entire list of the company names so My Lookup table can not contain the company names rather I can only include those in the Lookup which will be replaced by Null

    So in that case Replace Matches will not work. I want some function which I read the entire string and if it finds any of the LOV values in the string it will replace with Null

  • Josh Boxer
    Josh Boxer Administrator

    Hi Mahulima

    Please review Akshay's solution again. He has shown how to turn your list 'Company Org LOV' into Regex values that can be used with the Replace Matches functions and the match type 'Regular Expression'. This will then search the end of each string and remove any matching values.

  • Seamus Kyle
    Seamus Kyle Experian Contributor

    @Akshay Davis - good thinking. A much better solution than mine.

    @Mahulima , Akshay's suggestion will do exactly what you want. I.e. it will find any example of OY, AB, ABP (as defined in the lookup table) and then replace them with null.

    Note, in Akshay's example lookup table the regex is matching OY (etc) only at the end of the line. The regex is "\bOY$". Here \b designates a "word boundary" typically white space, and $ designates end of line.

    If you wanted to replace OY etc anywhere in the string you should use the regex \bOY\b. That is OY surrounded by white space. Hope that makes sense.

  • Mahulima
    Mahulima Contributor

    @Akshay @Seamus Kyle @Josh Boxer Thank you very much for your Help!. I have got exactly what I was looking for.

    @Akshay Davis Your solution worked. :) Thank you very much! I really appreciate your help!

  • Mahulima
    Mahulima Contributor

    @Akshay Davis I have another scenario where I have Company name Mahu Company-OY, after reg replace I want Mahu Company as Result what should I use in the RegEx LOv List. Can you please help me. I want to use some regex which will cover both scenarios Space and "-".

    Mahu Company OY Result Mahu Company, Mahu Company-OY result Mahu Company

  • Josh Boxer
    Josh Boxer Administrator

    @Mahulima Seamus gave detail above "The regex is "\bOY$". Here \b designates a "word boundary" typically white space, and $ designates end of line."

    So \b is the space character which you want to replace with a hyphen:

    -OY$
    -OYJ$
    -RY$
    \bOY$
    \bOYJ$
    \bRY$
    

    There are lots of sites to help with writing/testing Regex such as this one https://regex101.com/

  • Mahulima
    Mahulima Contributor
    I tried with using -RY$ but its not working
    


  • Mahulima
    Mahulima Contributor
    As in CSV I have to give '-RY$ otherwise not able to enter only -RY$ 
    


  • Josh Boxer
    Josh Boxer Administrator

    You have to get the correct Unicode hyphen matching your data which could possibly be the issue. Edit your CSV in notepad as the ' at the start is an Excel issue.

    You could possibly use \W = matches any non-word character (equivalent to [^a-zA-Z0-9_])

    \WOY$
    
  • Mahulima
    Mahulima Contributor
    edited August 2021

    @Josh Boxer I tried both but not working. Also I edited in Notepad and now there is no ' character in the beginning

    Marked in Yellow is not working where as non marked is working

  • Pavan Rao
    Pavan Rao Experian Contributor
    edited August 2021

    Hi @Mahulima , I presume you are using the Replace Matches function suggested by Akshay earlier. For your query to cover both scenarios Space and "-", I suggest using a "Replace" function first where it replaces "-" to a single Space and then the rest will be handled by the "Replace Matches Function" as before. Something like this might help:


  • Mahulima
    Mahulima Contributor

    @Pavan Rao But I dont want to replace all "-" to Spaces. I want to only replace when I have -AMMATTIKORKEAKOULU

  • Pavan Rao
    Pavan Rao Experian Contributor

    @Mahulima - Thanks for clarifying. Try this in regex lookup instead:

    [-]AMMATTIKORKEAKOULU$

  • Seamus Kyle
    Seamus Kyle Experian Contributor

    HI @Mahulima

    I think I know what the issue is. The problem is that the function "Replace Matches" works by looking for matches based on space separated values (there are other options e.g. whole value or comma separated but we are using space separated).

    That is why it works for the string "Mahu Company OY" here the OY is a space separated value.

    However with the string "Mahu Company-OY" OY is not a space separated value so the Replace Matches regexes will not match it.

    I have thought of a workaround that, I think, should resolve this problem. In this, I temporarily replace any "-" with a string like "XXXX -" (Note there is a space after "XXXX" and before the "-). Here the XXXX is just a string which is unlikely to appear in your real data. ".

    This would change "Mahu Company-OY" "to "Mahu CompanyXXXX -OY".

    You then run the Replace Matches and it will correctly remote the "-OY" as now there is a space before it.

    After this, do another Replace to change "XXXX " (four Xs and a space) with null. This will put back the string to the original version (except for any values removed by the Replace Match function).

    See example below. I hope you can see that OK.


    IMPORTANT: the last Replace has "XXXX " i.e. 4 Xs and a space.

    Here the regex file is this:


    Let me know if this is helpful.

  • Mahulima
    Mahulima Contributor

    @Seamus Kyle Thank you very much for your detailed work aroung the only problem I see with this solution is if I replace any "-" with a string like "XXXX - then it will replace all the "-" buyt I dont want to do that I want to replace only  -AMMATTIKORKEAKOULU, for eExample if I have value Mahu Company  -AMMATTIKORKEAKOULU I want Mahu Company as result, and that is the only case, I dont want this to happen in all scenarios where I have "-"

  • Sueann See
    Sueann See Experian Super Contributor

    @Mahulima what is the purpose of removing those values from the company name? Do you ultimately want to match/find duplicates among these names?