Spelling Check

I have quite a few datasets with spelling problems, or with words where a number appears in place of a letter and vice versa, e.g. Flat I (the 1 is replaced by an "l") or FIeming (the "l" is replaced by a capital "i"). What is your advice for handling this issue?

Best Answers

  • Sueann See Administrator
    edited April 20 Answer ✓

    @Marco_13112001 there are several ways to address this.

    • If you are able to identify the specific problem, then you can create specific transformation functions to clean/correct those values. For example, you can use one of the Replace functions to replace specific values with a standardized value and use functions to standardize values to title case or upper case.
    • Alternatively, you may want to look at our Find Duplicates step (this requires an add-on license), where you can create rules to identify potential duplicates and then use the Harmonize Duplicates step to select the best value to keep. Find Duplicates is a huge topic on its own. More information is available here on the community.
  • Josh Boxer Administrator
    Answer ✓

    This is a good question. Spell check would not work for addresses, as there are so many 'correct' words that would be flagged.

    If there are a handful of common issues that you want to flag, then you could build out a Contains function.

    Once you are confident that the right values are being detected, work on updating this to a Replace.
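    The flag-first, replace-later approach above can be sketched outside the product. This is a minimal Python illustration, not Aperture Data Studio function syntax; the `KNOWN_FIXES` table and both function names are hypothetical, and "R0ad" is a made-up sample alongside the "FIeming" example from the question.

    ```python
    # Hypothetical table of known bad spellings and their corrections.
    KNOWN_FIXES = {
        "FIeming": "Fleming",  # capital "i" typed instead of lowercase "l"
        "R0ad": "Road",        # zero typed instead of the letter "o" (made-up sample)
    }


    def flag_known_issues(value):
        """'Contains' stage: list the known bad spellings found in value."""
        return [bad for bad in KNOWN_FIXES if bad in value]


    def replace_known_issues(value):
        """'Replace' stage: swap each known bad spelling for its correction."""
        for bad, good in KNOWN_FIXES.items():
            value = value.replace(bad, good)
        return value


    if __name__ == "__main__":
        sample = "12 FIeming R0ad"
        print(flag_known_issues(sample))     # review the flags first
        print(replace_known_issues(sample))  # then apply the corrections
    ```

    Running the flagging stage on its own first makes it easier to spot false positives before any value is changed.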

    You could make it more generic by using Regex, for example to find any words containing numbers.

    You might want to expand this to ignore 'words' that are just numbers.
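    As a rough illustration of that idea, the Python sketch below flags tokens that mix letters and digits while skipping tokens that are only digits. The pattern and the `find_mixed_tokens` name are assumptions made for this example, not a built-in Aperture Data Studio function, and the letter check covers ASCII letters only.

    ```python
    import re

    # Match a whole token that contains at least one ASCII letter and at least
    # one digit. Tokens made only of digits (e.g. "12") or only of letters
    # (e.g. "Flat") do not match.
    MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


    def find_mixed_tokens(value):
        """Return every token in value that mixes letters and digits."""
        return MIXED_TOKEN.findall(value)


    if __name__ == "__main__":
        print(find_mixed_tokens("Flat 1 R0ad 12"))
    ```

    Legitimate alphanumeric values such as a house number like 12A would also be flagged, so the matches are best treated as candidates for review. A letter swapped for another letter, as in FIeming, contains no digit and would not be caught by this pattern.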

    An example function using Regex here: https://community.experianaperture.io/discussion/570/invalid-character-for-names

    I'm interested to see if anyone has better ideas for solving this issue.

  • Sueann See Administrator
    Answer ✓

    @Marco_13112001 Are you using the Validate Addresses step? What was the result returned for Flat I?

Answers

  • Marco_13112001 Super Learner

    @Sueann See no, we are not using the Validate Addresses step. The dataset was created with these issues so we could simulate the process, and I couldn't find an answer in the "normal" function list. However, your answer combined with @Josh Boxer's gave me a path to solve the issue.
