Spelling Check
I have quite a few datasets that have spelling problems, or words with a number in place of a letter and vice versa, e.g. "Flat I" (the "1" has been replaced by an "l") or "FIeming" (the "l" has been replaced by a capital "i"). What is your advice for handling this issue?
Best Answers
-
@Marco_13112001 there are several ways to address this.
- If you are able to identify the specific problem, then you can create transformation functions to clean or correct those values. For example, you can use one of the Replace functions to replace specific values with a standardized value, and use other functions to standardize values to title case or upper case.
- Alternatively, you may want to take a look at our Find Duplicates step (you need an add-on license for this), where you can create rules to identify potential duplicates and then use the Harmonize Duplicates step to select the best value to keep. Find Duplicates is a big topic in its own right; more information is available here on the community.
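As a rough illustration of the first option outside Data Studio (plain Python, not Aperture's own function syntax), the "replace known values, then standardize case" idea might look like this. The `CORRECTIONS` mapping and `standardize` helper are hypothetical names, and the lookup only fires when the whole value matches exactly.

```python
import re

# Hypothetical lookup of known bad values -> standardized values.
# In Data Studio this would be set up with Replace functions instead.
CORRECTIONS = {
    "FIeming": "Fleming",  # capital "I" typed in place of lowercase "l"
    "Flat l": "Flat 1",    # lowercase "l" typed in place of the digit "1"
}

def standardize(value: str, case: str = "title") -> str:
    """Tidy whitespace, apply exact-match corrections, then normalize case."""
    # Collapse runs of whitespace and trim both ends before looking up.
    cleaned = re.sub(r"\s+", " ", value).strip()
    # Replace the whole value only if it exactly matches a known bad value.
    cleaned = CORRECTIONS.get(cleaned, cleaned)
    if case == "title":
        return cleaned.title()
    if case == "upper":
        return cleaned.upper()
    return cleaned

print(standardize("  FIeming "))            # Fleming
print(standardize("Flat l", case="upper"))  # FLAT 1
print(standardize("high   street"))         # High Street
```

The lookup is case-sensitive, so a variant such as "fIeming" would slip through unless the keys are normalized first.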
-
This is a good question. A spell check would not work well for addresses, as there are so many 'correct' words that would be flagged.
If there are a handful of common issues that you want to flag, then you could build a Contains function to detect them.
Once you are confident that the correct values are being detected, update this to use a Replace instead.
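The flag-first, replace-later workflow can be sketched in plain Python (this is not Data Studio syntax). `KNOWN_ISSUES`, `flag_issues` and `fix_issues` are hypothetical names; the sketch assumes whole-word matching is what you want, so a token like "Flat l" is not found inside "Flat lane".

```python
import re

# Hypothetical list of known problem tokens and their fixes.
KNOWN_ISSUES = {
    "FIeming": "Fleming",
    "Flat l": "Flat 1",
}

# Whole-word patterns, so "Flat l" does not match inside "Flat lane".
PATTERNS = {
    token: re.compile(r"\b" + re.escape(token) + r"\b") for token in KNOWN_ISSUES
}

def flag_issues(value: str) -> list[str]:
    """Step 1 (like Contains): list the known problem tokens found in value."""
    return [token for token, pattern in PATTERNS.items() if pattern.search(value)]

def fix_issues(value: str) -> str:
    """Step 2 (like Replace): swap each known problem token for its fix."""
    for token, pattern in PATTERNS.items():
        fix = KNOWN_ISSUES[token]
        # A function replacement avoids backslashes in fix being read as escapes.
        value = pattern.sub(lambda _match, fix=fix: fix, value)
    return value

print(flag_issues("Flat l, 12 FIeming Road"))  # ['FIeming', 'Flat l']
print(fix_issues("Flat l, 12 FIeming Road"))   # Flat 1, 12 Fleming Road
```

Running `flag_issues` on its own first lets you review what would change before committing to `fix_issues`.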
You could make it more generic by using Regex, for example to find any words that contain numbers.
You might want to extend this to ignore 'words' that are made up only of numbers.
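One way to express "words containing numbers, but not words that are only numbers" as a single regular expression is sketched below using Python's `re` module. Whether Data Studio's regex functions support the same lookahead syntax is an assumption not confirmed here, and `suspect_words` is a hypothetical name.

```python
import re

# A word that contains at least one digit AND at least one letter.
# [^\W\d_] means "a word character that is not a digit or underscore", i.e. a letter.
SUSPECT_WORD = re.compile(r"\b(?=\w*\d)(?=\w*[^\W\d_])\w+\b")

def suspect_words(value: str) -> list[str]:
    """Return mixed letter/digit words; all-digit words like "12" are skipped."""
    return SUSPECT_WORD.findall(value)

print(suspect_words("F1eming Road"))     # ['F1eming']
print(suspect_words("Flat 1"))           # []
print(suspect_words("12A High Street"))  # ['12A']
```

This catches digit-for-letter swaps such as "F1eming" but not letter-for-letter swaps such as "FIeming". It will also flag legitimate mixed tokens like "12A" or postcode parts, so the results still need review.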
There is an example function using Regex here: https://community.experianaperture.io/discussion/570/invalid-character-for-names
I'd be interested to see if anyone has any better ideas for solving this issue.
-
@Marco_13112001 Are you using the Validate Addresses step? What was the result returned for "Flat I"?
Answers
-
@Sueann See No, we are not using the Validate Addresses step. The dataset was created with these issues so that we could simulate the process, and I couldn't find an answer in the "normal" function list. However, your answer combined with @Josh Boxer's gave me a path to solving the issue.