Removing an unwanted intermittent string from the start of an alphanumeric field
At St. James’s Place we needed to remove a string, that occurred intermittently, from the start of an alphanumeric dataset. The “Trim” function didn’t quite meet our needs so we managed to achieve this by creating a new column in a “Transform” step as follows :
- “Replace” function, replacing the string that you are searching for with nulls
- Feed this into a “First Non Null” function
- (this is either 1 if the string hasn’t been found or the position of the first character after the matched string if it existed)
- Feed this into a “Substring” function, with the start and length equal to 1
- (this will give you the first required character – either the first character or the first character after the string if present)
- Feed this into a “Position” function, searching the whole string and looking for the results from iii)
Finish by feeding this into a final “Substring”, searching the whole string, starting from the number returned from iv) for length 0 (which takes you to the end of the string)
Comments
-
a nice catch @Nigel Light
0 -
Great post @Nigel Light, hopefully helps others too!
0 -
@Nigel Light great post, I'm wondering if you were aware of the AFTER function which will give you the remaining string after the detected string. There is also a Before function as well which will do the same but the other way around.
2 -
Thanks - I wasn't aware of this one. Would it still work if there were 2 occurrences of the string at the start?
0 -
@Nigel Light unfortunately it doesn't do a dedupe. We can look at that as perhaps an enhancement
0 -
Looking at another function, using Strip Substring will remove all instances of a given string
1