In the Edit Distance function with Levenshtein or Jaro-Winkler we have the fields "Length" (1-4) and "Prefix weighting" (1-100).
What impact do these fields have on the result?
I'm not going to pretend to understand the maths, but the prefix value gives higher ratings to strings that have a common prefix, up to a prefix length of 4.
The prefix weighting is applied over the prefix length that you supply, and it weights the results in favour of strings with common prefixes.
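To make that a bit more concrete, here is a minimal Python sketch of the textbook Jaro-Winkler calculation. It is not the product's implementation. The parameter names `max_prefix_len` and `prefix_weight_pct` are hypothetical stand-ins for the "Length" and "Prefix weighting" fields, and the mapping from the 1-100 percentage to Winkler's scaling factor (capped at 0.25 so the score cannot exceed 1) is an assumption on my part.

```python
def jaro(s1: str, s2: str) -> float:
    """Plain Jaro similarity in the range 0..1."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 max_prefix_len: int = 4,
                 prefix_weight_pct: float = 40) -> float:
    """Jaro score boosted for a shared prefix.

    ASSUMPTION: prefix_weight_pct (1-100) is scaled onto Winkler's factor p
    with 100% -> 0.25; the real tool may map the field differently.
    """
    base = jaro(s1, s2)

    # Length of the common prefix, capped at max_prefix_len.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix >= max_prefix_len:
            break
        prefix += 1

    p = 0.25 * prefix_weight_pct / 100
    return base + prefix * p * (1 - base)


if __name__ == "__main__":
    for pct in (100, 25, 1):
        print(pct, round(jaro_winkler("MARTHA", "MARHTA", 4, pct), 4))
```

Under these assumptions, a larger prefix length or a larger weighting can only raise the score of two strings that start the same way; strings with no common prefix keep their plain Jaro score.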
Just doing some reading on this: there is an interesting use case here where there is a bias towards the prefix being similar (a list of apps in this case).
@Chris Downer are there a couple of practical examples where the results are materially different when the prefix weight is adjusted?
Also, there are two functions. What is the difference between them?
I put together a quick View that you can experiment with. Download the attached .dmx file and import it to your space:
In this View I simply apply the Jaro-Winkler algorithm to a matrix so you can experiment with the variables.
For example, in the following three Views I have:
Length=4, Prefix weighting=100%
Length=4, Prefix weighting=25%
Length=4, Prefix weighting=1%
When less weight is given to the prefix, the scores are lower: these strings share a similar prefix, so that shared prefix adds less to each score.
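For anyone who wants the arithmetic behind that, the textbook Jaro-Winkler formula is shown below. I am assuming the "Prefix weighting" percentage scales the factor p; the exact mapping used by the function is not something I can confirm.

$$
\mathrm{JW} = J + \ell \, p \, (1 - J)
$$

Here J is the plain Jaro score, ℓ is the length of the common prefix (capped by "Length"), and p is the prefix scaling factor. As an illustration, for MARTHA and MARHTA, J ≈ 0.9444 and ℓ = 3:

$$
p = 0.1: \quad 0.9444 + 3(0.1)(0.0556) \approx 0.9611
\qquad
p = 0.025: \quad 0.9444 + 3(0.025)(0.0556) \approx 0.9486
$$

So for a fixed pair of strings with a shared prefix, lowering p lowers the score, and pairs with no shared prefix (ℓ = 0) are unaffected.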
Just FYI, if you download and import the Dataset/View (Dynamo_JaroWinkler.dmx), you need to use the first transform step (highlighted red below) to change the functions. I've hidden the cells with the matrix values...