Edit Distance

The Edit Distance function, with either Levenshtein or Jaro-Winkler, has two fields: "Length" (1-4) and "Prefix weighting" (1-100).

What impact do these fields have on the result?

Answers

  • Chris Downer Administrator

    Having done some reading on this, there is an interesting use case where the match is biased towards strings with a similar prefix (a list of apps, in this case).


  • Clinton Jones Experian Elite
    edited June 22


    @Chris Downer are there a couple of practical examples where the results are materially different when the prefix weight is adjusted?

    Also, there are two functions. What is the difference between them?


  • Chris Downer Administrator
    edited June 22

    I put together a quick View that you can experiment with. Download the attached .dmx file and import it into your space.

    In this View I simply apply the Jaro-Winkler algorithm to a matrix so you can experiment with the variables.

    For example, the following three Views use:

    Length=4, Prefix weighting=100%

    Length=4, Prefix weighting=25%

    Length=4, Prefix weighting=1%

    When less weight is given to the prefix, the scores are lower, because the prefixes of these strings are similar.
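
    For anyone who wants to see the effect outside the product, here is a minimal Python sketch of the textbook Jaro-Winkler formula, score = jaro + l * p * (1 - jaro), where l is the number of matching leading characters (capped by a prefix length) and p is the prefix scaling factor. The names jaro, jaro_winkler, prefix_length and scale are my own, and how the "Prefix weighting" value (1-100) maps onto p inside the product is not stated in this thread, so treat this only as an illustration of why a lower prefix weight or a shorter prefix length lowers the score for strings that start the same way.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_length: int = 4, scale: float = 0.1) -> float:
    """Jaro score boosted for a shared prefix.

    prefix_length caps how many leading characters are compared.
    scale is Winkler's p; keep prefix_length * scale <= 1 so the score stays <= 1.
    Both parameter names are illustrative, not the product's.
    """
    base = jaro(s1, s2)
    common = 0
    for a, b in zip(s1[:prefix_length], s2[:prefix_length]):
        if a != b:
            break
        common += 1
    return base + common * scale * (1 - base)


# MARTHA / MARHTA share the prefix "MAR" and differ by one transposition.
print(round(jaro("MARTHA", "MARHTA"), 4))                          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))                  # 0.9611
print(round(jaro_winkler("MARTHA", "MARHTA", scale=0.025), 4))     # lower weight
print(round(jaro_winkler("MARTHA", "MARHTA", prefix_length=1), 4)) # shorter prefix
```

    In this sketch the plain Jaro score is the floor: the prefix terms can only add to it, so reducing either the weight or the length moves the result back down towards that floor.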

  • Chris Downer Administrator
    edited June 22

    Just FYI, if you download and import the Dataset/View (Dynamo_JaroWinkler.dmx), you need to use the first transform step (highlighted in red) to change the functions, as I've hidden the cells with the matrix values.

