python probability similarity metric
Updated Sun, 04 Sep 2022 20:07:05 GMT

Find the similarity metric between two strings


How do I get a score for how similar one string is to another in Python?

I want a decimal value such as 0.9 (meaning 90%), preferably using only Python's standard library.

e.g.

similar("Apple", "Appel")  # would have a high probability
similar("Apple", "Mango")  # would have a lower probability



Solution

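There is a built-in for this: SequenceMatcher in the standard-library difflib module. Its ratio() method returns a similarity score between 0 and 1.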

from difflib import SequenceMatcher

def similar(a, b):
    # ratio() returns a float in [0, 1]; 1.0 means the strings are identical.
    return SequenceMatcher(None, a, b).ratio()

Using it:

>>> similar("Apple","Appel")
0.8
>>> similar("Apple","Mango")
0.0
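
For reference, the difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched elements and T is the combined length of both sequences. The sketch below reproduces the 0.8 above by hand and shows that the comparison is case-sensitive, which is why "Apple" and "Mango" score 0.0 even though both contain the letter "a" in some case. The names ratio_by_hand and similar_ignore_case are illustrative helpers, not part of difflib.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a zero-length sentinel block,
    # so summing the sizes counts only the real matches (M).
    matches = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)  # T
    return 2.0 * matches / total if total else 1.0


def similar_ignore_case(a, b):
    """Hypothetical variant: fold case before comparing."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


print(ratio_by_hand("Apple", "Appel"))        # 0.8 (4 matched characters out of 10)
print(ratio_by_hand("Apple", "apple"))        # 0.8, because "A" != "a"
print(similar_ignore_case("Apple", "apple"))  # 1.0
```

Whether to fold case depends on the application; for user-facing typo detection it is often what you want, but that is an assumption about your data rather than something difflib does for you.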




Comments (5)

  • +0 – See this great answer comparing SequenceMatcher vs python-Levenshtein module. stackoverflow.com/questions/6690739/… — Feb 09, 2015 at 13:06  
  • +1 – Interesting article and tool: chairnerd.seatgeek.com/… — Jan 05, 2016 at 19:04  
  • +7 – I would highly recommend checking out the whole difflib doc (docs.python.org/2/library/difflib.html). There is a built-in get_close_matches, although I found sorted(... key=lambda x: difflib.SequenceMatcher(None, x, search).ratio(), ...) more reliable, with custom sorted(... .get_matching_blocks())[-1] > min_match checks — Sep 15, 2016 at 19:51  
  • +2 – @ThorSummoner brings attention to a very useful function (get_close_matches). It's a convenience function that may be what you are looking for; in other words, read the docs! In my particular application I was doing some basic error checking and reporting for users who provide bad input, and this answer lets me report the potential matches and what the "similarity" was. If you don't need to display the similarity, though, definitely check out get_close_matches — Sep 03, 2017 at 22:54  
  • +0 – This worked perfectly. Simple and effective. Thank you :) — May 09, 2020 at 16:39
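
The two approaches raised in the comments can be sketched side by side. difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6) returns only the matching strings, best first, while a hand-rolled ranking keeps the scores so they can be shown to the user. The candidates list, the ranked_matches helper, and its min_ratio parameter are made up for illustration; this is one possible reading of the sorted(...) idea in the comment, not the commenter's exact code.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical list of valid inputs to match against.
candidates = ["Apple", "Mango", "Maple"]

# Built-in convenience function: returns the closest strings, without scores.
print(get_close_matches("Aple", candidates, n=3, cutoff=0.6))
# ['Apple', 'Maple']


def ranked_matches(search, options, min_ratio=0.6):
    """Return (score, option) pairs at or above min_ratio, best first."""
    scored = [
        (SequenceMatcher(None, option, search).ratio(), option)
        for option in options
    ]
    kept = [pair for pair in scored if pair[0] >= min_ratio]
    return sorted(kept, reverse=True)


for score, option in ranked_matches("Aple", candidates):
    print(f"{option}: {score:.2f}")
```

Use get_close_matches when only the suggestions matter; keep the explicit ranking when the similarity value itself needs to be reported.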