Where does the cat sleep (find the closest sentence)?
I had an unusual case where I needed to import data containing sentences that indicated where to store it, but the sentences were not always spelled correctly. They were very close, but not close enough for a "like" test. So I looked for a way to find the closest sentence using the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance). This distance is the minimum number of single-character insertions, deletions, or substitutions needed to transform one sentence into another. For example, to change "bonjour tout le monde" (hello everyone) into "bonjour le monde" (hello world), you delete the five characters of "tout " (four letters plus a space), which gives a distance of 5. That is closer than "bonjour les amis" (hello friends), which is at a distance of 11 from "bonjour tout le monde" when every character, spaces included, is counted.
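To make the idea concrete, here is a minimal Python sketch of a character-level Levenshtein distance and a closest-sentence lookup. It is not the code from my project: the function names `levenshtein` and `closest_sentence` are made up for this illustration, and it assumes spaces count as characters like any other.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


def closest_sentence(query: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest distance to query
    (the first one wins on ties)."""
    return min(candidates, key=lambda s: levenshtein(query, s))


known = ["bonjour tout le monde", "bonjour les amis"]
# A misspelled input still maps to the intended sentence.
print(closest_sentence("bonjour tout le mnde", known))
```

Only two rows of the distance table are kept at a time, so memory grows with the length of the shorter sentence rather than with the product of both lengths.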
Later, I wanted to use this as a keyword search system. However, it didn't work correctly because, with keywords, it's not just the distance to the whole sentence that matters, but also how many of the keywords actually appear in the sentence. So I added a second distance based on the number of keywords found, and combined it with the distance to the sentence.
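One possible way to combine the two measures is sketched below in Python. This is an assumption for illustration, not the exact formula from my project: sentences are ranked first by the number of query keywords they are missing, and the whole-sentence Levenshtein distance only breaks ties. The names `keyword_distance` and `search` and the `max_typos` parameter are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def keyword_distance(query: str, sentence: str, max_typos: int = 1) -> int:
    """Count the query keywords that have no close match in the sentence.
    A keyword counts as found when some word of the sentence is within
    max_typos edits of it (hypothetical tolerance)."""
    words = sentence.lower().split()
    missing = 0
    for keyword in set(query.lower().split()):
        if not any(levenshtein(keyword, w) <= max_typos for w in words):
            missing += 1
    return missing


def search(query: str, sentences: list[str], max_typos: int = 1) -> list[str]:
    """Sort sentences from best to worst match: fewest missing keywords
    first, then smallest distance to the whole query."""
    def score(sentence: str) -> tuple[int, int]:
        return (keyword_distance(query, sentence, max_typos),
                levenshtein(query.lower(), sentence.lower()))
    return sorted(sentences, key=score)


sentences = [
    "the dog sleeps in the garden",
    "a cat",
    "the cat sleeps on the sofa",
]
for s in search("ct sofa", sentences):
    print(s)
```

Ranking on the keyword count first avoids a side effect of raw Levenshtein distance, which otherwise favours short sentences simply because they need fewer insertions to reach from a short query.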
It seems to me that this now provides a satisfactory search engine for finding a close sentence and/or keywords in a list of sentences.
I don't have a direct application in mind, but perhaps some of you will have ideas.
In any case, don't forget to keep having fun while programming.