Exact matching is simple: two strings are either the same or they are not. That is useful for identifiers, checksums, and codes where one wrong character should fail the comparison. It is much less useful when the text came from people. Names, addresses, product titles, survey answers, and search queries often contain typos, missing words, changed word order, abbreviations, or different spellings of the same sound.
Fuzzy matching is the practice of deciding when two text values are similar enough to treat as the same thing even though their characters are not identical. The phrase sounds like one technique, but in real systems it is a family of choices. A spelling correction feature, a duplicate-customer merge, and a search engine autocomplete box can all use fuzzy matching, yet they do not need the same definition of “close.” The important engineering question is not whether two strings are vaguely similar. It is which kinds of difference your application should forgive, which kinds it should reject, and where the cutoff should sit.
Fuzzy String Matching
Fuzzy string matching compares text values by similarity instead of exact equality. The input is usually a pair of strings, and the output is either a distance, a normalized score, or a decision such as accept, reject, or review. The key design choice is what kind of difference should count as small: a typo, a missing character, a swapped adjacent pair, an abbreviation, a reordered token, or a shared prefix.
Character repair is the first model to learn because it makes the score explainable. Instead of asking how many letters two strings share, ask how expensive it is to transform one string into the other.
The matrix below makes that cost visible. The source string runs across the columns, the target string runs down the rows, and the highlighted route is the cheapest sequence of repairs found by the algorithm.
Changing kitten to sitting makes the path pay for local repairs: substitute one character, keep several matching characters, then insert the final one.
The martha and marhta example shows why adjacent swaps matter.
If a comparator allows a transposition as one operation, a common typing error can be treated as a small mistake instead of two separate substitutions.
The score at the end is therefore not a general percentage of shared letters.
It is the cost of the cheapest concrete route through the grid.
Levenshtein Distance For Fuzzy Matching
Levenshtein distance is the classic edit-distance measure for fuzzy matching.
It starts from a plain idea: two strings are close when one can be repaired into the other with a small number of operations.
The usual operations are insertion, deletion, and substitution.
If the source is kitten and the target is sitting, substituting k with s, substituting e with i, and inserting g gives a distance of 3.
Other routes are possible, but the algorithm cares about the cheapest route.
This is why the matrix is useful. Each cell represents a pair of prefixes: some first part of the source and some first part of the target. The number in that cell is the cheapest known repair cost between those two prefixes. To fill a cell, the algorithm looks at nearby cells and asks which previous state gives the least expensive next step. A diagonal move compares the next two characters. If they match, the cost stays the same. If they differ, a substitution adds cost. A horizontal move deletes a source character. A vertical move inserts a target character.
This dynamic-programming structure is the foundation behind the classic Wagner-Fischer edit-distance algorithm described in The String-to-String Correction Problem. The point of dynamic programming is not only speed. It also gives the score a precise meaning: the final bottom-right cell contains the best repair cost for the whole source and target strings because every shorter prefix comparison has already been solved.
A common normalized similarity score turns distance into a 0-to-1 value:
Here the numerator is the cheapest number of repairs, and the denominator scales that cost by the longer string length. A distance of 1 is severe for a two-character code and minor for a twenty-character product title. The normalization makes comparisons easier to display, but it does not remove the domain question. You still need to decide whether a given score is acceptable for the field being matched.
Transpositions Match Human Typing Errors
Plain Levenshtein distance counts insertions, deletions, and substitutions.
That is a strong baseline because many spelling mistakes are local.
It also has a limitation: swapping two adjacent letters costs two edits if swaps are not part of the operation model.
For martha and marhta, the t and h have changed places.
Treating that as two substitutions feels stricter than the way many people think about the mistake.
Damerau’s work on spelling-error correction described adjacent transposition as one of the common error types, alongside missing, extra, and wrong letters. That is the motivation for Damerau-Levenshtein distance: it extends the operation set so a neighboring swap can cost one repair. The matrix makes the consequence visible. When a long diagonal swap is available, the path can cross the swapped pair directly. When it is not available, the path must pay for the mismatch in smaller steps.
The practical lesson is that a fuzzy matcher carries assumptions about the mistakes it expects. If the text comes from manual typing, transpositions are worth considering. If the field is an exact identifier where a swapped pair changes the meaning completely, forgiving the swap may be wrong. The algorithm is not only a technical detail. It is a policy about which errors are plausible and which errors are dangerous.
One Similarity Score Is Not Universal
Edit distance is often the right first explanation because it is concrete. It is not always the right comparator. Character repairs work well for local spelling mistakes, but they can be a poor fit for short names, abbreviations, word reordering, nicknames, and values where the beginning of the string carries different importance from the end.
Personal names show the problem clearly.
Jon Smith and John Smyth are probably close in many record-linkage settings, even though there are multiple character-level differences.
Christa and Christy share a long beginning but differ at the end.
Marta and Barta differ at the first character, which may be more meaningful than a suffix change for some datasets.
The same edit count can feel different depending on where the change happens.
That is why name-oriented comparators such as Jaro and Jaro-Winkler exist. Jaro rewards matching characters that appear within a limited window of their original position. It is less focused on the exact sequence of insertions and deletions than Levenshtein distance. Jaro-Winkler starts from Jaro and adds a prefix bonus when the strings share the same beginning. This can be useful for names because early characters often carry strong identifying signal, but it can also be too generous when many unrelated records share a common prefix.
Comparators Encode Different Beliefs
The next visualization puts three comparators on the same 0-to-1 scale. The editable name pair drives all three lanes at once. The threshold line is shared, so a pair can be accepted by one comparator and rejected by another without changing the underlying strings. Selecting a lane changes the character strip to show what that comparator is treating as aligned or matched.
The Martha / Marhta example separates the models well.
Levenshtein sees an edit sequence.
Jaro sees matching characters that mostly remain near their original positions, even though two are transposed.
Jaro-Winkler adds little extra information beyond that unless the shared prefix is doing real work.
With Christa / Christy, the prefix boost becomes much more visible because the beginning is unchanged while the suffix differs.
Move the threshold line and watch the accept/reject labels change. The threshold is not a property of the strings. It is a decision rule applied after a comparator has produced a score. At a strict cutoff, only very similar strings pass. At a loose cutoff, more true matches pass, but more false matches pass too. This is the same sensitivity-specificity tension that appears in applied record linkage, including patient matching studies where comparator choice changes real outcomes.
Levenshtein, Jaro, and Jaro-Winkler Favor Different Evidence
Levenshtein similarity is easiest to trust when the main problem is misspelling. It asks how many local edits are needed after giving every character position roughly equal importance. That makes it explainable and predictable. It also means it can be harsh on short strings. Changing one character in a four-letter name is a large fraction of the string, even if the name still looks plausible to a person.
Jaro compares matching characters within a window. It cares about whether the same characters can be found near one another, and it penalizes transpositions when matched characters appear in a different order. This makes it behave more naturally for some short-name comparisons because it does not require the whole transformation to be phrased as an edit script. The record-linkage work associated with Jaro was motivated by messy census-style matching, where exact equality across all fields is too brittle.
Jaro-Winkler adds a controlled boost for a shared prefix, commonly up to the first four matching characters. The useful reading is: if the names already have a strong Jaro score and they begin the same way, the comparator becomes more confident. That is often reasonable for personal names. It is not universally safe. If a product catalog contains many values beginning with the same brand, model family, or legal prefix, a prefix bonus can reward the least discriminating part of the string.
This is why practical tools and libraries rarely expose only one fuzzy function. Guidance from systems such as Splink’s comparator documentation emphasizes choosing comparators by field type. Python libraries such as RapidFuzz expose multiple ratios because raw character similarity, partial matching, token sorting, and token-set overlap answer different questions. There is no single fuzzy score that is best in all contexts.
Fuzzy Matching Thresholds
A comparator produces a score. A system needs an action. That action may be “show this search result,” “suggest a correction,” “merge these records,” “route to human review,” or “reject the match.” The threshold is the line between those actions.
Lowering a threshold usually increases recall: more real matches are captured. It also increases false positives: more non-matches are accepted. Raising a threshold usually increases precision: accepted matches are more trustworthy. It also creates missed matches. The right point depends on the cost of each error. For a search box, showing an extra result may be acceptable. For a patient record, legal entity, bank account, or medication name, an incorrect merge can be much more costly than asking for review.
This is the part that often gets hidden when fuzzy matching is presented as a helper function. The algorithm may be deterministic, but the decision is contextual. A score of 0.88 is not inherently good or bad. It is acceptable only relative to the comparator, the field, the preprocessing, the candidate set, and the cost of being wrong.
Preprocessing Changes the Problem Before Scoring
Most production fuzzy matching starts before the comparator runs. Text is commonly lowercased, accents may be normalized, punctuation may be removed, repeated whitespace may be collapsed, and known abbreviations may be expanded. These choices can help the comparator focus on meaningful differences instead of formatting noise. They can also hide differences that matter.
For example, case-insensitive matching is usually sensible for names and product titles.
Removing punctuation may help with company suffixes and addresses.
Mapping St. to Street can improve address matching, while mapping every short form too aggressively can create collisions.
Microsoft’s Power Query fuzzy matching documentation reflects this product-level view: matching behavior is shaped by thresholds, case handling, text-part combination, and transformation tables, not only by the underlying similarity score.
The same principle applies to names.
Removing accents may be useful when source systems store text inconsistently, but it also erases information from languages where accents distinguish names or words.
Trimming honorifics may help link Dr. Jane Smith to Jane Smith, but deleting too much can make many records look alike.
Preprocessing should be treated as part of the matching design, not as harmless cleanup.
Candidate Generation Keeps Matching Practical
The two visualizations compare one pair of strings at a time. That is the right scale for learning the scoring behavior. Large systems also need to decide which pairs are worth scoring. Comparing every record against every other record becomes expensive quickly. One million records contain roughly five hundred billion possible pairs if every pair is considered directly.
Candidate generation narrows the search.
A system may block records by normalized last name, postal code, date of birth, phonetic code, or another coarse key before applying a more expensive comparator.
Search systems often use n-grams or trigrams for this purpose.
PostgreSQL’s pg_trgm extension, for example, breaks text into overlapping three-character chunks and can use indexes to find strings with enough chunk overlap.
Elasticsearch’s fuzzy query approaches the retrieval problem through edit-distance-based term expansion.
Candidate generation is not a replacement for scoring. It is the step that makes scoring feasible. It can also introduce its own mistakes. If the blocking rule is too strict, true matches never reach the comparator. If it is too loose, the system spends time scoring poor candidates and may create more borderline decisions.
Fuzzy Matching Algorithm
A practical fuzzy matching algorithm is usually a pipeline rather than one isolated scoring function. For a small pairwise comparison, the algorithm can simply normalize two strings, compute a similarity score, and compare that score with a threshold. For a production dataset, the same idea needs candidate generation first, because scoring every possible pair is too expensive.
A useful fuzzy matching workflow has several separate decisions. First, normalize the text enough to remove noise that should not affect matching. Second, generate plausible candidates so the system does not compare everything with everything. Third, choose a comparator that fits the field. Edit distance is a good baseline for local spelling mistakes. Jaro and Jaro-Winkler are often useful for short names. Token methods are better when whole words may reorder or when extra words should be tolerated. Phonetic methods can help with names that sound similar, but they create collisions and depend on language assumptions.
Fourth, choose thresholds according to the cost of errors. Many systems need more than one threshold. High scores can be accepted automatically, low scores can be rejected, and the middle region can go to review. That review band is often more honest than pretending a single cutoff can handle every edge case.
Finally, evaluate the matcher on examples from the actual domain. Toy pairs are useful for understanding mechanics, but real data contains patterns you may not expect. Names cluster by culture and region. Product titles contain shared brands and repeated model words. Addresses contain abbreviations, unit numbers, and local conventions. Survey answers contain slang and inconsistent granularity. Fuzzy matching works best when the comparator, preprocessing, and threshold are tuned to those patterns instead of copied from a generic example.
Summary
Fuzzy matching starts with a simple problem: exact equality is too brittle for human-entered text. The first dependable mental model is edit distance, where similarity comes from the cheapest sequence of concrete repairs. The matrix makes that score explainable because every insertion, deletion, substitution, match, and swap changes the route through the grid.
The next lesson is that similarity depends on the comparator. Levenshtein, Jaro, and Jaro-Winkler can score the same name pair differently because they reward different evidence. Thresholds then turn those scores into decisions, and the right threshold depends on the cost of false positives and missed matches.
A good fuzzy matcher is therefore a small pipeline, not a single magic score. Normalize the text deliberately, generate plausible candidates, score with a comparator suited to the field, and set thresholds that match the consequences of being wrong.