Edit distance with a similarity score
Levenshtein distance counts the minimum number of single-character operations (insert, delete, or substitute) needed to transform one string into another. The classic example is kitten to sitting: replace k with s, replace e with i, and insert g at the end. That is three edits, so the distance is 3. The algorithm fills a 2D table using the recurrence min(deletion + 1, insertion + 1, diagonal + cost), where cost is 0 when the two characters match and 1 when they differ (a substitution).
The output reports four values: the edit count (distance), a similarity percentage computed as (1 - distance / max(len_A, len_B)) * 100 and rounded to one decimal place, and the length of each string in UTF-16 code units. Identical strings score 100% at distance 0. Strings with no characters in common score 0%, because their distance equals the length of the longer string.
The split between text A and text B is a line containing exactly ---. Comparison is exact: case sensitive, whitespace included. Memory is linear in the length of the shorter string, using two rolling arrays, and time is O(n × m), where n and m are the two string lengths. For word-level overlap, see text similarity (Jaccard); for the longest unchanged span, see longest common substring.
How to use the Levenshtein distance calculator
1. Paste string A into the input panel, then a line containing only ---, then string B.
2. The four-line output appears as you type: distance, similarity %, length A, length B.
3. Click Copy to copy the result block, or Download to save it.
4. Use the similarity % for fuzzy-match thresholds (over 80% counts as "very close" for short strings).
5. For an inline marked-up view rather than a single number, switch to character diff.
Keyboard shortcuts
Drive TextResult without touching the mouse.
| Shortcut | Action |
|---|---|
| Ctrl+F | Open the find & replace panel inside the input (Plus) |
| Ctrl+Z | Undo the last input change |
| Ctrl+Shift+Z | Redo |
| Ctrl+Shift+Enter | Toggle fullscreen focus on the editor (Plus) |
| Esc | Close find & replace, or exit fullscreen |
| Ctrl+K | Open the command palette to jump to any tool (Plus) |
| Ctrl+S | Save the current workflow draft (Plus) |
| Ctrl+P | Run a saved workflow (Plus) |
How the distance is computed
Classic Levenshtein recurrence
The algorithm fills a 2D table where dp[i][j] is the distance between the first i characters of A and the first j of B. The recurrence is min(dp[i-1][j] + 1, dp[i][j-1] + 1, dp[i-1][j-1] + cost), with cost = 0 when characters match and 1 otherwise.
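As a concrete sketch of this recurrence (in TypeScript, not the tool's own source), the function below fills the full (n + 1) × (m + 1) table, including the base cases dp[i][0] = i and dp[0][j] = j that the recurrence needs to start from. The name `levenshteinTable` is hypothetical. Keeping the whole table is only sensible for short strings or for inspecting intermediate values; for kitten versus sitting, the last row is 6 6 5 4 3 3 2 3 and its final cell is the distance.

```typescript
// Full-matrix Levenshtein: dp[i][j] is the distance between the first i
// characters of a and the first j characters of b. Uses O(n × m) memory.
function levenshteinTable(a: string, b: string): number[][] {
  const n = a.length;
  const m = b.length;
  const dp: number[][] = [];
  for (let i = 0; i <= n; i++) {
    dp.push(new Array<number>(m + 1).fill(0));
    dp[i][0] = i; // turn a's first i characters into "" with i deletions
  }
  for (let j = 0; j <= m; j++) {
    dp[0][j] = j; // build b's first j characters from "" with j insertions
  }
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // delete a[i - 1]
        dp[i][j - 1] + 1, // insert b[j - 1]
        dp[i - 1][j - 1] + cost, // match (cost 0) or substitute (cost 1)
      );
    }
  }
  return dp;
}

const table = levenshteinTable("kitten", "sitting");
console.log(table[6].join(" ")); // 6 6 5 4 3 3 2 3
console.log(table[6][7]); // 3
```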
Linear memory rolling arrays
Two arrays of length m + 1, where m is the length of the shorter string, stand in for the full matrix, so memory grows linearly with the shorter input. Inputs of several kilobytes still complete in well under a second in a modern browser.
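A minimal two-row sketch, assuming the inputs are swapped so the inner loop runs over the shorter string. Levenshtein distance is symmetric, so the swap does not change the result. Indexing a JavaScript string yields UTF-16 code units, so that is the unit this sketch compares; `levenshtein` is a hypothetical name.

```typescript
// Two-row Levenshtein: only the previous and current rows of the table are
// kept, so memory is O(min(n, m)) once b is the shorter string.
function levenshtein(a: string, b: string): number {
  if (a.length < b.length) [a, b] = [b, a]; // make b the shorter side
  const m = b.length;
  let prev = new Uint32Array(m + 1);
  let curr = new Uint32Array(m + 1);
  for (let j = 0; j <= m; j++) prev[j] = j; // row 0: distance from "" to b's prefix
  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // column 0: delete all i characters of a's prefix
    for (let j = 1; j <= m; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    [prev, curr] = [curr, prev]; // the old row becomes scratch space for the next one
  }
  return prev[m]; // after the final swap, prev holds the last row
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(levenshtein("flaw", "lawn")); // 2
```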
Similarity % derived from distance
Reported as (1 - distance / max(len_A, len_B)) * 100, rounded to one decimal place. Identical strings score 100%; an empty string compared to a non-empty string scores 0%.
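The formula translates directly; the sketch below takes the distance and both lengths. The text does not cover two empty strings, where max(len_A, len_B) is 0 and the division is undefined. Returning 100 in that case is an assumption of this sketch, flagged in the code, and `similarityPercent` is a hypothetical name.

```typescript
// Similarity % = (1 - distance / max(lenA, lenB)) * 100, one decimal place.
function similarityPercent(distance: number, lenA: number, lenB: number): number {
  const longer = Math.max(lenA, lenB);
  if (longer === 0) return 100; // ASSUMPTION: two empty strings count as identical
  const raw = (1 - distance / longer) * 100;
  return Math.round(raw * 10) / 10; // round to one decimal place
}

console.log(similarityPercent(3, 6, 7)); // 57.1 (kitten vs sitting)
console.log(similarityPercent(0, 5, 5)); // 100 (identical strings)
console.log(similarityPercent(4, 0, 4)); // 0 (empty vs non-empty)
```

For the fuzzy-match tip in the how-to steps, a check such as `similarityPercent(d, a.length, b.length) > 80` expresses the "very close" threshold.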
Case and whitespace count
Comparison is exact. Hello versus hello registers a substitution. A space versus a tab also costs one. Lowercase or strip first if you want softer matching.
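Because the tool itself always compares exactly, softer matching means normalizing both sides before pasting them in. Below is a sketch of one such pre-pass, assuming you want to ignore case, leading and trailing whitespace, and the difference between spaces, tabs, and longer whitespace runs. The function name is hypothetical, and which folds to apply is a choice rather than a rule.

```typescript
// Optional pre-pass for softer matching: lowercase, trim, and collapse every
// whitespace run (spaces, tabs, newlines) into a single space.
function normalizeForMatching(s: string): string {
  return s.toLowerCase().trim().replace(/\s+/g, " ");
}

// After normalization, "Hello<TAB>World" and "hello world" become identical,
// so their distance drops from 2 (H/h and tab/space) to 0.
console.log(normalizeForMatching("Hello\tWorld") === normalizeForMatching("hello world")); // true
```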
Three-hyphen separator
The split between A and B is a line containing exactly ---. The tool needs this marker to know which side is which.
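A sketch of the split, under assumptions the text does not spell out: only the first line that is exactly --- acts as the separator, CRLF line endings are treated like LF, and input without a separator returns null. The name `splitInput` is hypothetical.

```typescript
// Split raw input into [textA, textB] at the first line that is exactly "---".
function splitInput(raw: string): [string, string] | null {
  const lines = raw.split(/\r?\n/); // ASSUMPTION: CRLF and LF both end a line
  const sep = lines.indexOf("---"); // exact match: "--- " or " ---" do not count
  if (sep === -1) return null; // ASSUMPTION: no separator means no comparison
  const textA = lines.slice(0, sep).join("\n");
  const textB = lines.slice(sep + 1).join("\n");
  return [textA, textB];
}

console.log(splitInput("kitten\n---\nsitting")); // [ 'kitten', 'sitting' ]
console.log(splitInput("kitten --- sitting")); // null
```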
Worked example
Three edits transform kitten into sitting: k -> s, e -> i, insert g. The 57.1% comes from (1 - 3/7) * 100. For an inline marker view see character diff.
Input:

    kitten
    ---
    sitting

Output:

    Levenshtein distance: 3
    Similarity: 57.1%
    Length A: 6
    Length B: 7
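The whole pipeline for this worked example fits in a short, self-contained sketch: split on the --- line, run the two-row recurrence, and format the four result lines. The function names are hypothetical, and the output layout is copied from the example rather than taken from the tool's source.

```typescript
// Two-row Levenshtein over UTF-16 code units (string indexing).
function rollingDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Raw input -> the four-line result block.
function report(raw: string): string {
  const lines = raw.split(/\r?\n/);
  const sep = lines.indexOf("---");
  if (sep === -1) throw new Error('Input needs a line containing exactly "---"');
  const a = lines.slice(0, sep).join("\n");
  const b = lines.slice(sep + 1).join("\n");
  const distance = rollingDistance(a, b);
  const longer = Math.max(a.length, b.length);
  // ASSUMPTION: two empty sides report 100%.
  const similarity = longer === 0 ? 100 : Math.round((1 - distance / longer) * 1000) / 10;
  return [
    `Levenshtein distance: ${distance}`,
    `Similarity: ${similarity}%`,
    `Length A: ${a.length}`,
    `Length B: ${b.length}`,
  ].join("\n");
}

console.log(report("kitten\n---\nsitting"));
```

Run on the example input, this prints the same four lines as the output shown above.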
Settings reference
| Behaviour | Effect on output |
|---|---|
| Separator | A line containing exactly --- splits text A from text B. |
| Algorithm | Standard Levenshtein with rolling-array dynamic programming. |
| Operations counted | Insert, delete, substitute (each cost 1). |
| Match rule | Exact codepoint equality, case sensitive. |
| Similarity formula | (1 - distance / max(len_A, len_B)) * 100, one decimal place. |
| Length unit | UTF-16 code units (JavaScript .length). |
| Empty side | Distance equals the length of the other side. |
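The table lists exact code point equality as the match rule but UTF-16 code units as the length unit. The two agree for most text and diverge for characters outside the Basic Multilingual Plane, such as most emoji, which JavaScript stores as two code units. The generic sketch below runs the same recurrence over either representation to show the difference. The table does not say more about which unit the distance itself uses, so treat these numbers as illustrations; `editDistance` is a hypothetical name.

```typescript
// Levenshtein over any array, so the same recurrence can run on
// UTF-16 code units (s.split("")) or on code points (Array.from(s)).
function editDistance<T>(a: readonly T[], b: readonly T[]): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

const emoji = "😀"; // U+1F600, stored as the surrogate pair \uD83D\uDE00
console.log(emoji.length); // 2 (code units)
console.log(Array.from(emoji).length); // 1 (code points)
console.log(editDistance("a".split(""), emoji.split(""))); // 2 over code units
console.log(editDistance(Array.from("a"), Array.from(emoji))); // 1 over code points
```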
FAQ
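What counts as one edit?
Inserting, deleting, or substituting a single character. Each operation costs 1.
How is similarity calculated?
(1 - distance / max(len_A, len_B)) * 100, rounded to one decimal place. So distance 3 against a 7-character string gives (1 - 3/7) * 100 = 57.1%.
Is the comparison case sensitive?
Yes. Hello versus hello costs one substitution. Lowercase both sides first (for example with the lowercase tool) if you want a case-folded distance.
How does this differ from Jaccard similarity?
Jaccard similarity (see text similarity) measures word-level overlap between the two texts, while Levenshtein counts character-level insertions, deletions, and substitutions.
How long are the inputs allowed to be?
Running time is O(n × m), so a 10,000 by 10,000 character comparison takes noticeable time. For very long inputs, prefer longest common substring or word-level similarity.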