Set difference on words
Word set difference computes A − B at the word level: every distinct word in text A whose lowercase form is not present in text B. Tokens are extracted with the regex \b[\w']+\b, so contractions such as don't stay intact and punctuation is stripped. Each token is lowercased before being placed in a JavaScript Set for O(1) lookup.
The split between text A and text B is a line containing exactly ---. Output is one word per line, alphabetically sorted via .sort(). Frequency is ignored; a word that appears ten times in A still contributes one entry to the output. To get the mirror direction (B − A), swap the halves around the separator.
To see the shared vocabulary instead, use word set intersection; for a single Jaccard similarity percentage, see text similarity; for a line-level rather than word-level comparison, use find unique lines.
How to use word set difference
1. Paste text A into the input panel, then a line with ---, then paste text B.
2. The list of words in A but not B appears in the output panel, sorted alphabetically.
3. Click Copy to copy the list, or Download to save it as a .txt file.
4. Swap A and B around the separator to get the mirror direction (words in B but not A).
5. For shared words use word set intersection.
Keyboard shortcuts
Drive TextResult without touching the mouse.
| Shortcut | Action |
|---|---|
| Ctrl+F | Open the find & replace panel inside the input (Plus) |
| Ctrl+Z | Undo the last input change |
| Ctrl+Shift+Z | Redo |
| Ctrl+Shift+Enter | Toggle fullscreen focus on the editor (Plus) |
| Esc | Close find & replace, or exit fullscreen |
| Ctrl+K | Open the command palette to jump to any tool (Plus) |
| Ctrl+S | Save current workflow draft (Plus) |
| Ctrl+P | Run a saved workflow (Plus) |
How the difference is computed
Set difference A minus B
Both sides are tokenised, lowercased, and stored in Set objects. The output walks set A and keeps any token not present in set B.
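The walk over set A can be sketched in a few lines. This is a minimal illustration rather than the tool's actual source; the helper name `difference` is hypothetical.

```javascript
// Hypothetical helper: returns the members of setA that are absent from setB.
// Set.prototype.has is a constant-time lookup on average, so the loop is
// linear in the size of setA.
function difference(setA, setB) {
  const result = [];
  for (const token of setA) {
    if (!setB.has(token)) result.push(token);
  }
  return result;
}

const a = new Set(["the", "quick", "brown", "fox"]);
const b = new Set(["a", "brown", "fox"]);
console.log(difference(a, b)); // [ 'the', 'quick' ]
```

A Set iterates in insertion order, so the result here is not yet sorted; sorting is a separate step.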
Word boundary tokenisation
The regex \b[\w']+\b extracts tokens that include letters, digits, underscores, and apostrophes. Punctuation is dropped, and contractions like don't stay intact.
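To see what the pattern keeps and drops, here is a small sketch that applies it with the global flag. The helper name `tokenise` is an assumption for illustration, and case is left untouched so that only the matching behaviour is visible.

```javascript
// The pattern quoted above, with the g flag so match() returns every token.
const WORD = /\b[\w']+\b/g;

// Hypothetical helper: returns the raw tokens, or an empty array if none match.
function tokenise(text) {
  return text.match(WORD) || [];
}

console.log(tokenise("Don't panic, it's fine!"));
// [ "Don't", 'panic', "it's", 'fine' ]

// Surrounding quote marks are dropped because \b needs a word character
// on one side of the boundary.
console.log(tokenise("'quoted' words"));
// [ 'quoted', 'words' ]
```

One caveat worth knowing: in JavaScript, `\w` covers only ASCII letters, digits, and the underscore, so under this exact pattern an accented letter would act as a token boundary.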
Case folded, deduplicated
Tokens are lowercased before set insertion. Fox and fox count as the same word. Frequency is ignored; sets store each word once.
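Case folding and deduplication fall out of lowercasing each token before it enters a Set. A brief sketch, with `foldAndDedupe` as a hypothetical helper name:

```javascript
// Hypothetical helper: lowercases every token, then lets the Set drop repeats.
function foldAndDedupe(tokens) {
  return new Set(tokens.map((token) => token.toLowerCase()));
}

const unique = foldAndDedupe(["Fox", "fox", "FOX", "Dog"]);
console.log([...unique]);  // [ 'fox', 'dog' ]
console.log(unique.size);  // 2
```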
Alphabetically sorted output
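The result is sorted with JavaScript's default .sort(), which compares strings by UTF-16 code unit. Because every token is already lowercase, this is alphabetical for ordinary words (tokens that start with a digit sort before letters), and the output is stable from run to run. One word per line.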
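The ordering produced by a default sort can be checked with a short sketch. The helper name `sortWords` is hypothetical, and the input is assumed to be lowercased already.

```javascript
// Hypothetical helper: copies the input so the caller's array is not mutated,
// then applies Array.prototype.sort with no comparator (code-unit order).
function sortWords(words) {
  return [...words].sort();
}

console.log(sortWords(["the", "jumps", "quick", "3d", "don't"]));
// [ '3d', "don't", 'jumps', 'quick', 'the' ]
```

The token starting with a digit lands first because digits have lower code units than lowercase letters.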
Three-hyphen separator
The split between A and B is a line containing exactly ---. Without it the tool returns a prompt asking for two halves.
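Splitting on the separator might look like the sketch below. It assumes the first line that equals --- exactly is the separator and that surrounding whitespace is not trimmed; the page does not spell out either detail, and `splitHalves` is a hypothetical name.

```javascript
// Hypothetical helper: returns { a, b } or null when no separator line exists.
// Assumption: the first line that is exactly "---" splits the input.
function splitHalves(input) {
  const lines = input.split(/\r?\n/);
  const index = lines.indexOf("---");
  if (index === -1) return null; // the tool would show its prompt here
  return {
    a: lines.slice(0, index).join("\n"),
    b: lines.slice(index + 1).join("\n"),
  };
}

console.log(splitHalves("one two\n---\nthree"));
// { a: 'one two', b: 'three' }
console.log(splitHalves("one --- two"));
// null, because --- is not on a line of its own
```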
Worked example
Three words (jumps, quick, the) appear in A but not B. brown and fox are shared, so they are excluded; they would show up via word set intersection instead.
Input:

    the quick brown fox jumps
    ---
    a fast brown fox runs

Output:

    jumps
    quick
    the
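The worked example can be reproduced end to end with a compact sketch of the described pipeline. This is an illustration under the behaviour stated on this page, not the tool's own code; `wordSetDifference` is a hypothetical name.

```javascript
// Hypothetical end-to-end sketch: split on the --- line, tokenise, lowercase,
// take A minus B, then sort with the default comparison.
function wordSetDifference(input) {
  const lines = input.split(/\r?\n/);
  const sep = lines.indexOf("---");
  if (sep === -1) return null; // no separator: the tool prompts for two halves

  const toSet = (text) =>
    new Set((text.match(/\b[\w']+\b/g) || []).map((t) => t.toLowerCase()));

  const a = toSet(lines.slice(0, sep).join("\n"));
  const b = toSet(lines.slice(sep + 1).join("\n"));
  return [...a].filter((word) => !b.has(word)).sort();
}

const input = "the quick brown fox jumps\n---\na fast brown fox runs";
console.log(wordSetDifference(input).join("\n"));
// jumps
// quick
// the
```

Swapping the two halves around the separator gives the mirror direction: with the same texts reversed, the result is a, fast, runs.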
Settings reference
| Behaviour | Effect on output |
|---|---|
| Separator | A line containing exactly --- splits text A from text B. |
| Operation | Set difference A − B at word level. |
| Tokeniser | Regex \b[\w']+\b, lowercased before set insertion. |
| Case | Folded to lowercase. |
| Frequency | Ignored. Each word listed once regardless of how often it appears in A. |
| Order | Alphabetically sorted. |
| Mirror | Swap halves around --- to compute B − A. |
FAQ
How do I see the words unique to text B?
Swap the two texts around the --- separator. The tool always returns A − B; flipping the input flips the direction.
Are matches case sensitive?
No. Fox and fox count as the same word. Line diff is the right tool if you need to detect casing changes.