Word set difference

Word set difference lists every distinct word in text A that does not appear in text B. Paste both into the input panel, separated by a line containing only ---. The tool tokenises each side, lowercases the tokens, builds two sets, and emits the words present in A but missing from B, alphabetically sorted. The computation runs in your browser; nothing is uploaded. For shared words, see word set intersection.

Set difference on words

Word set difference computes A − B at the word level: every distinct word in text A whose lowercase form is not present in text B. Tokens are extracted with the regex \b[\w']+\b, so contractions such as don't stay intact and punctuation is stripped. Each token is lowercased before being placed in a JavaScript Set for O(1) lookup.

The split between text A and text B is a line containing exactly ---. Output is one word per line, alphabetically sorted via .sort(). Frequency is ignored; a word that appears ten times in A still contributes one entry to the output. To get the mirror direction (B − A), swap the halves around the separator.
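The steps described above can be sketched end to end as follows. This is a minimal illustration, not the tool's actual source: the function names toWordSet and wordSetDifference are hypothetical, and returning null when the separator is missing is an assumption standing in for the prompt the tool shows.

```javascript
// Regex and separator are taken from the description; everything else is assumed.
const TOKEN_RE = /\b[\w']+\b/g;

// Tokenise one half, lowercase each token, and deduplicate via a Set.
function toWordSet(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const tokens = text.match(TOKEN_RE) || [];
  return new Set(tokens.map((t) => t.toLowerCase()));
}

// Return the sorted words of A that are absent from B, or null without a separator.
function wordSetDifference(input) {
  const lines = input.split(/\r?\n/);
  const sep = lines.indexOf("---"); // first line that is exactly ---
  if (sep === -1) return null;
  const a = toWordSet(lines.slice(0, sep).join("\n"));
  const b = toWordSet(lines.slice(sep + 1).join("\n"));
  return [...a].filter((w) => !b.has(w)).sort();
}

const result = wordSetDifference(
  "the quick brown fox jumps\n---\na fast brown fox runs"
);
console.log(result.join("\n")); // jumps, quick, the (one per line)
```

Swapping the two halves in the input string gives B − A with the same function.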

For the shared vocabulary instead, see word set intersection; for a single Jaccard similarity percentage, see text similarity; for a line-level rather than word-level comparison, use find unique lines.

How to use word set difference

  1. Paste text A into the input panel, then a line with ---, then paste text B.
  2. The list of words in A but not B appears in the output panel, sorted alphabetically.
  3. Click Copy to copy the list, or Download to save it as a .txt file.
  4. Swap A and B around the separator to get the mirror direction (words in B but not A).
  5. For shared words use word set intersection.

Keyboard shortcuts

Drive TextResult without touching the mouse.

Shortcut | Action
Ctrl+F | Open the find & replace panel inside the input (Plus)
Ctrl+Z | Undo the last input change
Ctrl+Shift+Z | Redo
Ctrl+Shift+Enter | Toggle fullscreen focus on the editor (Plus)
Esc | Close find & replace, or exit fullscreen
Ctrl+K | Open the command palette to jump to any tool (Plus)
Ctrl+S | Save current workflow draft (Plus)
Ctrl+P | Run a saved workflow (Plus)

How the difference is computed

Set difference A minus B

Both sides are tokenised, lowercased, and stored in Set objects. The output walks set A and keeps any token not present in set B.
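Set difference is not symmetric, which is why swapping the halves changes the result. The sketch below uses a hypothetical difference helper on two already-built sets to show both directions; the sample words are illustrative.

```javascript
// Keep every member of `a` that `b` does not contain.
// Set iteration follows insertion order, so the result is unsorted here.
function difference(a, b) {
  return [...a].filter((w) => !b.has(w));
}

const setA = new Set(["the", "quick", "brown", "fox", "jumps"]);
const setB = new Set(["a", "fast", "brown", "fox", "runs"]);

console.log(difference(setA, setB)); // [ 'the', 'quick', 'jumps' ]
console.log(difference(setB, setA)); // [ 'a', 'fast', 'runs' ]
```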

Word boundary tokenisation

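The regex \b[\w']+\b extracts tokens made of letters, digits, underscores, and apostrophes. Punctuation is dropped, and contractions like don't stay intact. In JavaScript, \w matches only ASCII letters, digits, and the underscore, so accented or non-Latin characters are not matched by this pattern.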
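To see what the pattern keeps and drops, it can be run on a short sample. The tokenise wrapper below is a hypothetical name; only the regex itself comes from the description.

```javascript
const WORD_RE = /\b[\w']+\b/g;

// Return every token the pattern finds, or an empty array if there are none.
function tokenise(text) {
  return text.match(WORD_RE) || [];
}

console.log(tokenise("Don't stop: it's 3pm, snake_case!"));
// [ "Don't", 'stop', "it's", '3pm', 'snake_case' ]
```

Because the word boundary \b must sit next to a \w character, a quote mark at the very start or end of a word is left out of the token, while an apostrophe inside a word is kept.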

Case folded, deduplicated

Tokens are lowercased before set insertion. Fox and fox count as the same word. Frequency is ignored; sets store each word once.
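A small sketch of the fold-then-deduplicate step, using made-up sample tokens:

```javascript
// Five tokens, but only two distinct words once case is folded.
const sampleTokens = ["Fox", "fox", "FOX", "the", "The"];

// Lowercase first, then let the Set collapse the duplicates.
const distinctWords = new Set(sampleTokens.map((t) => t.toLowerCase()));

console.log([...distinctWords]); // [ 'fox', 'the' ]
console.log(distinctWords.size); // 2
```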

Alphabetically sorted output

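The result is sorted with JavaScript's default string comparison, which orders strings by UTF-16 code unit, so tokens starting with a digit sort before tokens starting with a letter. The output is stable from run to run. One word per line.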
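The default comparison can be checked on a few lowercase tokens of the kind the tokeniser produces. The sample list is illustrative:

```javascript
const unsorted = ["zebra", "dog", "3pm", "don't", "_id", "apple"];

// Copy before sorting, since sort() reorders the array in place.
// With no comparator, strings are compared by UTF-16 code unit.
const sorted = [...unsorted].sort();

console.log(sorted);
// [ '3pm', '_id', 'apple', 'dog', "don't", 'zebra' ]
```

Digits come first, then the underscore, then lowercase letters. A locale-aware order would need a comparator such as (a, b) => a.localeCompare(b), which the description does not mention.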

Three-hyphen separator

The split between A and B is a line containing exactly ---. Without it the tool returns a prompt asking for two halves.
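One way to find the separator is to split the input into lines and look for a line that equals --- exactly. This sketch makes several assumptions that the description does not settle: the helper name splitHalves is hypothetical, the first matching line wins, surrounding whitespace is not trimmed, and a missing separator is signalled with null.

```javascript
// Split input into text A and text B around the first line that is exactly "---".
function splitHalves(input) {
  const lines = input.split(/\r?\n/); // accept both LF and CRLF line endings
  const sep = lines.indexOf("---");   // strict equality: "----" or " ---" do not match
  if (sep === -1) return null;        // caller can prompt for two halves
  return {
    a: lines.slice(0, sep).join("\n"),
    b: lines.slice(sep + 1).join("\n"),
  };
}

console.log(splitHalves("first half\n---\nsecond half"));
// { a: 'first half', b: 'second half' }
console.log(splitHalves("first half\n----\nsecond half")); // null
```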

Worked example

Three words (jumps, quick, the) appear in A but not B. brown and fox are shared, so they are excluded; they would show up via word set intersection instead.

Input
the quick brown fox jumps
---
a fast brown fox runs
Output
jumps
quick
the

Settings reference

Behaviour | Effect on output
Separator | A line containing exactly --- splits text A from text B.
Operation | Set difference A − B at word level.
Tokeniser | Regex \b[\w']+\b, lowercased before set insertion.
Case | Folded to lowercase.
Frequency | Ignored. Each word listed once regardless of how often it appears in A.
Order | Alphabetically sorted.
Mirror | Swap halves around --- to compute B − A.

FAQ

How do I see the words unique to text B?
Swap the halves around the --- separator. The tool always returns A − B; flipping the input flips the direction.
Are matches case sensitive?
No. Tokens are lowercased before set insertion, so Fox and fox count as the same word. Line diff is the right tool if you need to detect casing changes.
Does it count word frequency?
No. Both sides are sets, so each unique word from A is listed once if it does not appear in B at all. Frequency on either side is irrelevant to inclusion.
How is this different from find unique lines?
Find unique lines works on whole lines, exact match, in original order. Word set difference works on individual words, lowercased, alphabetically sorted. Different granularity.
Where does my text go?
Nowhere. Everything happens locally in your browser via JavaScript.