Word Set Difference

Set difference on words

Word set difference computes A − B at the word level: every distinct word in text A whose lowercase form is not present in text B. Tokens are extracted with the regex \b[\w']+\b, so contractions such as don't stay intact and punctuation is stripped. Each token is lowercased before being placed in a JavaScript Set for O(1) lookup.

The split between text A and text B is a line containing exactly ---. Output is one word per line, alphabetically sorted via .sort(). Frequency is ignored; a word that appears ten times in A still contributes one entry to the output. To get the mirror direction (B − A), swap the halves around the separator.

For the shared vocabulary instead, see word set intersection; for a single Jaccard similarity percentage, see text similarity; for a line-level rather than word-level comparison, use find unique lines.

How to use word set difference

1. Paste text A into the input panel, then a line with ---, then paste text B.
2. The list of words in A but not B appears in the output panel, sorted alphabetically.
3. Click Copy to copy the list, or Download to save it as a .txt file.
4. Swap A and B around the separator to get the mirror direction (words in B but not A).
5. For shared words use word set intersection.

Keyboard shortcuts

Drive TextResult without touching the mouse.

Shortcut	Action
`Ctrl` `F`	Open the find & replace panel inside the input (Plus)
`Ctrl` `Z`	Undo the last input change
`Ctrl` `Shift` `Z`	Redo
`Ctrl` `Shift` `Enter`	Toggle fullscreen focus on the editor (Plus)
`Esc`	Close find & replace, or exit fullscreen
`Ctrl` `K`	Open the command palette to jump to any tool (Plus)
`Ctrl` `S`	Save current workflow draft (Plus)
`Ctrl` `P`	Run a saved workflow (Plus)

How the difference is computed

Set difference A minus B

Both sides are tokenised, lowercased, and stored in Set objects. The output walks set A and keeps any token not present in set B.
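
The walk over set A described above can be sketched as follows. This is a minimal illustration, not the tool's actual source; the function name difference is hypothetical.

```javascript
// Hypothetical helper: returns the members of setA that are absent from setB.
function difference(setA, setB) {
  const result = new Set();
  for (const token of setA) {
    // Set.prototype.has is the constant-time (on average) lookup the text refers to.
    if (!setB.has(token)) result.add(token);
  }
  return result;
}

const a = new Set(["the", "quick", "brown", "fox", "jumps"]);
const b = new Set(["a", "fast", "brown", "fox", "runs"]);
const onlyInA = [...difference(a, b)];
console.log(onlyInA); // ["the", "quick", "jumps"] (insertion order of A, not yet sorted)
```

The operation is not symmetric: difference(b, a) would yield a, fast, and runs instead.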

Word boundary tokenisation

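The regex \b[\w']+\b extracts tokens made up of letters, digits, underscores, and apostrophes. In JavaScript, \w covers only ASCII letters, digits, and the underscore, so an accented letter ends the token. Punctuation is dropped, and contractions like don't stay intact.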
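
To see what the pattern keeps and drops, here is a small sketch that applies it with String.prototype.match. The names WORD_PATTERN and tokenize are made up for this example; only the regex itself comes from the description above.

```javascript
// Same pattern as in the text; the g flag makes match() return every token.
const WORD_PATTERN = /\b[\w']+\b/g;

// Hypothetical helper: extracts word tokens from a string.
function tokenize(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(WORD_PATTERN) || [];
}

console.log(tokenize("Don't panic, it's fine!"));
// ["Don't", "panic", "it's", "fine"]
```

Because each token has to start and end at a word boundary, surrounding quote marks are trimmed: tokenize("'hello'") returns ["hello"]. Case is preserved at this stage; lowercasing is a separate step.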

Case folded, deduplicated

Tokens are lowercased before set insertion. Fox and fox count as the same word. Frequency is ignored; sets store each word once.
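
A brief sketch of the folding and deduplication step, assuming the tokens have already been extracted. The helper name toWordSet is hypothetical.

```javascript
// Hypothetical helper: lowercases each token, then lets Set drop the repeats.
function toWordSet(tokens) {
  return new Set(tokens.map((token) => token.toLowerCase()));
}

const words = toWordSet(["Fox", "fox", "FOX", "fox", "Dog"]);
console.log(words.size);  // 2
console.log([...words]);  // ["fox", "dog"]
```

Four spellings of fox collapse into one entry, which is why frequency never affects the result.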

Alphabetically sorted output

The result is sorted with JavaScript's default string comparison so output is stable run to run. One word per line.
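
The default comparison can be illustrated with a short sketch. The helper name sortWords is hypothetical; the behaviour shown is that of Array.prototype.sort called without a comparator, which orders strings by UTF-16 code unit.

```javascript
// Hypothetical helper: copies the set into an array and sorts it with the default comparison.
function sortWords(wordSet) {
  return [...wordSet].sort();
}

console.log(sortWords(new Set(["the", "quick", "jumps"])));
// ["jumps", "quick", "the"]

console.log(sortWords(new Set(["zebra", "apple", "42", "_id"])));
// ["42", "_id", "apple", "zebra"]
```

For lowercase ASCII words this matches alphabetical order. Tokens that start with a digit or an underscore sort ahead of the letters, as the second call shows.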

Three-hyphen separator

The split between A and B is a line containing exactly ---. Without it the tool returns a prompt asking for two halves.
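
One way the separator handling could look is sketched below. The function name splitHalves is hypothetical, and two details are assumptions rather than documented behaviour: the first --- line wins if there are several, and a missing separator is signalled with null where the tool would show its prompt.

```javascript
// Hypothetical helper: splits the input at the first line that is exactly "---".
function splitHalves(input) {
  const lines = input.split(/\r?\n/);
  // indexOf compares whole lines, so "a --- b" or "----" do not count as separators.
  const index = lines.indexOf("---");
  if (index === -1) return null; // assumption: the real tool shows a prompt here
  return {
    a: lines.slice(0, index).join("\n"),
    b: lines.slice(index + 1).join("\n"),
  };
}

console.log(splitHalves("the quick\n---\na fast"));
// { a: "the quick", b: "a fast" }
console.log(splitHalves("no separator here"));
// null
```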

Worked example

Three words (jumps, quick, the) appear in A but not B. brown and fox are shared, so they are excluded; they would show up via word set intersection instead.

Input

the quick brown fox jumps
---
a fast brown fox runs

Output

jumps
quick
the
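
The example above can be reproduced with a compact end-to-end sketch that chains the documented steps: split on ---, tokenise, lowercase, subtract, sort. The function name wordSetDifference is hypothetical and the code is an approximation of the described behaviour, not the tool's source.

```javascript
// Hypothetical end-to-end sketch of the pipeline described on this page.
function wordSetDifference(input) {
  const lines = input.split(/\r?\n/);
  const sep = lines.indexOf("---");
  if (sep === -1) return null; // assumption: the real tool shows a prompt here

  // Tokenise with the documented regex, lowercase, and deduplicate via Set.
  const toSet = (text) =>
    new Set((text.match(/\b[\w']+\b/g) || []).map((word) => word.toLowerCase()));

  const a = toSet(lines.slice(0, sep).join("\n"));
  const b = toSet(lines.slice(sep + 1).join("\n"));

  // Keep words of A that are missing from B, sort, and emit one word per line.
  return [...a].filter((word) => !b.has(word)).sort().join("\n");
}

console.log(wordSetDifference("the quick brown fox jumps\n---\na fast brown fox runs"));
// jumps
// quick
// the
```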

Settings reference

Behaviour	Effect on output
Separator	A line containing exactly `---` splits text A from text B.
Operation	Set difference A − B at word level.
Tokeniser	Regex `\b[\w']+\b`, lowercased before set insertion.
Case	Folded to lowercase.
Frequency	Ignored. Each word listed once regardless of how often it appears in A.
Order	Alphabetically sorted.
Mirror	Swap halves around `---` to compute B − A.

FAQ

How do I see the words unique to text B?

Swap the halves around the --- separator. The tool always returns A − B; flipping the input flips the direction.
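
If you flip inputs often, the swap itself is easy to script. The sketch below uses a hypothetical helper, swapHalves, and assumes the first --- line is the separator.

```javascript
// Hypothetical helper: moves everything after the "---" line in front of it.
function swapHalves(input) {
  const lines = input.split(/\r?\n/);
  const sep = lines.indexOf("---");
  if (sep === -1) return input; // nothing to swap without a separator
  return [...lines.slice(sep + 1), "---", ...lines.slice(0, sep)].join("\n");
}

console.log(swapHalves("the quick brown fox jumps\n---\na fast brown fox runs"));
// a fast brown fox runs
// ---
// the quick brown fox jumps
```

Pasting the swapped text into the tool would then list a, fast, and runs, the words found only in the original text B.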

Are matches case sensitive?

No. Tokens are lowercased before set insertion, so Fox and fox count as the same word. Line diff is the right tool if you need to detect casing changes.

Does it count word frequency?

No. Both sides are sets, so each unique word from A is listed once if it does not appear in B at all. Frequency on either side is irrelevant to inclusion.

How is this different from find unique lines?

Find unique lines works on whole lines, exact match, in original order. Word set difference works on individual words, lowercased, alphabetically sorted. Different granularity.

Where does my text go?

Nowhere. Everything happens locally in your browser via JavaScript.

Also known as

word set difference, words only in first text, unique vocabulary finder, compare word lists, word subtraction, A minus B words, distinctive words tool, vocabulary diff
