Word set intersection

Word set intersection lists every distinct word that appears in both text A and text B. Paste both texts into the input pane, separated by a line containing only ---. The tool tokenises each side, lowercases the tokens, deduplicates them, and emits the words found in both sets, sorted alphabetically. The computation runs in your browser; nothing is uploaded. For a single similarity percentage, see text similarity.


Set intersection on words

Word set intersection treats each text as a set of distinct words and returns the set of words shared between them. Tokens are extracted with the regex \b[\w']+\b, which keeps letters, digits, underscores, and apostrophes. Each token is lowercased before being placed in a JavaScript Set, which handles deduplication automatically. The intersection is computed by walking set A and keeping any token that is also in set B.

The split between text A and text B is a line containing exactly ---. Output is one word per line, alphabetically sorted. Frequency is ignored: a word that appears five times in A and twice in B contributes one line to the output. Punctuation other than an apostrophe inside a word is stripped during tokenisation, so commas, full stops, and brackets do not interfere.

For the words in A but not B see word set difference; for whole-line overlap rather than word-level use find common lines; for a single Jaccard percentage instead of the actual word list see text similarity.

How to use word set intersection

  1. Paste text A into the input panel, then a line with ---, then paste text B.
  2. The shared word list appears in the output panel, one word per line, sorted alphabetically.
  3. Click Copy to copy the list, or Download to save it as a .txt file.
  4. For words unique to one side, pivot to word set difference.
  5. For a single overlap percentage use text similarity.

Keyboard shortcuts

Drive TextResult without touching the mouse.

Shortcut: Action
Ctrl+F: Open the find & replace panel inside the input (Plus)
Ctrl+Z: Undo the last input change
Ctrl+Shift+Z: Redo
Ctrl+Shift+Enter: Toggle fullscreen focus on the editor (Plus)
Esc: Close find & replace, or exit fullscreen
Ctrl+K: Open the command palette to jump to any tool (Plus)
Ctrl+S: Save current workflow draft (Plus)
Ctrl+P: Run a saved workflow (Plus)

How the intersection is computed

Word boundary tokenisation

Tokens are extracted with the regex \b[\w']+\b. That picks up letters, digits, underscores, and apostrophes. A contraction such as don't stays a single token; all other punctuation is dropped.
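As a rough illustration of that tokenisation step, the sketch below applies the quoted regex with the global flag. The function name tokenise is hypothetical and chosen only for this example; the tool's actual source is not shown here.

```javascript
// Hypothetical helper (not the tool's actual code): extract word tokens
// using the regex quoted in the text.
function tokenise(text) {
  // The g flag makes match() return every match rather than just the first.
  // match() returns null when nothing matches, hence the empty-array fallback.
  return text.match(/\b[\w']+\b/g) ?? [];
}

console.log(tokenise("Don't panic, it's fine (really)!"));
// → ["Don't", "panic", "it's", "fine", "really"]
```

One caveat worth knowing: in JavaScript, \w covers only ASCII letters, digits, and the underscore, so if the regex is used exactly as quoted, a word such as café would be cut off at the accented letter.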

Case folded, deduplicated

Every token is lowercased before being added to the set. Fox and fox count as the same word. Frequency is ignored; sets store each word once.
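A minimal sketch of the case folding and deduplication described above, assuming the tokens are already available as an array. The helper name toWordSet is hypothetical.

```javascript
// Hypothetical helper: lowercase each token, then let Set drop duplicates.
function toWordSet(tokens) {
  return new Set(tokens.map((token) => token.toLowerCase()));
}

const wordSet = toWordSet(["Fox", "fox", "FOX", "the", "The"]);
console.log([...wordSet]); // → ["fox", "the"]
console.log(wordSet.size); // → 2
```

Because a Set stores each value once, five input tokens collapse to two entries and the frequency information is lost, which is the intended behaviour here.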

Alphabetically sorted output

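After computing the intersection the tool calls .sort(), so the words come out in JavaScript's default string order, which compares strings by UTF-16 code unit. For lowercased words this reads as alphabetical order. The ordering is deterministic, so results are stable across runs.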
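The following sketch shows what the default ordering looks like on a small list, assuming .sort() is called without a comparator (the text does not say otherwise). The sample words are made up for illustration.

```javascript
// Array.prototype.sort() with no comparator compares strings by UTF-16 code unit.
const sharedWords = ["fox", "brown", "_id", "42", "don't"];

// Copy first so the original array is left untouched.
const sortedWords = [...sharedWords].sort();

console.log(sortedWords); // → ["42", "_id", "brown", "don't", "fox"]
```

Digits sort ahead of the underscore, which sorts ahead of lowercase letters, so numeric tokens appear at the top of the output.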

Set intersection algorithm

Both sides are converted to Set objects, which give average O(1) membership lookup. The intersection walks set A and keeps any token also present in set B, so the work grows linearly with the number of distinct words in A.
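The walk described above can be sketched as a filter over set A. The function name intersect is hypothetical; this is one straightforward way to write it, not necessarily how the tool does.

```javascript
// Hypothetical helper: keep each member of setA that setB also contains.
function intersect(setA, setB) {
  // Set.prototype.has is an average constant-time lookup.
  return [...setA].filter((word) => setB.has(word));
}

const setA = new Set(["the", "quick", "brown", "fox", "jumps"]);
const setB = new Set(["a", "fast", "brown", "fox", "runs"]);
console.log(intersect(setA, setB)); // → ["brown", "fox"]
```

Newer JavaScript runtimes also provide Set.prototype.intersection; the manual walk is shown here because it mirrors the description in the text.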

Three-hyphen separator

The split between A and B is a line containing exactly ---. The tool needs this marker to know where text A ends and text B begins.
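A possible way to implement the split is sketched below. The helper name splitOnSeparator is hypothetical, and two behaviours are assumptions rather than documented facts: the first --- line is taken as the separator, and input without a separator is treated as text A with an empty text B.

```javascript
// Hypothetical helper: split the input at the first line that is exactly "---".
function splitOnSeparator(input) {
  // Accept both LF and CRLF line endings.
  const lines = input.split(/\r?\n/);
  const index = lines.indexOf("---");
  if (index === -1) {
    // Assumption: no separator means everything is text A and text B is empty.
    return { textA: input, textB: "" };
  }
  return {
    textA: lines.slice(0, index).join("\n"),
    textB: lines.slice(index + 1).join("\n"),
  };
}

const { textA, textB } = splitOnSeparator("red green\n---\ngreen blue");
console.log(textA); // → "red green"
console.log(textB); // → "green blue"
```

Because the comparison is against the whole line, a line such as "--- " with a trailing space, or "----", would not count as a separator in this sketch.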

Worked example

Two words (brown and fox) appear in both texts. The remaining words (the, quick, jumps, a, fast, and runs) each appear on only one side. For the words in A but not B see word set difference.

Input
the quick brown fox jumps
---
a fast brown fox runs
Output
brown
fox
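To tie the steps together, here is a compact end-to-end sketch that reproduces the example above. It follows the documented behaviour (split on ---, tokenise with \b[\w']+\b, lowercase, deduplicate, intersect, sort), but the function name wordSetIntersection and the handling of a missing separator are assumptions made for this illustration.

```javascript
// Hypothetical end-to-end sketch of the documented steps.
function wordSetIntersection(input) {
  const lines = input.split(/\r?\n/);
  const sep = lines.indexOf("---");
  // Assumption: with no separator, text B is empty and so is the result.
  const textA = sep === -1 ? input : lines.slice(0, sep).join("\n");
  const textB = sep === -1 ? "" : lines.slice(sep + 1).join("\n");

  // Tokenise, lowercase, and deduplicate one side.
  const toSet = (text) =>
    new Set((text.match(/\b[\w']+\b/g) ?? []).map((t) => t.toLowerCase()));

  const wordsA = toSet(textA);
  const wordsB = toSet(textB);

  // Walk A, keep what B also has, then sort with the default comparator.
  return [...wordsA].filter((word) => wordsB.has(word)).sort();
}

const result = wordSetIntersection(
  "the quick brown fox jumps\n---\na fast brown fox runs"
);
console.log(result.join("\n"));
// brown
// fox
```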

Settings reference

Behaviour: Effect on output
Separator: A line containing exactly --- splits text A from text B.
Tokeniser: Regex \b[\w']+\b, lowercased before set insertion.
Case: Folded to lowercase.
Frequency: Ignored. Each word listed once regardless of how often it appears.
Order: Alphabetically sorted via JavaScript .sort().
Punctuation: Dropped during tokenisation.
Empty input: Empty output (no shared words).

FAQ

Why is don't kept as one word?
The regex \b[\w']+\b includes the apostrophe in the token character class, so contractions stay intact. don't, it's, and can't all behave as single tokens.
Are matches case sensitive?
No. Tokens are lowercased before going into the set, so Fox and fox count as the same word. Use line diff if you need to detect casing changes.
Does it count word frequency?
No. The output is a set, so each shared word appears once regardless of how many times it appears on either side. For a Jaccard similarity that uses the same word sets see text similarity.
How is this different from find common lines?
Find common lines works on whole lines, exact match, ordered. Word set intersection works on individual words, lowercased and sorted. The two tools differ in granularity: lines versus words.
Where does my text go?
Nowhere. Everything happens locally in your browser via JavaScript.