Word Set Intersection

Set intersection on words

Word set intersection treats each text as a set of distinct words and returns the words found in both. Tokens are extracted with the regex \b[\w']+\b, which matches runs of letters, digits, underscores, and apostrophes. Each token is lowercased before being placed in a JavaScript Set, which handles deduplication for free. The intersection is computed by walking set A and keeping any token also in set B.

The split between text A and text B is a line containing exactly ---. Output is one word per line, alphabetically sorted. Frequency is ignored; a word that appears five times in A and twice in B contributes one line to the output. Punctuation other than apostrophes and underscores is stripped during tokenisation, so commas, full stops, and brackets do not interfere.

For the words in A but not B see word set difference; for whole-line overlap rather than word-level use find common lines; for a single Jaccard percentage instead of the actual word list see text similarity.

How to use word set intersection

1. Paste text A into the input panel, then a line with ---, then paste text B.
2. The shared word list appears in the output panel, one word per line, sorted alphabetically.
3. Click Copy to copy the list, or Download to save it as a .txt file.
4. For words unique to one side, pivot to word set difference.
5. For a single overlap percentage use text similarity.

Keyboard shortcuts

Drive TextResult without touching the mouse.

Shortcut	Action
`Ctrl` `F`	Open the find & replace panel inside the input (Plus)
`Ctrl` `Z`	Undo the last input change
`Ctrl` `Shift` `Z`	Redo
`Ctrl` `Shift` `Enter`	Toggle fullscreen focus on the editor (Plus)
`Esc`	Close find & replace, or exit fullscreen
`Ctrl` `K`	Open the command palette to jump to any tool (Plus)
`Ctrl` `S`	Save current workflow draft (Plus)
`Ctrl` `P`	Run a saved workflow (Plus)

How the intersection is computed

Word boundary tokenisation

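Tokens are extracted with the regex \b[\w']+\b. That picks up letters, digits, underscores, and apostrophes. In JavaScript, \w covers only ASCII letters, digits, and the underscore, so accented characters fall outside the token class. A contraction such as don't stays a single token; other punctuation is dropped.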
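A minimal sketch of this tokenisation step. Only the pattern comes from the description above; the function name tokenize is hypothetical, and the g flag is an assumption needed to collect every match rather than just the first.

```javascript
// Hypothetical helper; only the pattern \b[\w']+\b is taken from the text.
function tokenize(text) {
  // The g flag (assumed) makes match() return every token.
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/\b[\w']+\b/g) || [];
}

console.log(tokenize("Don't stop, (it's) 2 late_night!"));
// ["Don't", "stop", "it's", "2", "late_night"]
```

Case is left untouched here because lowercasing is a separate step.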

Case folded, deduplicated

Every token is lowercased before being added to the set. Fox and fox count as the same word. Frequency is ignored; sets store each word once.
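The folding and deduplication step can be sketched as follows. The helper name toWordSet is hypothetical; it assumes the tokens have already been extracted.

```javascript
// Hypothetical helper: turn a token list into a lowercased Set.
function toWordSet(tokens) {
  // A Set ignores repeated values, so deduplication needs no extra code.
  return new Set(tokens.map((token) => token.toLowerCase()));
}

const words = toWordSet(["Fox", "fox", "FOX", "the", "The"]);
console.log(words.size); // 2
console.log([...words]); // ["fox", "the"]
```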

Alphabetically sorted output

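After computing the intersection the tool calls .sort(), so the words come out in JavaScript's default order, which compares strings by UTF-16 code unit. Because every token is already lowercase, this reads as alphabetical order, with tokens that start with a digit or underscore placed before those that start with a letter. That makes results stable across runs.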
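A short illustration of that ordering, assuming .sort() is called with no comparator as the description suggests. The sample tokens are made up for the example.

```javascript
// Default Array.prototype.sort() compares strings by UTF-16 code unit.
const shared = ["fox", "brown", "it's", "42", "_id"];

// Copy first so the original array is left untouched.
const sorted = [...shared].sort();

console.log(sorted); // ["42", "_id", "brown", "fox", "it's"]
```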

Set intersection algorithm

Both sides are converted to Set objects for average O(1) lookup. The intersection walks set A and keeps any token also present in set B.
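The walk described above might look like this. The function name intersect is hypothetical, and the sketch assumes both inputs are already lowercased Set objects.

```javascript
// Hypothetical helper: keep every member of setA that setB also contains.
function intersect(setA, setB) {
  const result = new Set();
  for (const word of setA) {
    // Set.prototype.has is typically a constant-time lookup on average.
    if (setB.has(word)) result.add(word);
  }
  return result;
}

const a = new Set(["the", "quick", "brown", "fox", "jumps"]);
const b = new Set(["a", "fast", "brown", "fox", "runs"]);
console.log([...intersect(a, b)]); // ["brown", "fox"]
```

The loop does one lookup per member of A, so the cost grows with the size of A rather than with the product of both sizes.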

Three-hyphen separator

The split between A and B is a line containing exactly ---. The tool needs this marker to know where text A ends and text B begins.
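One way to sketch the split, under several assumptions the description does not confirm: the first matching line is used, surrounding whitespace is not trimmed, and a missing separator yields null. The name splitInput is hypothetical.

```javascript
// Hypothetical helper: split the input at the first line that is exactly "---".
function splitInput(input) {
  const lines = input.split(/\r?\n/);
  // Strict comparison: " ---" or "----" would not count as a separator.
  const at = lines.indexOf("---");
  // Assumed handling: with no separator there is no text B to compare.
  if (at === -1) return null;
  return {
    a: lines.slice(0, at).join("\n"),
    b: lines.slice(at + 1).join("\n"),
  };
}

console.log(splitInput("the quick brown fox jumps\n---\na fast brown fox runs"));
// { a: "the quick brown fox jumps", b: "a fast brown fox runs" }
```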

Worked example

Two words (brown and fox) appear in both texts. the, quick, jumps, a, fast, and runs each appear on only one side. For the words in A but not B see word set difference.

Input

the quick brown fox jumps
---
a fast brown fox runs

Output

brown
fox
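The example above can be reproduced with a compact end-to-end sketch. This is an illustration of the described steps, not the tool's actual source: the function name wordSetIntersection is hypothetical, and returning an empty list when the separator is missing is an assumption.

```javascript
// Hypothetical end-to-end function combining the documented steps.
function wordSetIntersection(input) {
  const lines = input.split(/\r?\n/);
  const at = lines.indexOf("---");
  if (at === -1) return []; // assumed behaviour when no separator is present

  // Tokenise with the documented pattern, lowercase, and deduplicate via Set.
  const toSet = (text) =>
    new Set((text.match(/\b[\w']+\b/g) || []).map((t) => t.toLowerCase()));

  const setA = toSet(lines.slice(0, at).join("\n"));
  const setB = toSet(lines.slice(at + 1).join("\n"));

  // Walk A, keep what B also has, then sort with the default comparator.
  return [...setA].filter((word) => setB.has(word)).sort();
}

const input = "the quick brown fox jumps\n---\na fast brown fox runs";
console.log(wordSetIntersection(input).join("\n"));
// brown
// fox
```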

Settings reference

Behaviour	Effect on output
Separator	A line containing exactly `---` splits text A from text B.
Tokeniser	Regex `\b[\w']+\b`, lowercased before set insertion.
Case	Folded to lowercase.
Frequency	Ignored. Each word listed once regardless of how often it appears.
Order	Alphabetically sorted via JavaScript `.sort()`.
Punctuation	Dropped during tokenisation.
Empty input	Empty output (no shared words).

FAQ

Why is don't kept as one word?

The regex \b[\w']+\b includes the apostrophe in the token character class, so contractions stay intact. don't, it's, and can't all behave as single tokens.

Are matches case sensitive?

No. Tokens are lowercased before going into the set, so Fox and fox count as the same word. Use line diff if you need to detect casing changes.

Does it count word frequency?

No. The output is a set, so each shared word appears once regardless of how many times it appears on either side. For a Jaccard similarity that uses the same word sets see text similarity.

How is this different from find common lines?

Find common lines works on whole lines, exact match, ordered. Word set intersection works on individual words, lowercased and sorted. Different granularity.

Where does my text go?

Nowhere. Everything happens locally in your browser via JavaScript.

Also known as

word set intersection, shared words two texts, common words finder, compare vocabulary, word overlap list, shared vocabulary tool, words in both texts, word intersection online
