Extract Words from Text Online

How word extraction works here

The pattern \b[\w']+\b matches runs of word characters and straight apostrophes. Letters A-Z a-z, digits 0-9, underscore and ' stay together; everything else (spaces, punctuation, hyphens, em-dashes, smart quotes) is a token break. So "don't" is one token and "check-in" is two (check and in).
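The rule above can be sketched in a few lines of JavaScript. This is a minimal illustration that assumes the tool applies the pattern with a plain global match; the function name extractWords is hypothetical and is not taken from the tool's source.

```javascript
// Assumed equivalent of the rule described above, not the tool's actual code.
const WORD_PATTERN = /\b[\w']+\b/g;

// Hypothetical helper: returns every token in source order.
function extractWords(text) {
  // String.prototype.match returns null when nothing matches,
  // so fall back to an empty array.
  return text.match(WORD_PATTERN) ?? [];
}

// One token per line, as in the output panel.
console.log(extractWords("Don't worry: check-in at 5pm.").join("\n"));
```

One consequence of the \b anchors in this pattern: a token has to start and end on a word character, so an apostrophe at the very start or end of a word (as in dogs' or 'tis) is not part of the match.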

Curly (smart) apostrophes (’) are not in the character class, so a contraction written with smart punctuation (don’t) splits into don and t. To keep curly apostrophes attached, either run find and replace first to swap ’ for ', or switch to extract regex matches with a pattern like [\w’']+.
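Both workarounds can be compared side by side. The sketch below is illustrative only: the function names are hypothetical, and it assumes the curly apostrophe in question is U+2019 (right single quotation mark).

```javascript
// Default rule: U+2019 is outside the class, so it breaks the token.
function extractDefault(text) {
  return text.match(/\b[\w']+\b/g) ?? [];
}

// Workaround 1: swap curly apostrophes for straight ones, then extract.
function normalizeThenExtract(text) {
  const normalized = text.replace(/\u2019/g, "'");
  return normalized.match(/\b[\w']+\b/g) ?? [];
}

// Workaround 2: add U+2019 to the class, as in the pattern [\w’']+.
// This pattern has no \b anchors, so apostrophes at the edge of a word
// stay attached to the token.
function extractKeepingCurly(text) {
  return text.match(/[\w\u2019']+/g) ?? [];
}

const sample = "don\u2019t stop";
console.log(extractDefault(sample));       // tokens: don, t, stop
console.log(normalizeThenExtract(sample)); // tokens: don't, stop
console.log(extractKeepingCurly(sample));  // tokens: don’t, stop
```

The first workaround changes the apostrophe in the output to a straight one; the second keeps the original character.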

Output is one token per line in source order. Duplicates are kept; for a unique vocabulary list, pipe the result through remove duplicate lines. To count tokens, use word counter on the original text.
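For readers scripting the same steps outside the tool, here is a rough sketch of the unique-list and counting steps. It assumes duplicates are compared exactly and case-sensitively, which may differ from what remove duplicate lines does, and the helper names are hypothetical.

```javascript
// Hypothetical helper mirroring the extraction rule described above.
function extractWords(text) {
  return text.match(/\b[\w']+\b/g) ?? [];
}

// A Set keeps the first occurrence of each token and preserves insertion
// order. Comparison is exact, so "The" and "the" count as different tokens.
function uniqueWords(tokens) {
  return [...new Set(tokens)];
}

const tokens = extractWords("The cat and the hat and the bat");
console.log(tokens.length);                   // total tokens, duplicates included
console.log(uniqueWords(tokens).join("\n"));  // unique list, first-seen order
```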

How to use extract words from text

1. Paste your text into the input panel.
2. The output panel shows every word, one per line.
3. Click Copy to copy the list.
4. Click Download to save it as a plain-text file.
5. For a unique vocabulary list, send the result to remove duplicate lines.

Keyboard shortcuts

Drive TextResult without touching the mouse.

Shortcut	Action
`Ctrl` `F`	Open the find & replace panel inside the input (Plus)
`Ctrl` `Z`	Undo the last input change
`Ctrl` `Shift` `Z`	Redo
`Ctrl` `Shift` `Enter`	Toggle fullscreen focus on the editor (Plus)
`Esc`	Close find & replace, or exit fullscreen
`Ctrl` `K`	Open the command palette to jump to any tool (Plus)
`Ctrl` `S`	Save current workflow draft (Plus)
`Ctrl` `P`	Run a saved workflow (Plus)

What counts as a word here

Letters, digits, underscore, straight apostrophe

The character class is [\w']: ASCII letters, digits, underscore and '. So fox, 2026, order_id and don't all match as single tokens.

Punctuation and dashes split words

Hyphens, dashes, slashes, dots and other punctuation are token breaks. check-in splits to check and in; up/down splits to up and down.

Smart apostrophes split contractions

Curly ’ is not in the word class, so don’t splits to don and t. If your text uses smart punctuation, replace ’ with a straight ' first.

Accented and non-Latin letters split words

The ASCII word class does not include é, ñ, Cyrillic, Greek, CJK or any other non-ASCII letter, so each of those characters acts as a token break. café comes out as caf (the é is dropped). For Unicode-aware tokenisation, use extract regex matches with the pattern [\p{L}\p{N}']+ and the gu flag.
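The difference between the two patterns can be seen in a short JavaScript comparison. This is a sketch with hypothetical function names; the NFC normalisation step is an added assumption (not something the tool is known to do) so that accented letters stored as a base letter plus a combining mark are matched as single letters.

```javascript
// ASCII rule described above: \w covers A-Z, a-z, 0-9 and underscore only.
function extractAscii(text) {
  return text.match(/\b[\w']+\b/g) ?? [];
}

// Unicode-aware alternative: any letter (\p{L}) or number (\p{N}) plus '.
// The u flag is required for \p{...} property escapes.
function extractUnicode(text) {
  // Assumption: compose characters first so "e" + combining accent becomes "é".
  return text.normalize("NFC").match(/[\p{L}\p{N}']+/gu) ?? [];
}

const sample = "café naïve Москва 2026";
console.log(extractAscii(sample));   // tokens: caf, na, ve, 2026
console.log(extractUnicode(sample)); // tokens: café, naïve, Москва, 2026
```

Note that the Unicode pattern as written leaves out the underscore, so order_id splits into order and id under it; adding _ to the class would restore the ASCII behaviour for such identifiers.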

Order preserved, duplicates kept

Tokens appear in source order. Duplicates are not removed; for a unique list pipe the output through remove duplicate lines.

Worked example

don't stays as one token because the apostrophe is in the word class. check-in splits because the hyphen is not. ORDER-4821 splits the same way.

Input

The quick brown fox jumps over the lazy dog.
Don't worry: check-in at 5pm.
ORDER-4821 ships today.

Output

The
quick
brown
fox
jumps
over
the
lazy
dog
Don't
worry
check
in
at
5pm
ORDER
4821
ships
today

Settings reference

Behaviour	Effect on output
Word body characters	Letters `A-Z a-z`, digits `0-9`, underscore, straight `'`.
Hyphens and dashes	Token breaks. `check-in` splits.
Smart apostrophes	Token break. Replace `’` with `'` first to keep contractions intact.
Accented and Unicode letters	Not in the word class. `café` comes out as `caf`.
Punctuation and symbols	Token break. Dropped from output.
Order and duplicates	Source order kept, duplicates kept.

FAQ

Why is café coming out as caf?

The pattern uses the ASCII \w shorthand, which covers ASCII letters, digits and underscore only. Accented letters and other non-ASCII letters act as token breaks and are dropped. For full Unicode tokenisation, use extract regex matches with [\p{L}\p{N}']+ and the gu flag.

How do I keep contractions written with smart quotes?

Run find and replace on the input first to swap ’ for a straight ', then extract. Or use extract regex matches with [\w’']+.

Are duplicates removed?

No. Every token appears in source order. Pipe the result through remove duplicate lines for a unique vocabulary list.

How do I count words instead of listing them?

Use word counter on the original text. It uses the same word-boundary rule but returns a single number instead of a list.

Is anything sent to a server?

No. The match runs entirely in your browser.

Also known as

extract words, word extractor, split text into words, word tokenizer, list words from text, word splitter, pull words out of text, word per line

