Frequency tables, configurable to the column
The tool tokenizes on \b[\w']+\b (any run of word characters, optionally with internal apostrophes) and counts occurrences. With Group Size = 1 you get a word-frequency table. With 2, the count is over consecutive word pairs (bigrams), so "the fox" and "fox is" are separate keys. The maximum group size is 6.
Pre-processing is governed by three toggles. Strip Punct (on by default) replaces .,;:!?'"()[]{}*@#$%^&+=`~/\|<>_- with spaces before tokenizing, so "fox." and "fox" collapse. Ignore Case (on by default) lowercases the input first, so The and the are the same key. Stop at . (off by default) keeps n-grams inside sentences: when it is on, no bigram spans a ., !, or ?.
Output formatting is also yours: Show Count prepends the count column, Show % prepends a percentage of total, Show Total appends a Total: N footer, and Sort picks By Count (default), Alphabetical, or Insertion Order. Output is capped at 200 entries to keep the panel readable. For just the unique words without counts, see find unique words.
How to rank words by frequency
1. Paste or type your text into the input panel on the left.
2. The frequency table appears in the output panel, sorted by count.
3. Set Group Size to 2 for bigrams or 3 for trigrams.
4. In the action bar, toggle Show % to add a percent-of-total column, or Show Total to append a Total: N line.
5. Switch Sort to Alphabetical or Insertion Order if count-sorting is not what you want.
Keyboard shortcuts
Drive TextResult without touching the mouse.
| Shortcut | Action |
|---|---|
| Ctrl F | Open the find & replace panel inside the input (Plus) |
| Ctrl Z | Undo the last input change |
| Ctrl Shift Z | Redo |
| Ctrl Shift Enter | Toggle fullscreen focus on the editor (Plus) |
| Esc | Close find & replace, or exit fullscreen |
| Ctrl K | Open the command palette to jump to any tool (Plus) |
| Ctrl S | Save current workflow draft (Plus) |
| Ctrl P | Run a saved workflow (Plus) |
What this tool actually does
Tokenizes on \b[\w']+\b
Words are runs of \w with optional internal apostrophes. Hyphenated terms split on the hyphen here (unlike in the word counter tool). Punctuation is stripped first when Strip Punct is on, which is the default.
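A minimal Python sketch of the tokenizing step, assuming Python's `re` semantics for `\w` and `\b` match the tool's regex engine (the engine is not stated on this page; in Python 3, `\w` on a `str` matches Unicode letters, digits, and the underscore). The helper name `tokenize` is hypothetical.

```python
import re

# The pattern from this page: boundary, word chars or apostrophes, boundary.
TOKEN_RE = re.compile(r"\b[\w']+\b")


def tokenize(text: str) -> list[str]:
    """Return the tokens the pattern extracts, in order of appearance."""
    return TOKEN_RE.findall(text)


print(tokenize("A well-known fox's den."))  # ['A', 'well', 'known', "fox's", 'den']
# \b needs a word character next to it, so edge apostrophes are dropped.
print(tokenize("'tis the dogs' bone"))      # ['tis', 'the', 'dogs', 'bone']
```

Per the Strip Punct character list, the apostrophe is one of the characters replaced with spaces, so with default settings the apostrophe branch of the pattern only comes into play when Strip Punct is off.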
Group Size 1 to 6 for n-grams
Default 1 = unigrams (word frequencies). 2 = bigrams (consecutive pairs). 3 = trigrams. Up to 6. Higher group sizes produce many more keys; the output cap of 200 entries still applies.
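The n-gram step can be sketched as a sliding window over the token list. This is an illustration, not the tool's code: the hypothetical `ngram_counts` helper joins each window with a single space (matching how bigrams such as the quick are displayed) and enforces the 1 to 6 range up front.

```python
from collections import Counter


def ngram_counts(tokens: list[str], group_size: int = 1) -> Counter:
    """Count every run of `group_size` consecutive tokens."""
    if not 1 <= group_size <= 6:
        raise ValueError("group_size must be between 1 and 6")
    # N tokens give N - group_size + 1 windows (none if N < group_size).
    windows = (tokens[i:i + group_size] for i in range(len(tokens) - group_size + 1))
    return Counter(" ".join(w) for w in windows)


tokens = "the fox is the fox".split()
print(ngram_counts(tokens, 2)["the fox"])  # 2
print(len(ngram_counts(tokens, 3)))        # 3 distinct trigrams
```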
Stop at .: respect sentence boundaries
When on, the input is split on ., !, ? before n-gram extraction. A bigram will not include the last word of one sentence and the first word of the next. Off (default), n-grams span freely across sentences.
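A sketch of Stop at ., assuming it simply splits on the three terminators and extracts n-grams from each chunk separately. Because Strip Punct also removes ., ! and ?, a pipeline like this has to split on sentences before stripping punctuation; the tool's actual ordering is an assumption here. `sentence_bounded_ngrams` is a hypothetical name, and Strip Punct and Ignore Case are left out for brevity.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\b[\w']+\b")


def sentence_bounded_ngrams(text: str, group_size: int) -> Counter:
    """Count n-grams that never cross '.', '!' or '?'."""
    counts: Counter = Counter()
    # Split on the terminators first, then slide the window inside each chunk.
    for sentence in re.split(r"[.!?]", text):
        tokens = TOKEN_RE.findall(sentence)
        for i in range(len(tokens) - group_size + 1):
            counts[" ".join(tokens[i:i + group_size])] += 1
    return counts


text = "the lazy dog. the fox is quick."
bigrams = sentence_bounded_ngrams(text, 2)
print("dog the" in bigrams)  # False: that pair would cross the period
print(bigrams["the fox"])    # 1
```

A naive split like this also breaks on abbreviations such as e.g., so in this sketch they end a "sentence" too.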
Strip Punct and Ignore Case
Strip Punct (default on) replaces a wide set of ASCII punctuation with spaces before tokenizing, so fox. and fox collapse into the same key. Ignore Case (default on) lowercases the input first, so The and the collapse. Turn either off to keep the original variants distinct.
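The two pre-processing toggles can be sketched with `str.lower` and a translation table built from the character set listed for Strip Punct at the top of this page. `preprocess` and its keyword arguments are hypothetical names.

```python
# Character set listed for Strip Punct on this page.
PUNCT = ".,;:!?'\"()[]{}*@#$%^&+=`~/\\|<>_-"
# Translation table mapping every listed character to a space.
TO_SPACE = str.maketrans(PUNCT, " " * len(PUNCT))


def preprocess(text: str, strip_punct: bool = True, ignore_case: bool = True) -> str:
    """Apply the Ignore Case and Strip Punct toggles to raw input."""
    if ignore_case:
        text = text.lower()
    if strip_punct:
        text = text.translate(TO_SPACE)
    return text


print(preprocess("The fox. THE FOX!").split())  # ['the', 'fox', 'the', 'fox']
```

Taking that character list at face value, two side effects follow: the apostrophe is replaced, so don't is counted as don and t, and the underscore is replaced even though `\w` would otherwise keep it inside a word, so snake_case becomes snake and case.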
Output columns and sort
Show Count (default on) writes a count column. Show % (default off) adds a percent-of-total column. Show Total (default off) adds a final Total: N line. Sort defaults to By Count; switch to Alphabetical or Insertion Order to change the row order. The output is tab-separated and capped at 200 rows.
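A sketch of the output stage. Several details are assumptions not specified on this page: the column order when both Show Count and Show % are on (count first here), the percentage precision (one decimal), that Alphabetical means plain code-point order rather than locale-aware collation, and that Total: N counts every occurrence rather than only the rows shown. `format_table` and its `sort` values are hypothetical.

```python
from collections import Counter


def format_table(counts: Counter, sort: str = "count", show_count: bool = True,
                 show_pct: bool = False, show_total: bool = False,
                 cap: int = 200) -> str:
    """Render counts as tab-separated rows, mimicking the options above."""
    total = sum(counts.values())
    if sort == "count":
        rows = counts.most_common()    # descending; ties keep first-seen order
    elif sort == "alpha":
        rows = sorted(counts.items())  # code-point order of the keys
    elif sort == "insertion":
        rows = list(counts.items())    # order in which keys were first counted
    else:
        raise ValueError(f"unknown sort: {sort!r}")
    lines = []
    for key, n in rows[:cap]:          # output cap
        cols = []
        if show_count:
            cols.append(str(n))
        if show_pct:
            cols.append(f"{100 * n / total:.1f}%")
        cols.append(key)
        lines.append("\t".join(cols))
    if show_total:
        lines.append(f"Total: {total}")  # all occurrences, not just shown rows
    return "\n".join(lines)


print(format_table(Counter({"the": 3, "fox": 2}), show_pct=True, show_total=True))
```

`Counter.most_common()` orders equal counts by first occurrence, which matches the tie order in the worked example.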
Worked example
Tab-separated, sorted by count descending: the appears 3 times, and quick and fox twice each. In this example, tied rows keep the order in which the words first appear. Switch Group Size to 2 and the table becomes bigrams: the quick, quick brown, etc.
Input:
the quick brown fox jumps over the lazy dog. the fox is quick.
Output (default settings):
3	the
2	quick
2	fox
1	brown
1	jumps
1	over
1	lazy
1	dog
1	is
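The whole default pipeline fits in a few lines. This sketch assumes the steps run in the order this page describes (lowercase, strip punctuation, tokenize, count, sort by count, cap at 200 rows) and reproduces the table above; `rank` is a hypothetical helper, not the tool's code.

```python
import re
from collections import Counter

PUNCT = ".,;:!?'\"()[]{}*@#$%^&+=`~/\\|<>_-"
TO_SPACE = str.maketrans(PUNCT, " " * len(PUNCT))


def rank(text: str, group_size: int = 1) -> list[tuple[str, int]]:
    """Default pipeline: Ignore Case on, Strip Punct on, Stop at . off."""
    text = text.lower().translate(TO_SPACE)
    tokens = re.findall(r"\b[\w']+\b", text)
    grams = [" ".join(tokens[i:i + group_size])
             for i in range(len(tokens) - group_size + 1)]
    return Counter(grams).most_common()[:200]  # By Count, capped at 200 rows


example = "the quick brown fox jumps over the lazy dog. the fox is quick."
for key, n in rank(example):
    print(f"{n}\t{key}")
```

Calling `rank(example, 2)` gives the bigram table instead, starting with the quick and quick brown, each counted once.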
Settings reference
| Option | Effect on output |
|---|---|
| Group Size (default 1) | n-gram length. 1 = words, 2 = bigrams, up to 6. |
| Stop at . (default off) | On: n-grams cannot span sentence boundaries (., !, ?). |
| Strip Punct (default on) | Replace ASCII punctuation with spaces before tokenizing. |
| Show Count (default on) | Prepend a count column to each row. |
| Show % (default off) | Prepend a percent-of-total column. |
| Show Total (default off) | Append a final Total: N line. |
| Sort (default By Count) | Row order: By Count, Alphabetical, or Insertion Order. |
| Ignore Case (default on) | On: The and the are one key. Off: separate keys. |
| Output cap | Top 200 rows shown. |
FAQ
How are words tokenized?
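\b[\w']+\b (word characters with optional internal apostrophes). Hyphens are word boundaries here, so well-known splits into well and known.
How do I get bigrams or trigrams?
Set Group Size to 2 for bigrams or 3 for trigrams (up to 6). Turn on Stop at . if n-grams should not cross sentence boundaries.
Why are The and the grouped together?
Ignore Case is on by default, so the input is lowercased before counting. Turn it off to keep The and the as separate keys.
Can I get percentages?
Yes. Turn on Show % to add a percent-of-total column. Show Total appends a final Total: N line.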