Shannon entropy, computed character by character
Shannon entropy is H = -sum(p_i * log2 p_i), where p_i is the empirical probability of the i-th character. The tool builds the character distribution by tallying every character in the input, normalises to probabilities, and applies the formula. The unit is bits per character: an entropy of 4 means each character carries 4 bits of information on average under the empirical distribution.
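A minimal JavaScript sketch of this computation (the function name `shannonEntropy` is illustrative, not the tool's internal API):

```javascript
// Sketch of per-character Shannon entropy, assuming the tallying
// behaviour described above; not the tool's actual source.
function shannonEntropy(text) {
  if (text.length === 0) return 0;            // empty input reports 0
  const counts = new Map();
  for (let i = 0; i < text.length; i++) {     // tally UTF-16 code units
    const ch = text[i];
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let h = 0;
  for (const n of counts.values()) {
    const p = n / text.length;                // empirical probability p_i
    h -= p * Math.log2(p);                    // H = -sum(p_i * log2 p_i)
  }
  return h;
}

console.log(shannonEntropy("aabb")); // 1 — two equiprobable characters
console.log(shannonEntropy("aaaa")); // 0 — a single repeated character
```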
Typical values for English prose are 4.0 to 4.5 bits per character (with the lowercase alphabet plus space and common punctuation, the maximum is around 4.7 for uniform letters). Random ASCII (mixed case, digits, punctuation) approaches 6.5. Highly repetitive text (aaaaaa) approaches 0. Compressed binary or random bytes interpreted as text can exceed 7.5.
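These ceilings follow from the uniform case: a uniform distribution over k distinct symbols has entropy log2(k), which bounds the figures quoted above. A quick check:

```javascript
// Uniform distribution over k symbols has entropy log2(k).
console.log(Math.log2(26).toFixed(3));  // "4.700" — lowercase letters only
console.log(Math.log2(94).toFixed(3));  // "6.555" — printable ASCII
console.log(Math.log2(256).toFixed(3)); // "8.000" — arbitrary byte values
```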
Scope picks the unit. Whole Text (default) reports one figure plus the unique- and total-character counts. Per Line reports Line N: H, one per input line. Per Paragraph reports Paragraph N: H, one per non-empty paragraph (paragraphs split on blank lines). Decimals controls how many digits the entropy is shown to (0 to 6, default 3).
How to calculate the Shannon entropy of text
1. Paste or type your text into the input panel on the left.
2. The entropy figure appears in the output panel as you type.
3. Switch Scope to Per Line or Per Paragraph for finer-grained measurement.
4. Set Decimals to widen or narrow the precision.
5. Click Copy in the output header to copy the result.
Keyboard shortcuts
Drive TextResult without touching the mouse.
| Shortcut | Action |
|---|---|
| Ctrl F | Open the find & replace panel inside the input (Plus) |
| Ctrl Z | Undo the last input change |
| Ctrl Shift Z | Redo |
| Ctrl Shift Enter | Toggle fullscreen focus on the editor (Plus) |
| Esc | Close find & replace, or exit fullscreen |
| Ctrl K | Open the command palette to jump to any tool (Plus) |
| Ctrl S | Save current workflow draft (Plus) |
| Ctrl P | Run a saved workflow (Plus) |
What this tool actually does
Shannon entropy formula
H = -sum(p * log2 p) over every character in the input. The probability p for each character is its frequency divided by the total length. Empty input reports 0.
Character-level, code-unit-by-code-unit
The tally indexes by JavaScript string position (UTF-16 code units), so an emoji outside the Basic Multilingual Plane contributes its two surrogate code units to the distribution rather than one character. For BMP-only Latin text, where every character is a single code unit, this is identical to per-character entropy.
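A quick check of the code-unit behaviour, using a standard astral-plane emoji:

```javascript
// One code point outside the BMP is two UTF-16 code units, so the
// tally sees two separate entries for it.
const emoji = "\u{1F600}";          // grinning face
console.log(emoji.length);          // 2 — code units
console.log([...emoji].length);     // 1 — code point
console.log(emoji[0] === "\uD83D"); // true — high surrogate
console.log(emoji[1] === "\uDE00"); // true — low surrogate
```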
Scope = Whole, Per Line, or Per Paragraph
Default Whole Text emits three lines: entropy, unique-characters, total-characters. Per Line emits Line N: H per input line. Per Paragraph splits on \n\n+ and emits Paragraph N: H per non-empty paragraph.
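The splitting rules can be sketched as follows (function names are illustrative, not the tool's internals):

```javascript
// Per Line: one entry per input line. Per Paragraph: split on runs of
// blank lines (\n\n+) and drop empty paragraphs, per the docs above.
function splitLines(text) {
  return text.split("\n");
}
function splitParagraphs(text) {
  return text.split(/\n{2,}/).filter(p => p.trim().length > 0);
}

const sample = "alpha\nbeta\n\ngamma\n\n\ndelta";
console.log(splitLines(sample).length);      // 7 lines (blank lines count)
console.log(splitParagraphs(sample).length); // 3 non-empty paragraphs
```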
Decimals sets precision
Default 3 (e.g. 4.123). Range 0 to 6. Set higher to compare similar inputs; set lower for a clean integer.
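Decimals affects display only, e.g. via `toFixed`-style rounding; the underlying value is unchanged:

```javascript
// Decimals (0–6) controls display precision, not the computation.
const h = 4.1234567;
console.log(h.toFixed(3)); // "4.123" — the default
console.log(h.toFixed(0)); // "4" — clean integer
console.log(h.toFixed(6)); // "4.123457" — maximum precision
```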
Reads as a relative measure
Entropy is most useful for comparison: encrypted vs. plain, English vs. random, source A vs. source B. Absolute values depend on the alphabet and the unit, so do not over-interpret a single number in isolation.
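To make the relative reading concrete, here is an illustrative comparison (the strings are arbitrary samples, and the helper is repeated so the snippet stands alone):

```javascript
// Self-contained entropy helper for the comparison below.
function H(s) {
  const counts = new Map();
  for (let i = 0; i < s.length; i++) {
    counts.set(s[i], (counts.get(s[i]) || 0) + 1);
  }
  let h = 0;
  for (const n of counts.values()) {
    const p = n / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

console.log(H("aaaaaaaaaaaaaaaaaaaa").toFixed(3)); // repetitive: 0.000
console.log(H("the quick brown fox").toFixed(3));  // prose-like, mid-range
console.log(H("q9$Kz@1mW#pT4&vL7!cE").toFixed(3)); // random-looking: highest
```

Comparing the three figures against each other is more informative than reading any one of them in isolation.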
Worked example
the quick brown fox jumps over the lazy dog

Shannon entropy: 4.385 bits per character
Unique characters: 27
Total characters: 43

Forty-three characters across 27 distinct ones (the pangram uses every letter of the alphabet plus the space, and several letters repeat). Switch Scope to Per Line and the output collapses to a single Line 1: 4.385 entry.
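The worked example can be recomputed directly (the helper is a sketch, not the tool's code):

```javascript
// Recompute the pangram example by hand.
function entropy(text) {
  const counts = new Map();
  for (let i = 0; i < text.length; i++) {
    counts.set(text[i], (counts.get(text[i]) || 0) + 1);
  }
  let h = 0;
  for (const n of counts.values()) {
    const p = n / text.length;
    h -= p * Math.log2(p);
  }
  return h;
}

const s = "the quick brown fox jumps over the lazy dog";
console.log(entropy(s).toFixed(3)); // "4.385"
console.log(new Set(s).size);       // 27 unique characters
console.log(s.length);              // 43 total characters
```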
Settings reference
| Option | Effect on output |
|---|---|
| Scope = Whole Text (default) | Three lines: entropy, unique-chars, total-chars. |
| Scope = Per Line | One Line N: H per input line. |
| Scope = Per Paragraph | One Paragraph N: H per non-empty paragraph. |
| Decimals (default 3) | Digits after the decimal point. 0 to 6. |
| Empty input | Reports Entropy: 0 bits/char. |
| Unit | Bits per character (log base 2). |
FAQ
What is a typical entropy for English text?
Around 4.0 to 4.5 bits per character for ordinary prose. Repetitive text sits far lower; random printable ASCII sits higher, approaching 6.5.
Does the tool count emoji as one character or two?
Two, for emoji outside the Basic Multilingual Plane: the tally works on UTF-16 code units, so a surrogate pair contributes two entries to the distribution.
How do I score each paragraph separately?
Per Paragraph. Paragraphs are split on blank lines (\n\n+) and each gets its own line of output.