How HTML extraction works here
The tool runs new DOMParser().parseFromString(html, "text/html") and reads document.body.textContent. That gives you the text of every node in the parsed body, with the tags and styling stripped. Entities like &amp;, &lt; and &mdash; are decoded to their characters. Script and style content is included by default because textContent walks every node; remove those upstream if needed.
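The parsing step described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the function name extractText and the optional parser argument are assumptions added here so the function can be exercised outside a browser, where DOMParser is not defined.

```javascript
// Minimal sketch of the parsing step. `extractText` and the optional
// `parser` argument are hypothetical names, not taken from the tool.
// DOMParser is a browser API; in plain Node.js you must pass your own parser.
function extractText(html, parser = new DOMParser()) {
  // Build a full HTML document from the string, as a browser would.
  const doc = parser.parseFromString(html, "text/html");
  // textContent concatenates the text of every descendant node of <body>.
  return doc.body.textContent;
}

// In a browser console:
// extractText("<p>Fish &amp; chips</p>") returns "Fish & chips"
```

The default parameter is only evaluated when no parser is passed, so the browser path needs no extra arguments.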
The BR to Newline toggle replaces every <br> tag with a literal newline before parsing. Without it, <br> tags vanish silently because they have no text content. With it, you keep the visible line breaks from the original markup.
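A pre-parse replacement like the one this toggle performs might look like the sketch below. The function name brToNewline and the exact pattern are assumptions; the tool may match <br> tags differently.

```javascript
// Hypothetical sketch of the BR to Newline step, applied to the raw
// HTML string before it is parsed.
function brToNewline(html) {
  // Matches <br>, <br/> and <br /> in any letter case.
  // A <br> carrying attributes (e.g. <br class="x">) is NOT matched here.
  return html.replace(/<br\s*\/?>/gi, "\n");
}
```

Because the swap happens on the source string, the newline becomes ordinary text in the parsed document and survives into textContent.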
The Collapse WS toggle (on by default) flattens runs of whitespace (spaces, tabs, newlines) to a single space and trims each line. Switch it off to keep the raw whitespace from the source HTML, including the indentation around tags. Compared with strip HTML tags (regex-based), this tool decodes entities, walks nested tags correctly, and follows the browser's actual parsing rules.
How to use Extract Text from HTML
1. Paste the HTML markup into the input panel.
2. The plain-text content appears in the output panel.
3. Toggle BR to Newline in the action bar to keep <br> line breaks.
4. Toggle Collapse WS off if you want the raw whitespace from the source.
5. Click Copy or Download to save the result.
Keyboard shortcuts
Drive TextResult without touching the mouse.
| Shortcut | Action |
|---|---|
| Ctrl F | Open the find & replace panel inside the input (Plus) |
| Ctrl Z | Undo the last input change |
| Ctrl Shift Z | Redo |
| Ctrl Shift Enter | Toggle fullscreen focus on the editor (Plus) |
| Esc | Close find & replace, or exit fullscreen |
| Ctrl K | Open the command palette to jump to any tool (Plus) |
| Ctrl S | Save current workflow draft (Plus) |
| Ctrl P | Run a saved workflow (Plus) |
What this tool actually does
DOMParser-based, not regex
The browser builds a real DOM tree from the HTML and the tool reads textContent. Nested tags, malformed markup and self-closing elements are all handled the way your browser handles them. Compare with strip HTML tags, which is regex-based and faster on small snippets.
HTML entities decoded
Named entities (&amp;, &lt;, &copy;), numeric entities (&#8212;) and hex entities (&#x2014;) are all decoded to their characters. Regex-based strippers leave entities in place; this tool does not.
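To see why a decimal and a hex reference can produce the same character, here is a small sketch that decodes numeric references by hand. The browser does this for you during parsing; the helper decodeNumericEntity is hypothetical and covers only numeric forms, not named entities or the parser's special cases for invalid code points.

```javascript
// Hypothetical helper: decodes one decimal (&#8212;) or hex (&#x2014;)
// character reference. Anything else is returned unchanged.
function decodeNumericEntity(entity) {
  const match = /^&#(x[0-9a-f]+|[0-9]+);$/i.exec(entity);
  if (!match) return entity;
  const body = match[1];
  const isHex = body[0].toLowerCase() === "x";
  const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
  // String.fromCodePoint throws above U+10FFFF, so guard the range.
  if (codePoint > 0x10ffff) return entity;
  return String.fromCodePoint(codePoint);
}
```

Both 8212 (decimal) and 2014 (hex) name code point U+2014, so the two references decode to the same dash character.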
BR to Newline toggle
Off (default): <br> tags vanish because textContent ignores them. On: every <br> in the source is replaced by a newline before parsing, so visible line breaks come through.
Collapse WS toggle
On (default): runs of whitespace are flattened to single spaces and lines are trimmed. Off: the raw whitespace from the source HTML is preserved, including indentation around tags. With both BR to Newline and Collapse WS on, each line is collapsed individually so you keep the line breaks but lose the per-line indentation.
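The two collapsing modes can be sketched as follows. This is an illustration under assumptions: the name collapseWhitespace and the keepLineBreaks flag are invented here, and the sketch keeps empty lines in per-line mode, which the tool may or may not do.

```javascript
// Hypothetical sketch of Collapse WS.
// keepLineBreaks = false: every run of whitespace becomes one space.
// keepLineBreaks = true: each line is collapsed and trimmed on its own,
// which is the behavior described when BR to Newline is also on.
function collapseWhitespace(text, keepLineBreaks = false) {
  const collapse = (s) => s.replace(/\s+/g, " ").trim();
  if (!keepLineBreaks) return collapse(text);
  return text.split("\n").map(collapse).join("\n");
}
```

Splitting on the newline first is what lets the line breaks survive while the indentation on each line is still removed.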
Script and style content included
Because textContent walks every node, the bodies of <script> and <style> tags come through as text. Strip those tags upstream with find and replace if you only want the visible body copy.
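If you would rather script the upstream cleanup, a regex pass like the one below is one option. It is a rough sketch with an assumed function name, stripScriptAndStyle; regex matching of HTML is approximate and can miss unusual markup such as a closing tag with extra whitespace.

```javascript
// Hypothetical pre-processing step: delete <script> and <style> elements,
// including their bodies, from the raw HTML string before extraction.
function stripScriptAndStyle(html) {
  return html
    // [\s\S]*? matches across newlines, lazily, up to the first closing tag.
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, "")
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, "");
}
```

Run it on the markup first, then paste the result into the input panel.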
Worked example
With Collapse WS on (default) and BR to Newline on, you get one tidy line per <br> break. Notice &amp; decoded to &, and the href attribute did not appear in the output.
Input:
<p>Hello <strong>world</strong>!</p>
<p>Line one.<br>Line two.</p>
<a href="https://example.com">Visit &amp; say hi</a>
Output:
Hello world!
Line one.
Line two.
Visit & say hi
Settings reference
| Setting | Effect on output |
|---|---|
| BR to Newline off (default) | <br> tags vanish. Line one and two run together unless other whitespace separates them. |
| BR to Newline on | Every <br> becomes a literal newline before parsing. |
| Collapse WS on (default) | Runs of whitespace (spaces, tabs, newlines) flatten to a single space; lines are trimmed. |
| Collapse WS off | Raw whitespace from the source is kept, including indentation between tags. |
| HTML entities | Always decoded. &amp; becomes &. |
| Script and style bodies | Included as text. Remove tags upstream if you do not want their content. |
FAQ
How is this different from strip HTML tags?
Strip HTML tags is regex-based and removes whatever sits between < and >. It is fast for tiny snippets but does not decode entities and can choke on nested or malformed markup. This tool uses the browser's real DOMParser, which builds a proper DOM and decodes entities, then reads the text straight from the parsed tree.
Why does my <br> tag disappear?
Because <br> elements have no text content, the DOM walker skips them. Turn on BR to Newline in the action bar to swap each <br> for a literal newline before parsing.
Are <script> and <style> bodies removed?
No. textContent walks every node, so script and style code comes through as text. If you only want the visible body copy, run find and replace with regex first to delete <script[^>]*>[\s\S]*?</script> and the equivalent for style, then paste the result here.
Are HTML entities decoded?
Yes. Named (&amp;), numeric (&#8212;) and hex (&#x2014;) entities all decode to their characters because the browser does it for you when it builds the DOM.
Is anything sent to a server?
No. DOMParser runs entirely in your browser. Nothing uploads, nothing is logged.