Extract URLs from Text Online

How URL matching works here

The pattern starts at http:// or https:// and grabs every following character that is not whitespace, not an angle-bracket and not a single or double quote. That stops the match at sensible boundaries: a space, a line break, the end of an HTML attribute, or the start of a quoted string. Query strings (?ref=docs), hashes (#install) and ports (:8443) are part of the match.
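The rule above can be sketched in a few lines of Python. The expression below is an assumption reconstructed from this description; the tool's actual pattern and flags are not shown on this page, and `extract_urls` is a hypothetical helper name.

```python
import re

# Assumed pattern: http:// or https://, then one or more characters that are
# not whitespace, not < or >, and not a single or double quote.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")


def extract_urls(text: str) -> list[str]:
    """Return every http/https URL in source order, duplicates included."""
    return URL_PATTERN.findall(text)


sample = (
    "See https://example.com/a?ref=docs#install and "
    '<https://example.org:8443/x> or href="https://example.net/help"'
)
for url in extract_urls(sample):
    print(url)
```

In this sketch the query string, hash and port stay inside the match, while the closing angle-bracket and the closing quote end it.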

Schemeless URLs (example.com/page) are not captured. The matcher requires http:// or https:// so you do not get false positives from filenames, version numbers or domain-only mentions. Other schemes like ftp://, mailto: and tel: are also skipped; for those, use extract regex matches with a custom pattern.

Output is one URL per line, in the order the URLs appear. Trailing punctuation that directly follows a URL inside a sentence (a full stop, a closing bracket) is included in the match, because it is neither whitespace, an angle-bracket nor a quote; in that case run find and replace on the result to trim the tail.
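If you post-process the list outside the tool, trimming the tail could look like the sketch below. The set of characters in `TRAILING` and the helper name `trim_tail` are assumptions chosen for illustration, not something the tool defines.

```python
# Characters treated as sentence punctuation when they end a URL (an assumption).
TRAILING = ".,;:!?)]}"


def trim_tail(url: str) -> str:
    """Strip trailing sentence punctuation from one extracted URL."""
    return url.rstrip(TRAILING)


extracted = [
    "https://example.org/page?ref=docs.",
    "https://example.com/help",
]
for url in extracted:
    print(trim_tail(url))
```

This is deliberately blunt: a URL that really does end in a closing bracket would lose it as well, so check the result when such links matter.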

How to use extract urls from text

1. Paste text containing URLs into the input panel.
2. The output panel shows every http or https URL, one per line.
3. Click Copy to copy the list.
4. Click Download to save it as a plain-text file.
5. Pipe the result through remove duplicate lines if you want unique URLs only.

Keyboard shortcuts

Drive TextResult without touching the mouse.

Shortcut	Action
`Ctrl` `F`	Open the find & replace panel inside the input (Plus)
`Ctrl` `Z`	Undo the last input change
`Ctrl` `Shift` `Z`	Redo
`Ctrl` `Shift` `Enter`	Toggle fullscreen focus on the editor (Plus)
`Esc`	Close find & replace, or exit fullscreen
`Ctrl` `K`	Open the command palette to jump to any tool (Plus)
`Ctrl` `S`	Save current workflow draft (Plus)
`Ctrl` `P`	Run a saved workflow (Plus)

What counts as a URL here

Scheme is required

The match starts at http:// or https://. Schemeless mentions like example.com are not captured. This avoids false positives on file names and version strings.

Query strings, hashes and ports preserved

?utm_source=site, #section-2 and :8443 are part of the URL and are kept. So https://example.com:8443/path?q=1#top is one match, not three.
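To see that the port, query string and hash travel together, the sketch below extracts the URL with an assumed version of the pattern and then splits it with Python's standard `urllib.parse.urlsplit`. The pattern is a reconstruction from this page, not the tool's confirmed source.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern, reconstructed from the description on this page.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")

matches = URL_PATTERN.findall("Open https://example.com:8443/path?q=1#top in a new tab")
print(len(matches))  # a single match, not three

# Split the one match into its components.
parts = urlsplit(matches[0])
print(parts.hostname, parts.port, parts.path, parts.query, parts.fragment)
```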

Stops at whitespace, angle-brackets and quotes

The character class [^\s<>"'] halts the match at a space, tab, newline, <, >, single or double quote. So URLs inside href="...", plain prose, and angle-bracketed (<https://example.com>) all match cleanly.

Other schemes skipped

ftp://, file://, mailto:, tel: and protocol-relative //host/path are not matched. For those, use extract regex matches with a custom pattern.
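A custom pattern for some of the skipped schemes might look like the sketch below. It is one possible expression, not one the tool ships; it reuses the same stop characters and adds a word boundary so that tel: does not match inside a word such as hotel:. Protocol-relative //host/path links are not covered here.

```python
import re

# Hypothetical custom pattern for ftp://, file://, mailto: and tel: links.
OTHER_SCHEMES = re.compile(r"""\b(?:ftp://|file://|mailto:|tel:)[^\s<>"']+""")


def find_other_schemes(text: str) -> list[str]:
    """Return ftp, file, mailto and tel links in source order."""
    return OTHER_SCHEMES.findall(text)


text = (
    "Files: ftp://files.example.com/pub Mail: mailto:team@example.com "
    "Phone: tel:+15550100 Site: https://example.com"
)
for link in find_other_schemes(text):
    print(link)
```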

Order preserved, duplicates kept

URLs appear in the order they are found. Duplicates are not removed; for a unique list pipe the output into remove duplicate lines.
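Outside the tool, the same "unique, but still in source order" result can be sketched with a dictionary, which keeps insertion order in Python 3.7 and later. `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(urls: list[str]) -> list[str]:
    """Drop repeated URLs while keeping the first occurrence of each."""
    return list(dict.fromkeys(urls))


found = [
    "https://example.com",
    "https://example.org/page",
    "https://example.com",
]
print(unique_in_order(found))
```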

Worked example

The URL inside href="..." stops at the closing quote. The trailing full stop on prose URLs is included because it is not whitespace; trim it with find and replace if needed.

Input

Visit https://example.com and https://example.org/page?ref=docs.
Mirror at https://example.net:8443/v2#install.
Docs: <a href="https://example.com/help">help</a>.

Output

https://example.com
https://example.org/page?ref=docs.
https://example.net:8443/v2#install.
https://example.com/help
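The example can be replayed with an assumed version of the pattern (reconstructed from this page, not taken from the tool's source). The sketch also reports the character that ended each match, which shows why the full stops are kept and the closing quote is not.

```python
import re

# Assumed pattern, reconstructed from the description on this page.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")

text = '''Visit https://example.com and https://example.org/page?ref=docs.
Mirror at https://example.net:8443/v2#install.
Docs: <a href="https://example.com/help">help</a>.'''


def matches_with_stop(source: str) -> list[tuple[str, str]]:
    """Pair each matched URL with the character that follows it."""
    return [
        (m.group(), source[m.end():m.end() + 1])
        for m in URL_PATTERN.finditer(source)
    ]


for url, stop in matches_with_stop(text):
    print(url, "stopped at", repr(stop))
```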

Settings reference

Behaviour	Effect on output
Scheme	`http://` or `https://` required.
Other schemes	`ftp://`, `mailto:`, `tel:`, `//host/...` are skipped.
Path, query, hash, port	All preserved as part of the match.
Stop characters	Whitespace, `<`, `>`, single quote, double quote.
Trailing punctuation	A full stop or closing bracket directly after a URL is included, because it is not a stop character.
Order and duplicates	Matches appear in source order. Duplicates are kept.

FAQ

Will example.com without http match?

No. The pattern requires http:// or https://. This avoids false positives on filenames (config.prod.json), version numbers and casual domain mentions. To match schemeless hosts, use extract regex matches with a pattern like \b(?:[\w-]+\.)+[a-z]{2,}(?:/\S*)?.
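The trade-off is easy to see by running the schemeless pattern from this answer in Python. The sample sentence is made up for illustration; the pattern itself is copied from the text above.

```python
import re

# Schemeless pattern quoted in the answer above.
SCHEMELESS = re.compile(r"\b(?:[\w-]+\.)+[a-z]{2,}(?:/\S*)?")

text = "Docs live at example.com/page and in config.prod.json since version 1.2.3"
hits = SCHEMELESS.findall(text)
print(hits)
```

The filename is matched alongside the real host, which is the kind of false positive the built-in matcher avoids by requiring a scheme. The version number is skipped because its last segment contains no letters.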

How do I get just the host name from each URL?

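Extract first to get the full URLs, then run the result through regex replace with pattern https?:\/\/([^\/\s?#]+).* and replacement $1 to keep only the host (plus the port, if the URL has one). Excluding whitespace, ? and # from the host group keeps a URL with no path from running into the next line or into its own query string.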
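The same replacement can be sketched in Python, where the back-reference is written \1 rather than $1. The host class here excludes whitespace, ? and # as well as /, and the substitution is applied one line at a time; `hosts_only` is a hypothetical helper name.

```python
import re

# Capture everything between the scheme and the first /, ?, # or whitespace.
HOST_PATTERN = re.compile(r"https?://([^/\s?#]+).*")


def hosts_only(urls: list[str]) -> list[str]:
    """Reduce each URL to its host (and port, when one is present)."""
    return [HOST_PATTERN.sub(r"\1", url) for url in urls]


urls = [
    "https://example.com",
    "https://example.org/page?ref=docs",
    "https://example.net:8443/v2#install",
]
print(hosts_only(urls))
```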

Why is a full stop included on the end of some URLs?

The match stops at whitespace, angle-brackets and quotes, but not at full stops or closing brackets, because those characters can legitimately appear inside a URL. If a sentence ends with a URL followed by ., the dot is captured. Run find and replace on the output to trim trailing punctuation.

Are duplicates removed?

No. Every URL is listed in source order, duplicates included. Pipe the result through remove duplicate lines for a unique list.

Is anything sent to a server?

No. The match runs entirely in your browser. Nothing uploads, nothing is logged.

Also known as

extract urls, url extractor, pull links from text, find urls in text, link extractor, list urls in text, http url scraper, get links from text
