How URL matching works here
The pattern starts at http:// or https:// and takes every following character that is not whitespace, an angle bracket, or a single or double quote. That ends the match at natural boundaries: a space, a line break, the closing quote of an HTML attribute, or the start of a quoted string. Query strings (?ref=docs), hashes (#install) and ports (:8443) are part of the match.
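The rule above can be sketched in Python as a single regular expression. This is a minimal illustration, assuming the pattern is the scheme prefix followed by the character class [^\s<>"'] described on this page; the tool's own implementation and regex flavour may differ, and extract_urls is a hypothetical name.

```python
import re

# Assumed pattern: http:// or https://, then one or more characters that are
# not whitespace, <, >, a double quote or a single quote.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")


def extract_urls(text: str) -> list[str]:
    """Return every matching URL in source order, duplicates included."""
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "See https://example.com:8443/a?ref=docs#install or example.com/page"
    # The schemeless example.com/page is not matched.
    print("\n".join(extract_urls(sample)))
```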
Schemeless URLs (example.com/page) are not captured. The matcher requires http:// or https:// so you do not get false positives from filenames, version numbers or domain-only mentions. Other schemes like ftp://, mailto: and tel: are also skipped; for those, use extract regex matches with a custom pattern.
Output is one URL per line, in the order the URLs appear. Punctuation that touches the end of a URL inside a sentence (a full stop, a closing bracket) is included in the match, because it is neither whitespace, an angle bracket nor a quote; run find and replace on the result to trim the tail.
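If you post-process the list in a script rather than with find and replace, trimming could look like the minimal Python sketch below. The punctuation set and the rule of removing a closing bracket only when it is unbalanced are assumptions for illustration, not behaviour of this tool; the bracket check keeps links such as Wikipedia articles whose paths end in ")". The function names are hypothetical.

```python
import re

URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")

# Hypothetical set of sentence punctuation to strip from the end of a match.
TRAILING = ".,;:!?"


def trim_trailing_punctuation(url: str) -> str:
    """Strip sentence punctuation and unbalanced closing brackets from the end."""
    while url:
        last = url[-1]
        if last in TRAILING:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Only drop ")" when it has no matching "(" inside the URL.
            url = url[:-1]
        else:
            break
    return url


def extract_trimmed(text: str) -> list[str]:
    """Extract URLs, then trim trailing punctuation from each one."""
    return [trim_trailing_punctuation(u) for u in URL_PATTERN.findall(text)]


if __name__ == "__main__":
    print(extract_trimmed("Docs (see https://example.org/page?ref=docs)."))
```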
How to use extract URLs from text
1. Paste text containing URLs into the input panel.
2. The output panel shows every http or https URL, one per line.
3. Click Copy to copy the list.
4. Click Download to save it as a plain-text file.
5. Pipe the result through remove duplicate lines if you want unique URLs only.
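Steps 2 to 5 can be approximated in a script, for example when you need the same list outside the browser. A minimal Python sketch, assuming the http/https pattern described on this page; unique_urls and save_url_list are hypothetical names, and this version keeps the first occurrence of each URL when removing duplicates.

```python
import re
from pathlib import Path

URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")


def unique_urls(text: str) -> list[str]:
    """Extract URLs in source order, then drop repeats, keeping first occurrences."""
    # dict preserves insertion order (Python 3.7+), so order is kept.
    return list(dict.fromkeys(URL_PATTERN.findall(text)))


def save_url_list(urls: list[str], path: Path) -> None:
    """Write one URL per line as a UTF-8 plain-text file."""
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")


if __name__ == "__main__":
    text = "a https://example.com b https://example.org c https://example.com"
    print("\n".join(unique_urls(text)))
```

Calling save_url_list(urls, Path("urls.txt")) writes a one-URL-per-line text file, roughly mirroring Download; the file name is only an example.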
Keyboard shortcuts
Drive TextResult without touching the mouse.
| Shortcut | Action |
|---|---|
| Ctrl+F | Open the find & replace panel inside the input (Plus) |
| Ctrl+Z | Undo the last input change |
| Ctrl+Shift+Z | Redo |
| Ctrl+Shift+Enter | Toggle fullscreen focus on the editor (Plus) |
| Esc | Close find & replace, or exit fullscreen |
| Ctrl+K | Open the command palette to jump to any tool (Plus) |
| Ctrl+S | Save the current workflow draft (Plus) |
| Ctrl+P | Run a saved workflow (Plus) |
What counts as a URL here
Scheme is required
The match starts at http:// or https://. Schemeless mentions like example.com are not captured. This avoids false positives on file names and version strings.
Query strings, hashes and ports preserved
?utm_source=site, #section-2 and :8443 are part of the URL and are kept. So https://example.com:8443/path?q=1#top is one match, not three.
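To check that a single match really carries the port, path, query and fragment together, you can split it with Python's standard urllib.parse.urlsplit. A small sketch, assuming the pattern https?:// followed by [^\s<>"']+:

```python
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")

text = "Mirror: https://example.com:8443/path?q=1#top (primary)"
matches = URL_PATTERN.findall(text)

# One match holds port, path, query and fragment together.
parts = urlsplit(matches[0])
print(matches)
print(parts.hostname, parts.port, parts.path, parts.query, parts.fragment)
```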
Stops at whitespace, angle-brackets and quotes
The character class [^\s<>"'] halts the match at a space, tab, newline, <, >, single or double quote. So URLs inside href="...", inside angle brackets (<https://example.com>) and in plain prose all match cleanly, apart from punctuation touching the end of a prose URL.
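A quick Python check of the stop characters in each context, assuming the pattern https?://[^\s<>"']+; the sample strings are made up for illustration.

```python
import re

URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")

samples = {
    "href attribute": '<a href="https://example.com/help">help</a>',  # stops at "
    "single-quoted": "link = 'https://example.com/a'",                # stops at '
    "angle brackets": "Mail <https://example.com> to them",           # stops at >
    "line break": "https://example.com/x\nnext line",                 # stops at \n
}

for label, text in samples.items():
    print(f"{label}: {URL_PATTERN.findall(text)}")
```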
Other schemes skipped
ftp://, file://, mailto:, tel: and protocol-relative //host/path are not matched. For those, use extract regex matches with a custom pattern.
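One possible custom pattern for extract regex matches is sketched below in Python. It is a hypothetical example, not a built-in of the tool: it covers ftp://, file://, mailto: and tel: with the same stop characters, but not protocol-relative //host/path links, and the tool's regex flavour may need small adjustments.

```python
import re

# Hypothetical custom pattern: ftp:// and file:// links, plus mailto: and tel:,
# each running until whitespace, <, >, or a quote.
OTHER_SCHEMES = re.compile(r"""(?:ftp|file)://[^\s<>"']+|(?:mailto|tel):[^\s<>"']+""")

text = "Files at ftp://files.example.com/pub or mailto:team@example.com or tel:+15550100 today"
print(OTHER_SCHEMES.findall(text))
```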
Order preserved, duplicates kept
URLs appear in the order they are found. Duplicates are not removed; for a unique list pipe the output into remove duplicate lines.
Worked example
The URL inside href="..." stops at the closing quote. The trailing full stop on prose URLs is included because it is not whitespace; trim it with find and replace if needed.
Input:
Visit https://example.com and https://example.org/page?ref=docs. Mirror at https://example.net:8443/v2#install. Docs: <a href="https://example.com/help">help</a>.
Output:
https://example.com
https://example.org/page?ref=docs.
https://example.net:8443/v2#install.
https://example.com/help
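The worked example can be reproduced with a short Python sketch, assuming the pattern https?://[^\s<>"']+ and joining matches with newlines to mimic the one-per-line output:

```python
import re

URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")

source = (
    "Visit https://example.com and https://example.org/page?ref=docs. "
    "Mirror at https://example.net:8443/v2#install. "
    'Docs: <a href="https://example.com/help">help</a>.'
)

# Full stops after prose URLs are kept; the href URL stops at its closing quote.
output = "\n".join(URL_PATTERN.findall(source))
print(output)
```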
Settings reference
| Behaviour | Effect on output |
|---|---|
| Scheme | http:// or https:// required. |
| Other schemes | ftp://, mailto:, tel:, //host/... are skipped. |
| Path, query, hash, port | All preserved as part of the match. |
| Stop characters | Whitespace, <, >, single quote, double quote. |
| Trailing punctuation | A full stop or closing bracket directly after a URL is included, because it is not whitespace, an angle bracket or a quote. |
| Order and duplicates | Matches appear in source order. Duplicates are kept. |
FAQ
Will example.com without http match?
No. A match must start with http:// or https://. This avoids false positives on filenames (config.prod.json), version numbers and casual domain mentions. To match schemeless hosts, use extract regex matches with a pattern such as \b(?:[\w-]+\.)+[a-z]{2,}(?:/\S*)? and expect it to pick up some filenames too.
How do I get just the host name from each URL?
Run find and replace on the output with the pattern https?:\/\/([^\/]+).* and the replacement $1 to keep only the host.
Why is a full stop included on the end of some URLs?
A full stop is not a stop character, so when a URL ends a sentence, the dot directly after it is captured. Run find and replace on the output to trim trailing punctuation.
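The host-name answer can be tried in Python with the same pattern applied to each line. Python's re.sub writes the group reference as \1 where the find and replace field uses $1; hosts_from_lines is a hypothetical helper. Because [^\/]+ runs up to the first slash, a port stays attached (example.net:8443).

```python
import re

# Pattern from the FAQ answer; Python accepts the escaped slashes as literal "/".
HOST_PATTERN = re.compile(r"https?:\/\/([^\/]+).*")


def hosts_from_lines(url_list: str) -> list[str]:
    """Replace each non-empty line with the text between :// and the first slash."""
    return [HOST_PATTERN.sub(r"\1", line) for line in url_list.splitlines() if line]


urls = """https://example.com
https://example.org/page?ref=docs.
https://example.net:8443/v2#install.
https://example.com/help"""

print(hosts_from_lines(urls))
```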