How to Validate a URL List Before You Import It

A list of URLs almost never arrives clean. It comes out of a CSV export, a support ticket, a copied web page, or a teammate's Markdown notes — and somewhere in those few hundred rows are entries that will quietly break your import, your redirect map, or your allowlist. The fix is to validate the list first, row by row, so you know exactly which lines are well-formed and which ones need a second look before they go anywhere.

The URL List Validator does that check entirely in your browser. You paste a list (or load a local text file), and it parses every line as a URL, marks each one pass or fail, and writes a reason beside every reject. Nothing is uploaded: a local file is read with the browser's File API, and the parsing itself runs on your machine.

What "valid" actually means here

This is the part people get wrong, so let me be precise about it. URL validation checks structure. A URL needs a scheme (like http or https), a host, and then an optional path, query, and fragment. The validator parses each line against those rules and flags the line when a required part is missing or malformed.

That means:

example.com gets flagged — it has no scheme, so the parser can't tell a host from a path.
htps://typo.com gets flagged: htps is not a scheme the validator recognizes, and it is almost certainly a typo for https.
http:// gets flagged — there's a scheme but no host.
A line with an illegal character in the host, or a port written as :99abc, gets flagged with the exact part named.

And here is the limit you have to keep in mind: a structurally valid URL can still 404. https://example.com/page-that-was-deleted parses perfectly and passes every check — it just doesn't exist anymore. This tool is a syntax filter, not a liveness check. It tells you a line is shaped like a URL. It does not open the link, resolve the DNS, or confirm anything on the other end responds. The manifest is blunt about this in its common-mistakes section: never treat URL validation as proof that the account, domain, or resource really exists. If you need reachability, that's a separate crawl step — validate the syntax here first so the crawler isn't wasting requests on garbage rows.
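The structural rules in this section can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name check_url, the KNOWN_SCHEMES allowlist, the host pattern, and the reason strings for host and port failures are all assumptions made for the example. It makes no network requests, which matches the syntax-only scope described above.

```python
import re

# Assumption: only these schemes are treated as recognized; the real tool's list may differ.
KNOWN_SCHEMES = {"http", "https", "ftp"}
# Hostname labels made of letters, digits, and hyphens, separated by dots.
# IPv6 literals and internationalized hosts are deliberately out of scope for this sketch.
HOST_RE = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*")


def check_url(line: str) -> tuple[bool, str]:
    """Return (valid, reason) for one line, based on structure only."""
    text = line.strip()
    if "://" not in text:
        return False, "missing scheme"
    scheme, rest = text.split("://", 1)
    if scheme.lower() not in KNOWN_SCHEMES:
        return False, "invalid scheme"
    # The authority is everything before the first "/", "?", or "#".
    authority = re.split(r"[/?#]", rest, maxsplit=1)[0]
    # Drop optional userinfo ("user:pass@") before looking at host and port.
    hostport = authority.rsplit("@", 1)[-1]
    host, colon, port = hostport.partition(":")
    if not host:
        return False, "missing host"
    if not HOST_RE.fullmatch(host):
        return False, "invalid host"
    if colon and not (port.isascii() and port.isdigit() and int(port) <= 65535):
        return False, "invalid port"
    return True, "OK"
```

With these assumptions, check_url("https://example.com/page-that-was-deleted") returns (True, "OK") even though the page is gone, and check_url("https://example.com:99abc") returns (False, "invalid port").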

A worked example

Say you paste this list, copied from a few different sources:

https://toolora.info/en/t/url-list-validator/
example.com
htps://typo.com
http://
https://docs.example.com:8080/api?ref=blog
https://toolora.info/en/t/url-list-validator/

The validator parses each line and produces a report. In CSV form, with the reason column, you get something like:

value,line,valid,reason
https://toolora.info/en/t/url-list-validator/,1,true,OK
example.com,2,false,missing scheme
htps://typo.com,3,false,invalid scheme
http://,4,false,missing host
https://docs.example.com:8080/api?ref=blog,5,true,OK
https://toolora.info/en/t/url-list-validator/,6,true,duplicate

Three rows fail, and each one tells you why: a missing scheme, a bad scheme, a missing host. Lines 1 and 5 pass cleanly, including the one with a port and a query string, which are perfectly legal. Line 6 also parses as valid, but it is marked as a duplicate of line 1, which you can drop with the dedupe option or keep for the audit trail.

Notice the validator keeps line numbers. That's deliberate. When a reject says "line 3, invalid scheme," you can jump straight back to the source text and fix htps to https instead of guessing which of two hundred rows it meant.
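To make the report format concrete, here is a hedged sketch of how a line-numbered CSV like the one above could be produced. The names build_report and simple_check are hypothetical, and simple_check is a stand-in that only looks for a scheme and a host; the tool's real checks are more thorough.

```python
import csv
import io
from typing import Callable


def simple_check(url: str) -> tuple[bool, str]:
    """Stand-in checker (hypothetical): scheme and host presence only."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        return False, "missing scheme"
    if scheme.lower() not in {"http", "https"}:
        return False, "invalid scheme"
    if not rest or rest[0] in "/?#":
        return False, "missing host"
    return True, "OK"


def build_report(
    text: str,
    check: Callable[[str], tuple[bool, str]] = simple_check,
) -> str:
    """Return a CSV report with the columns value,line,valid,reason."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["value", "line", "valid", "reason"])
    seen: set[str] = set()
    for number, raw in enumerate(text.splitlines(), start=1):
        value = raw.strip()
        if not value:
            continue  # blank lines are skipped but still count toward numbering
        valid, reason = check(value)
        if valid and value in seen:
            reason = "duplicate"  # still valid, just a repeat of an earlier line
        seen.add(value)
        writer.writerow([value, number, str(valid).lower(), reason])
    return out.getvalue()
```

Because the line number comes from the position in the pasted text rather than from the position in the output, a reject always points back to the row you need to fix in the source.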

Cleaning the list, not just judging it

Validation on its own gives you a verdict. The reason this tool is useful for real cleanup work is that it lets you act on that verdict in the same pass:

Keep unique rows only — drop duplicates that crept in from merging two exports.
Preserve invalid rows for review — instead of silently deleting the rejects, carry them out with their reasons so a teammate can repair them.
Sort the normalized output — so the same list always lands in a predictable order.
Switch the output format — CSV, JSON, Markdown, SQL IN, a TypeScript union, or plain lines.

That last point matters more than it sounds. If your clean list is going into a database query, you can export it as a SQL IN clause directly, with the quoting and commas already handled. If it's going into typed code, the TypeScript union is ready to paste. You're not hand-editing punctuation across hundreds of rows.
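As an illustration of what "quoting and commas already handled" involves, the sketch below renders a clean list as a SQL IN clause and as a TypeScript string-literal union. The function names, the default type name AllowedUrl, and the exact output layout are assumptions for the example; the tool's real output may be formatted differently.

```python
import json


def to_sql_in(values: list[str]) -> str:
    """Render values as a SQL IN list, doubling single quotes inside each value."""
    if not values:
        raise ValueError("an empty IN list is not valid SQL")
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"IN ({quoted})"


def to_ts_union(values: list[str], name: str = "AllowedUrl") -> str:
    """Render values as a TypeScript union of string literals."""
    if not values:
        raise ValueError("a union needs at least one member")
    # json.dumps yields a double-quoted, escaped literal that is also valid TypeScript.
    members = " | ".join(json.dumps(v) for v in values)
    return f"type {name} = {members};"
```

For example, to_sql_in(["https://a.example/", "https://b.example/it's"]) returns IN ('https://a.example/', 'https://b.example/it''s'). Pasting a literal list like this suits one-off queries over a list you have already reviewed; for values that come from untrusted input at runtime, parameterized queries remain the safer choice.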

One more safeguard worth knowing: when the input contains sensitive patterns the tool recognizes — card numbers, JWTs — those values are masked in the output while you still get the validation signal. So a redirect map that happens to carry a token in a query string doesn't spill that token into a CSV you're about to share.
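A rough sketch of the masking idea, covering only the JWT case: the pattern, the mask string, and the function name mask_tokens are assumptions, and the tool's own detection rules are not documented here. The regex looks for three dot-separated base64url segments where the first begins with "eyJ", which is how an encoded JWT header typically starts.

```python
import re

# Three base64url segments separated by dots; "eyJ" is the usual start of an
# encoded JSON header. This is a heuristic and can miss or over-match tokens.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
MASK = "***"  # hypothetical placeholder; any fixed marker would do


def mask_tokens(value: str) -> str:
    """Replace anything that looks like a JWT with a fixed mask."""
    return JWT_RE.sub(MASK, value)
```

Applied to https://example.com/cb?token=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.c2lnbmF0dXJl&x=1, this returns https://example.com/cb?token=***&x=1, so the row keeps its shape for validation while the token itself stays out of the export.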

How I use it on a redirect map

I run a small site migration every few months, and the redirect map is always where things go sideways. The old URLs come from one CSV, the new targets from a spreadsheet someone filled in by hand, and the two never line up cleanly. The first thing I do now is paste both columns through the validator. Last time it flagged eleven rows: four were missing the https:// because the spreadsheet had stripped it, three had a stray space inside the host from a copy-paste, and the rest were honest typos in the scheme. Catching those before the redirect map went live saved me from a batch of broken 301s that would have looked fine until someone actually clicked them. The line numbers meant I fixed each one in under a minute.

Where this fits with the other text tools

URL validation is one step in a longer cleanup flow, and it pairs naturally with a few neighbors:

If you're starting from messy prose or HTML and need to pull the links out first, the URL Extractor does that lifting, and the HTML Link Extractor handles copied markup specifically.
Once your list is valid, the URL Normalizer can canonicalize trailing slashes and casing, and the URL Deduplicator collapses the repeats.
When you just need to reshape a clean list into another format, the URL List Converter handles the output gymnastics.

A reasonable order is extract, validate, normalize, dedupe, convert — each step trusting that the one before it did its job.
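The normalize and dedupe steps in that order can be sketched as follows. This is an assumption-laden illustration rather than a description of the URL Normalizer or URL Deduplicator themselves: it lowercases only the scheme and host (which are case-insensitive), adds a trailing slash when the path is empty, and leaves the path, query, and fragment untouched. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the scheme and host; use "/" when the path is empty."""
    parts = urlsplit(url.strip())
    # Keep any userinfo as written, since it can be case-sensitive.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, parts.fragment))


def normalize_and_dedupe(urls: list[str]) -> list[str]:
    """Normalize each URL, then drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(normalize(u) for u in urls))
```

Running the steps in this order matters: HTTPS://Example.COM and https://example.com/ only collapse into one entry if normalization happens before the duplicate check.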

The short version

URL validation answers exactly one question: is this line shaped like a URL? It checks for a scheme, a host, and the optional pieces that follow, and it names the part that broke when one is missing. It does not check whether the link is alive. Use it as the syntax gate at the front of any link-cleaning job — paste the list, read the reasons, fix the rejects, and export the clean artifact. Then send the survivors to a reachability check if you need one.

Start with the URL List Validator and run your next messy list through it before it reaches anything that matters.

Made by Toolora · Updated 2026-06-13