Email Validator: Why Valid Syntax Doesn't Mean a Real Inbox

The first time I trusted a single regex to validate email, I shipped a signup form that happily accepted test@test and rejected you+work@gmail.com. Both calls were wrong. That bug taught me the thing almost every "validate email" tutorial gets backwards: checking that an address looks right and confirming that an inbox exists are two completely different problems. A browser can do one of them well. The other needs a mail server. Confusing the two is how good addresses get blocked and dead ones slip through to your bounce report.

This guide walks through what an email validator can actually prove, where the limits are, and how to clean a real list without losing legitimate users.

Syntax validation vs. real existence

There are two layers to "is this email good?"

Syntax validation asks: is this string a structurally legal email address? Local part within length limits, a single @, a domain with valid labels and a real top-level domain. This runs instantly, offline, on a million rows. It catches typos, pasted garbage, and malformed exports.

Existence verification asks: does a mailbox actually receive mail there? Proving that requires a DNS MX lookup to find the domain's mail server, then an SMTP RCPT TO handshake to ask the server whether the address is deliverable. Browsers can't open raw TCP sockets, so this layer always needs a backend.
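To make the second layer concrete, here is a minimal sketch of the RCPT TO step using Python's standard smtplib. It assumes the MX host has already been resolved elsewhere (the standard library has no MX resolver), and the names rcpt_probe, helo_name, mail_from, and smtp_factory are hypothetical choices for this illustration. Treat the result as a hint rather than proof: many servers accept every recipient or answer with a temporary code.

```python
import smtplib


def rcpt_probe(mx_host, address, smtp_factory=smtplib.SMTP,
               helo_name="validator.example",
               mail_from="probe@validator.example"):
    """Ask one mail server whether it would accept mail for `address`.

    Returns "accepted", "rejected", or "unknown". No message is sent:
    the session stops after RCPT TO.
    """
    try:
        # Port 25 is the server-to-server SMTP port.
        server = smtp_factory(mx_host, 25, timeout=10)
    except OSError:
        return "unknown"  # could not connect at all
    try:
        server.helo(helo_name)
        server.mail(mail_from)
        code, _message = server.rcpt(address)
    except (smtplib.SMTPException, OSError):
        return "unknown"
    finally:
        try:
            server.quit()
        except (smtplib.SMTPException, OSError):
            pass
    if 200 <= code < 300:
        return "accepted"
    if 500 <= code < 600:
        return "rejected"
    return "unknown"  # 4xx and anything else: temporary or unclear
```

The smtp_factory parameter exists only so the function can be exercised with a stand-in class instead of a live server.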

The honest split: the offline layer (syntax checks plus typo and disposable-domain lists) catches roughly 95% of what causes real bounces: typos, disposable signups, length-limit violations, broken spreadsheet paste. Existence verification covers the rest. The mistake is treating layer one as if it were layer two. ceo@yourcompany.com can pass every syntax check on earth and still bounce because that mailbox was deleted last quarter.

What RFC 5321 actually requires

The address format isn't folklore — it's specified. RFC 5321 (the SMTP standard) and RFC 5322 (message format) define the hard limits worth enforcing:

Local part (before the @): maximum 64 octets.
Full address: maximum 254 characters end to end.
Domain labels: letters, digits, and hyphens only (the LDH rule), no leading or trailing hyphen on any label.
Top-level domain: alphabetic — an all-numeric TLD like john@example.123 is not a valid hostname TLD.
Local part characters: more permissive than people expect. Beyond letters and digits, RFC 5322 atext legally allows ! # $ % & ' * + - / = ? ^ _ ` { | } ~ plus dots, as long as a dot isn't leading, trailing, or doubled.

That last point is why + is legal. you+toolora@gmail.com is "subaddressing": it routes to you@gmail.com but lets you tag which service has your address. Gmail, Fastmail, ProtonMail, and iCloud all support it. Any validator that rejects + is throwing away real, deliverable users.
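The limits above can be sketched as a small checker that returns a reason instead of a bare boolean. This is an illustration under stated assumptions, not the tool's actual implementation: the function name check_syntax and the reason strings are made up here, quoted local parts are deliberately unsupported, and only ASCII addresses are handled.

```python
import re

# RFC 5322 atext: letters, digits, and the permitted specials (including +).
ATOM = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")
# One LDH label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def check_syntax(address):
    """Return (ok, reason) for a single address string."""
    if len(address) > 254:
        return False, "address longer than 254 characters"
    if address.count("@") != 1:
        return False, "expected exactly one @"
    local, domain = address.split("@")
    if not local:
        return False, "empty local part"
    if len(local.encode("utf-8")) > 64:
        return False, "local part longer than 64 octets"
    # Splitting on dots turns a leading, trailing, or doubled dot
    # into an empty atom, which fails the match below.
    for atom in local.split("."):
        if not ATOM.fullmatch(atom):
            return False, "illegal character or misplaced dot in local part"
    labels = domain.split(".")
    if len(labels) < 2:
        return False, "domain has no top-level domain"
    for label in labels:
        if not LABEL.fullmatch(label):
            return False, "invalid domain label"
    if labels[-1].isdigit():
        return False, "all-numeric top-level domain"
    return True, "ok"
```

With this sketch, check_syntax("you+toolora@gmail.com") returns (True, "ok"), while check_syntax("test@test") returns (False, "domain has no top-level domain").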

Why one regex can't validate email

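People reach for a regex and assume the problem is solved. It isn't, and the standards bodies say so out loud. The HTML spec ships a regex for <input type="email"> and explicitly calls its definition "a willful violation of RFC 5322", on the grounds that the RFC grammar is simultaneously too strict, too vague, and too lax to be practical. The spec's pattern is deliberately simplified for form UX, and it is permissive enough to accept a dotless domain like test@test.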

A regex breaks down on several fronts:

It can't easily enforce the 254-character total or the 64-character local-part limit.
It misses domain labels with leading or trailing hyphens, doubled dots, and trailing-dot domains.
Most importantly, it returns a single boolean. When gmial.com fails, a regex can't tell you it failed because of a likely typo — it just says false.

A real validator returns a structured reason per address and, where it can, a suggested correction. That's the difference between a form that says "invalid email" and one that says "Did you mean gmail.com?" If you want to understand exactly how brittle pattern matching gets at this scale, building and testing patterns in a regex tester makes the failure modes obvious fast — you'll watch your "perfect" pattern reject legal addresses.
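A "Did you mean" suggestion can be sketched with fuzzy matching against a list of common domains. The example below uses difflib from the Python standard library; the domain list, the 0.8 cutoff, and the name suggest_domain are assumptions for illustration, and a production list would be far longer.

```python
import difflib

# Hypothetical short list of well-known mail domains.
KNOWN_DOMAINS = ("gmail.com", "outlook.com", "yahoo.com",
                 "icloud.com", "hotmail.com")


def suggest_domain(address, known=KNOWN_DOMAINS, cutoff=0.8):
    """Return a likely intended domain, or None if nothing is close."""
    _local, sep, domain = address.rpartition("@")
    if not sep:
        return None  # no @ at all, nothing to compare
    domain = domain.lower()
    if domain in known:
        return None  # already a known domain, no correction needed
    matches = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here suggest_domain("john@gmial.com") returns "gmail.com", which is the extra information a single boolean cannot carry. The cutoff is a judgment call: too low and rare but legitimate domains get "corrected" into popular ones.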

A real example

Here's a batch I ran through the tool — six addresses that look fine at a glance:

| Input | Result | Reason |
|---|---|---|
| you+toolora@gmail.com | Valid | Subaddressing is legal |
| first.last@outlook.com | Valid | Dots in local part are fine |
| sales@mailinator.com | Disposable | Throwaway provider |
| test@test | Invalid | No TLD on the domain |
| john@gmial.com | Typo | Did you mean gmail.com? |
| a@b.com, | Invalid | Trailing comma from a CSV export |

Four of these six are problems, and none of them is the kind a quick eyeball catches. The trailing comma in the last row is the classic one: paste an Excel column and half your "errors" are invisible trailing spaces and commas that came along for the ride. Trim the source, or the row flags as malformed and you blame the wrong thing.
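Trimming the source can be as small as the sketch below, which strips stray whitespace, commas, and semicolons from each pasted line before any validation runs. The function name clean_pasted_rows and the exact set of stripped characters are assumptions; adjust them to whatever your exports actually leave behind.

```python
# Characters that commonly ride along with a spreadsheet paste:
# space, tab, carriage return, non-breaking space, comma, semicolon.
STRAY = " \t\r\u00a0,;"


def clean_pasted_rows(text):
    """Split pasted text into rows and strip stray delimiters from each."""
    cleaned = []
    for line in text.splitlines():
        candidate = line.strip(STRAY)
        if candidate:  # drop rows that were only delimiters or blank
            cleaned.append(candidate)
    return cleaned
```

For example, clean_pasted_rows("a@b.com,\n  first.last@outlook.com  \n") returns ["a@b.com", "first.last@outlook.com"]. Only the ends of each row are touched, so a mid-address mistake like name@gmail,com still reaches the validator and gets flagged.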

Bulk lists: dedupe and clean before you send

The single-address case is the easy one. The painful case is a 12,000-row signup export full of name@gmail,com, trailing spaces, and the occasional test@test. Pasting the whole column, sorting by status, and pulling only the clean rows is what keeps a launch email's bounce rate under 2% — above that, sending providers start flagging you as spam.

A few rules I follow on bulk lists:

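Dedupe carefully. You@Gmail.com and you@gmail.com are the same mailbox: the domain is always case-insensitive, and Gmail also ignores case in the local part. RFC 5321 leaves local-part case up to the receiving server, though, so don't naively lowercase the local part for every provider. Treat the domain case-insensitively and let the validator normalize.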
Filter disposables explicitly. Addresses like @mailinator.com and @guerrillamail.com are syntactically perfect and SMTP-verifiable — only a known-provider list catches them. Most teams find 8–15% of free signups are throwaway domains.
Export the clean subset, then verify. Run the validated green rows through a paid existence check (NeverBounce, ZeroBounce) instead of the raw list. You stop paying to verify addresses you already know are malformed.
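The first two rules can be sketched together: normalize the domain, fold the local part only where the provider is known to ignore case, then split out disposables. Both domain sets are tiny illustrative assumptions (a real disposable list has thousands of entries), and the names dedupe_key and split_list are hypothetical.

```python
# Illustrative only: real lists are much longer and need upkeep.
DISPOSABLE_DOMAINS = frozenset({"mailinator.com", "guerrillamail.com"})
# Providers assumed here to treat the local part case-insensitively.
CASE_INSENSITIVE_LOCAL = frozenset({"gmail.com"})


def dedupe_key(address):
    """Normalize an address for duplicate detection."""
    local, _, domain = address.strip().rpartition("@")
    domain = domain.lower()  # domains are always case-insensitive
    if domain in CASE_INSENSITIVE_LOCAL:
        local = local.lower()  # safe only for providers on the list
    return f"{local}@{domain}"


def split_list(addresses):
    """Return (clean, disposable), each deduplicated and normalized."""
    seen = set()
    clean, disposable = [], []
    for address in addresses:
        key = dedupe_key(address)
        if key in seen:
            continue
        seen.add(key)
        domain = key.rpartition("@")[2]
        if domain in DISPOSABLE_DOMAINS:
            disposable.append(key)
        else:
            clean.append(key)
    return clean, disposable
```

Only the clean list would then go on to a paid existence check. Keeping the local-part folding behind an explicit provider list is the conservative choice: it may leave a few duplicates in, but it never merges two addresses that a server might treat as different mailboxes.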

If your list arrives as a spreadsheet rather than a flat column, converting it first with a CSV to JSON tool gives you clean, structured rows you can paste back in without the stray delimiters that wreck a naive paste. Everything stays in your browser tab — no leaked list, signup CSV, or internal roster ever touches a server.

The takeaway

Validate syntax to catch the 95% — typos, disposables, length violations, broken paste — and run an MX/SMTP step only on the survivors when the send is large. Don't let one regex stand in for either layer; it's too permissive to be a gate and too dumb to explain itself. And before any big mailing, remember the rule that has saved me more than once: structurally valid is not the same as real.

Made by Toolora · Updated 2026-06-13