
URL Encoding vs Percent Encoding vs Form Encoding: One String, Three Different Outputs

Percent encoding, URL encoding, and form encoding sound interchangeable until a space character breaks your query string. I ran one real string through all three pipelines — here is exactly where they diverge and why.

Published By Lei Li
#url #encoding #web-development #http


Three terms, one family of escaping rules, and a surprising number of production bugs hiding in the gaps between them. "Percent encoding" is the mechanism, "URL encoding" is the everyday name for applying that mechanism to URLs, and "form encoding" is a separate serialization format that borrows the mechanism but changes one rule — and that one rule (the space character) is responsible for most of the confusion. Let's push a real string through each pipeline and look at the actual bytes.

The three names and what each one actually specifies

Percent encoding is the formal mechanism defined in RFC 3986 (2005): any byte that is not allowed to appear literally is replaced by % followed by two hexadecimal digits. The RFC defines exactly 18 reserved characters, the gen-delims : / ? # [ ] @ and the sub-delims ! $ & ' ( ) * + , ; =, plus an unreserved set (letters, digits, -, ., _, ~) that should never be escaped. A reserved character is escaped only when it appears as data rather than as a delimiter; everything outside both sets (spaces, % itself, non-ASCII text) is always encoded byte by byte, normally from its UTF-8 bytes.
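To make the byte-level rule concrete, here is a minimal sketch of a strict RFC 3986 component encoder in Node. The name percentEncodeComponent is hypothetical, and it assumes the strictest reading for a single component: every byte outside the unreserved set is escaped, reserved characters included. It also shows where encodeURIComponent is looser than the RFC: it leaves the sub-delims ! ' ( ) * unescaped.

```javascript
// Hypothetical helper: strict RFC 3986 percent-encoding for one URI component.
// Assumption: every byte outside the unreserved set (A-Z a-z 0-9 - . _ ~) is escaped.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncodeComponent(input) {
  // Non-ASCII text is escaped from its UTF-8 bytes, one %XX per byte.
  const bytes = new TextEncoder().encode(input);
  let out = '';
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    out += byte < 0x80 && UNRESERVED.test(ch)
      ? ch
      : '%' + byte.toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}

console.log(percentEncodeComponent('café & crème brûlée'));
// caf%C3%A9%20%26%20cr%C3%A8me%20br%C3%BBl%C3%A9e
console.log(percentEncodeComponent("it's (fine)!")); // it%27s%20%28fine%29%21
console.log(encodeURIComponent("it's (fine)!"));     // it's%20(fine)!
```

For the accented string the two functions agree; they diverge only on the handful of sub-delims that encodeURIComponent inherited as "safe" from older rules.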

URL encoding is not a separate standard. It is the informal name developers use for "percent-encode this thing so it survives inside a URL." When someone says "URL-encode the query value," they mean percent encoding applied with awareness of position: a / that separates path segments stays literal, while a / that is part of a value should be escaped as %2F (mandatory inside a path segment, harmless inside a query value).
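Positional awareness is easiest to see when a URL is assembled from raw pieces. A minimal sketch, assuming a hypothetical buildUrl helper that takes an origin, a list of path segments, and a plain object of query parameters: each piece is escaped on its own with encodeURIComponent, and only the separators added afterwards (/, ?, =, &) stay literal.

```javascript
// Hypothetical helper: assemble a URL from raw, unencoded pieces.
// Each piece is escaped individually; only the separators we add stay literal.
function buildUrl(origin, pathSegments, query = {}) {
  // A "/" inside a segment value becomes %2F; the "/" we join with does not.
  const path = pathSegments.map(encodeURIComponent).join('/');
  const pairs = Object.entries(query).map(
    ([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`
  );
  return `${origin}/${path}` + (pairs.length ? `?${pairs.join('&')}` : '');
}

console.log(buildUrl('https://example.com', ['artists', 'AC/DC'], { sort: 'year/desc' }));
// https://example.com/artists/AC%2FDC?sort=year%2Fdesc
```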

Form encoding means application/x-www-form-urlencoded, the default Content-Type an HTML <form method="post"> has used since the early 1990s (and the format a GET form writes into its query string), which the WHATWG URL Standard still specifies today. It reuses percent escapes for most characters but adds two changes: key-value pairs are joined as key=value&key=value, and, the famous one, a space becomes + instead of %20.
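Because form encoding mostly reuses percent escapes, it can be approximated in a few lines on top of encodeURIComponent. This is a sketch, not a spec-complete serializer: formEncode is a hypothetical name, and the WHATWG serializer that URLSearchParams implements also escapes ! ' ( ) ~, which encodeURIComponent leaves literal, so the two only agree on inputs without those characters.

```javascript
// Hypothetical sketch of application/x-www-form-urlencoded serialization.
// Approximation: URLSearchParams additionally escapes ! ' ( ) ~.
function formEncode(pairs) {
  // Percent-encode as usual, then swap the escaped space for "+".
  const enc = (s) => encodeURIComponent(s).replace(/%20/g, '+');
  return pairs.map(([key, value]) => `${enc(key)}=${enc(value)}`).join('&');
}

const pairs = [['q', 'café & crème'], ['page', '2']];
console.log(formEncode(pairs));                     // q=caf%C3%A9+%26+cr%C3%A8me&page=2
console.log(new URLSearchParams(pairs).toString()); // q=caf%C3%A9+%26+cr%C3%A8me&page=2
```

The substitution is safe to do after encoding because a literal "%20" in the input has already become "%2520" by then, so only real spaces turn into +.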

So the relationship is: percent encoding is the tool, URL encoding is the act of using it on URLs, and form encoding is a sibling format with one incompatible substitution.

One real string through all three pipelines

Take the string café & crème brûlée: three accented words, an ampersand, three spaces. I ran it through Node 20 with no libraries, just built-ins, and here is the verbatim output.

Percent encoding for a URL component (encodeURIComponent):

Input:  café & crème brûlée
Output: caf%C3%A9%20%26%20cr%C3%A8me%20br%C3%BBl%C3%A9e

Form encoding (new URLSearchParams({ q: 'café & crème brûlée' }).toString()):

Output: q=caf%C3%A9+%26+cr%C3%A8me+br%C3%BBl%C3%A9e

Whole-URL encoding (encodeURI on a full address):

Input:  https://example.com/menu?q=café & crème
Output: https://example.com/menu?q=caf%C3%A9%20&%20cr%C3%A8me

Read the three outputs against each other and every rule becomes visible. The é is %C3%A9 in all three, because UTF-8 bytes are escaped identically everywhere. The ampersand becomes %26 in the first two but stays literal in the third: encodeURI assumes you are encoding a complete URL, so it preserves the reserved characters that give a URL its structure (:, /, ?, #, &, = and the rest; of RFC 3986's 18, only [ and ] get escaped). And the spaces: %20 in percent encoding, + in form encoding. Same string, three spec-compliant, mutually incompatible results.
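The three outputs come from built-ins only, so they are easy to reproduce. A minimal script, assuming Node 20 as in the original run; the last line also shows that encodeURI is not a faithful RFC 3986 whole-URL encoder, since it escapes the [ ] that IPv6 literal hosts rely on.

```javascript
const value = 'café & crème brûlée';

// 1. Percent encoding for a single URL component.
console.log(encodeURIComponent(value));
// caf%C3%A9%20%26%20cr%C3%A8me%20br%C3%BBl%C3%A9e

// 2. Form encoding: URLSearchParams serializes as application/x-www-form-urlencoded.
console.log(new URLSearchParams({ q: value }).toString());
// q=caf%C3%A9+%26+cr%C3%A8me+br%C3%BBl%C3%A9e

// 3. Whole-URL encoding: structural characters such as ? & = stay literal.
console.log(encodeURI('https://example.com/menu?q=café & crème'));
// https://example.com/menu?q=caf%C3%A9%20&%20cr%C3%A8me

// encodeURI predates RFC 3986's reserved set: it still escapes [ and ].
console.log(encodeURI('http://[::1]/')); // http://%5B::1%5D/
```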

That third output is also a live bug. The literal & inside the query is parsed as a parameter separator, so a server reads q as "café " (trailing space included) and " crème" as a second, valueless parameter. encodeURI is almost never the function you want for values: it exists for already-assembled URLs, not for the pieces.
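To see what the server actually receives, feed the encodeURI output back through the WHATWG URL parser that Node exposes as URL. This assumes the server splits the query the standard way; frameworks with their own parsers may differ in details, but treating & as a separator is near-universal.

```javascript
// Parse the buggy encodeURI output the way a standards-following server would.
const buggy = encodeURI('https://example.com/menu?q=café & crème');
const params = new URL(buggy).searchParams;

for (const [key, value] of params) {
  console.log(JSON.stringify(key), '=>', JSON.stringify(value));
}
// "q" => "café "
// " crème" => ""

// The fix: encode the value as a component, then assemble the URL.
const fixed = `https://example.com/menu?q=${encodeURIComponent('café & crème')}`;
console.log(new URL(fixed).searchParams.get('q')); // café & crème
```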

Why the space character carries so much history

The + convention predates RFC 3986 by over a decade: it comes from the original HTML forms implementation and has been kept for backward compatibility ever since. The cost of that compatibility is asymmetry. A decoder that follows pure RFC 3986 turns cr%C3%A8me+br%C3%BBl%C3%A9e into the literal string crème+brûlée, plus sign included, while a form decoder correctly produces crème brûlée. PHP even ships two functions for exactly this split: urlencode() produces + (form style) and rawurlencode() produces %20 (RFC 3986 style).
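The asymmetry takes two lines to reproduce in Node: decodeURIComponent follows RFC 3986 and leaves + alone, while URLSearchParams applies the form rule. The key q is only there because URLSearchParams expects key=value pairs.

```javascript
const encoded = 'cr%C3%A8me+br%C3%BBl%C3%A9e';

// RFC 3986-style decoding: "+" is just a plus sign.
console.log(decodeURIComponent(encoded)); // crème+brûlée

// Form-style decoding: "+" means space, then %XX escapes are decoded.
console.log(new URLSearchParams(`q=${encoded}`).get('q')); // crème brûlée
```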

Size is the other practical consequence. Each escaped byte expands from 1 character to 3 (%XX), so a typical CJK character, 3 bytes in UTF-8, becomes 9 characters when percent-encoded: three times its byte length and nine times its visible length. That matters because URL length is not unlimited. Internet Explorer historically capped URLs at 2,083 characters (Microsoft's documented limit in KB 208427), which works out to roughly 230 percent-encoded Chinese characters (2,083 / 9 ≈ 231) if nothing else were in the address, and fewer once the scheme, host, and path take their share. Modern browsers go far higher, but CDNs and proxies still commonly enforce limits in the 4 to 8 KB range, and encoded non-ASCII text eats that budget much faster than its visible length suggests.
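The inflation is easy to measure before a URL ever leaves your code. A minimal sketch; encodedLength and utf8Bytes are hypothetical helpers, and 2,083 is only the IE limit quoted above, not a universal constant.

```javascript
// Hypothetical helpers: URL characters a value costs once percent-encoded,
// and its raw UTF-8 byte length for comparison.
const encodedLength = (text) => encodeURIComponent(text).length;
const utf8Bytes = (text) => new TextEncoder().encode(text).length;

console.log(utf8Bytes('中'), encodedLength('中')); // 3 9
console.log(utf8Bytes('é'), encodedLength('é'));   // 2 6

// CJK characters that fit in IE's cap if they were the entire URL.
const IE_MAX_URL = 2083;
console.log(Math.floor(IE_MAX_URL / encodedLength('中'))); // 231
```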

The two bugs I keep seeing in code review

The first is double encoding. A value gets encoded once at the client, then a framework encodes it again on the way out, and %20 becomes %2520 (the % itself escaped as %25). The user sees caf%C3%A9 rendered literally in your UI where café should be. The fix is structural, not string-level: encode exactly once, at the boundary where the value enters the URL, and never store pre-encoded strings in your database.
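Double encoding is easy to reproduce with the same built-ins. This sketch assumes the receiving side decodes exactly once, which is what most frameworks do.

```javascript
const raw = 'café ';

const once = encodeURIComponent(raw);   // caf%C3%A9%20
const twice = encodeURIComponent(once); // caf%25C3%25A9%2520 (each % escaped as %25)

// A single decode of the double-encoded value is still encoded text,
// which is what ends up rendered literally.
console.log(decodeURIComponent(twice)); // caf%C3%A9%20
console.log(decodeURIComponent(once) === raw); // true
```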

The second is mixing decoders across the space divide. A backend receives a+b in a query string. Was that "a b" submitted by a form, or the literal expression "a+b" sent by an API client that correctly used %20 for spaces and left + alone? You cannot tell from the bytes; you have to know which convention the sender used. This is the single most common cause of "plus signs turning into spaces" tickets, and the defensive answer is to always send spaces as %20 and literal plus signs as %2B, which both decoders read the same way, and to reserve + handling for actual form posts.
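The ambiguity, and the way out of it, fit in a few lines. rfcDecode and formDecode are hypothetical wrappers around the two built-in decoders, and C++ developer is an illustrative input: the same raw bytes decode differently depending on the convention the receiver assumes, while a value encoded with encodeURIComponent (spaces as %20, plus signs as %2B) decodes identically under both.

```javascript
// Hypothetical wrappers: RFC 3986-style vs form-style decoding of one query value.
// formDecode assumes the value contains no raw "&" or "=" of its own.
const rfcDecode = (v) => decodeURIComponent(v);
const formDecode = (v) => new URLSearchParams(`x=${v}`).get('x');

// Ambiguous bytes: nothing in "a+b" says which convention the sender used.
console.log(rfcDecode('a+b'), '|', formDecode('a+b')); // a+b | a b

// A literal "+" left unescaped turns into spaces under a form decoder.
console.log(JSON.stringify(formDecode('C++%20developer'))); // "C   developer"

// Escaping both characters removes the ambiguity.
const safe = encodeURIComponent('C++ developer'); // C%2B%2B%20developer
console.log(rfcDecode(safe) === formDecode(safe)); // true
```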

How I check encodings without writing a scratch script

When I debugged a webhook last month that kept corrupting C++ developer into C developer, I didn't reach for Node first. I pasted the raw query string into Toolora's URL encoder/decoder, toggled between component and full-URL modes, and the +-versus-%20 mismatch was visible in about ten seconds: the sender had left its plus signs unescaped, RFC 3986 style, and my decoder was form-style, so every + became a space. For a messy multi-parameter URL, the URL parser splits the address into scheme, host, path, and individual query pairs so you can see which specific value is mangled, and the query params extractor is faster when all I want is a clean table of keys and decoded values out of a 500-character tracking link.
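When a browser tool is not at hand, Node's built-in URL class gives a comparable breakdown. The tracking link below is made up for illustration. Note that searchParams decodes values with the form rules, so a literal + in a value comes back as a space.

```javascript
// Illustrative tracking link; the parameter names and values are made up.
const link = new URL(
  'https://example.com/jobs/search?title=C%2B%2B%20developer&utm_source=news+letter&utm_medium=email'
);

console.log(link.protocol); // https:
console.log(link.hostname); // example.com
console.log(link.pathname); // /jobs/search

// searchParams yields decoded key/value pairs in order, duplicates included.
for (const [key, value] of link.searchParams) {
  console.log(`${key} = ${value}`);
}
// title = C++ developer
// utm_source = news letter
// utm_medium = email
```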

The mental model that survives all the terminology: percent encoding is the byte-level escape rule, URL encoding is that rule applied with positional awareness inside a URL, and form encoding is the same rule wearing a + where its space should be. Pick the encoder that matches the decoder on the other end, encode exactly once, and the three names stop being interchangeable trivia and start being a checklist.


Made by Toolora · Updated 2026-06-12