URL Percent-Encoding Explained: Query Strings, Form Data, and RFC 3986 Mistakes
A practical guide to URL percent-encoding: when to use %20 vs +, how RFC 3986 defines safe characters, real encode/decode examples, and the mistakes that silently break APIs.
If you have ever debugged a broken redirect or watched an API return a 400 because a space character slipped through unencoded, you already know that URL percent-encoding is one of those topics that looks trivial and then bites you on a Friday afternoon. This guide covers how the encoding scheme actually works, where the rules differ between query strings and form data, and the specific mistakes that trip up even experienced developers.
What Percent-Encoding Actually Is
A URL can only safely carry a defined set of characters. RFC 3986, the 2005 standard that still governs URI syntax, defines the unreserved characters: A–Z, a–z, 0–9, and the four symbols -, _, ., ~. These never need encoding. Reserved characters such as /, ?, #, &, and = may appear literally only when they act as delimiters; when one of them is part of the data, it must be percent-encoded. Everything else (spaces, control characters, non-ASCII text) must always be encoded before it appears in a URL component.
Percent-encoding replaces a byte with a % followed by its two-digit hexadecimal value; the % escape marker is where the name comes from. So a space (0x20) becomes %20, an at-sign (0x40) becomes %40, and a slash (0x2F) becomes %2F. Because % introduces every escape, a literal percent sign (0x25) must itself be written as %25.
For non-ASCII text, the byte values come from the UTF-8 encoding of the character, not its Unicode code point. A single Chinese character like 中 requires 3 UTF-8 bytes (0xE4, 0xB8, 0xAD), so it encodes to %E4%B8%AD: nine characters for one glyph. A four-byte emoji like 😀 (U+1F600, UTF-8: 0xF0 0x9F 0x98 0x80) expands to %F0%9F%98%80, turning one character into twelve. That 9× to 12× expansion is why long strings of CJK text or emoji in URLs look so unwieldy.
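To make the byte-level mechanics concrete, here is a minimal sketch that percent-encodes every UTF-8 byte of a string by hand. The helper name percentEncodeAllBytes is made up for this illustration, and it deliberately encodes even unreserved characters; real code should reach for the built-in functions covered in the next section.

```javascript
// Hypothetical helper: percent-encode every byte of a string's UTF-8 form.
// It encodes unreserved characters too, which real encoders leave alone.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllBytes(" "));  // %20
console.log(percentEncodeAllBytes("中")); // %E4%B8%AD
console.log(percentEncodeAllBytes("😀")); // %F0%9F%98%80
```

For characters outside the unreserved set, the result matches what encodeURIComponent produces, which is a quick way to check the sketch against the built-in.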
The Two Encoding Functions You Should Know in JavaScript
JavaScript gives you two built-in options, and picking the wrong one is the most common source of encoding bugs.
encodeURI(url) leaves reserved characters (:, /, ?, #, @, &, =, +, $, ;, and the comma) untouched because those characters have structural meaning in a complete URL. It only encodes characters that should never appear unescaped anywhere.
encodeURIComponent(value) is aggressive: it encodes every character except the unreserved set and five legacy marks (!, ', (, ), *). This is what you want for individual query parameter values.
I tested both against the same string to make the difference concrete:
Input: "https://example.com/search?q=São Paulo&lang=pt"
encodeURI:
"https://example.com/search?q=S%C3%A3o%20Paulo&lang=pt"
(notice: ?, &, = are preserved — they structure the URL)
encodeURIComponent:
"https%3A%2F%2Fexample.com%2Fsearch%3Fq%3DS%C3%A3o%20Paulo%26lang%3Dpt"
(everything structural is encoded — correct only for embedding this URL inside another URL's parameter)
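One detail the comparison above does not surface: encodeURIComponent leaves !, ', (, ), and * unescaped, even though RFC 3986 lists them as reserved sub-delimiters. If a strict consumer needs them encoded, a small wrapper is a common workaround. The sketch below uses a hypothetical name, encodeRFC3986Component, and is only needed when those five characters actually cause trouble downstream.

```javascript
// Hypothetical wrapper: also escape the five characters that
// encodeURIComponent leaves untouched (! ' ( ) *).
function encodeRFC3986Component(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (here)!*"));     // it's%20(here)!*
console.log(encodeRFC3986Component("it's (here)!*")); // it%27s%20%28here%29%21%2A
```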
A redirect parameter is a classic use case for encodeURIComponent: /login?next=https%3A%2F%2Fapp.example.com%2Fdashboard. If you used encodeURI there, the ? and & inside the redirect value would break the outer URL's query string.
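Here is a short sketch of that redirect case. The URLs and the next parameter name are illustrative assumptions, not part of any particular framework.

```javascript
// Hypothetical target URL that itself carries a query string.
const target = "https://app.example.com/dashboard?tab=billing&ref=email";

// Correct: the whole target becomes one opaque parameter value.
const loginUrl = "/login?next=" + encodeURIComponent(target);
console.log(loginUrl);
// /login?next=https%3A%2F%2Fapp.example.com%2Fdashboard%3Ftab%3Dbilling%26ref%3Demail

// Receiving side: the URL parser decodes the value exactly once.
const parsed = new URL(loginUrl, "https://example.com");
const next = parsed.searchParams.get("next");
console.log(next === target); // true

// Wrong: encodeURI keeps ? and &, so "ref" leaks into the outer query string.
const broken = new URL("/login?next=" + encodeURI(target), "https://example.com");
console.log(broken.searchParams.get("next")); // https://app.example.com/dashboard?tab=billing
console.log(broken.searchParams.get("ref"));  // email
```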
To encode a query string properly with multiple parameters, call encodeURIComponent on each value (and on keys that contain special characters) and join the pairs yourself, or let URLSearchParams do the encoding and joining for you:
const params = new URLSearchParams({ q: "São Paulo", sort: "date+asc" });
params.toString();
// → "q=S%C3%A3o+Paulo&sort=date%2Basc"
Note that URLSearchParams uses the application/x-www-form-urlencoded encoding, where spaces become + instead of %20. That distinction matters more than most developers expect.
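To see the two styles side by side, this sketch builds the same parameters once by hand with encodeURIComponent and once with URLSearchParams. The variable names are arbitrary.

```javascript
const data = { q: "São Paulo", sort: "date+asc" };

// URI style: encode each key and value, then join by hand.
const uriStyle = Object.entries(data)
  .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
  .join("&");

// Form style: URLSearchParams applies application/x-www-form-urlencoded rules.
const formStyle = new URLSearchParams(data).toString();

console.log(uriStyle);  // q=S%C3%A3o%20Paulo&sort=date%2Basc
console.log(formStyle); // q=S%C3%A3o+Paulo&sort=date%2Basc
```

The only difference is the space. Both styles encode the literal + as %2B.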
Form Data vs URI: Why + and %20 Are Not the Same Thing
RFC 3986 uses %20 for spaces everywhere. But HTML forms submitted with method="GET" or Content-Type: application/x-www-form-urlencoded (standard POST forms) follow a slightly different older spec: spaces encode as +, and the literal + character encodes as %2B.
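This creates a real interoperability hazard, because the same raw text decodes differently depending on which decoder the backend uses. If a client puts a literal + in a query string (legal under RFC 3986) and the backend runs a form decoder, a user named "John+Doe" arrives as "John Doe", silently. The reverse also bites: if a form encodes "John Doe" as John+Doe and the backend runs a plain URI decoder, the + is left alone and the value arrives as "John+Doe". %20 is safe in both directions, since URI decoders and form decoders each turn it into a space, and %2B always decodes to a literal +. The inconsistency lives at the sender side: the sender has to know which decoder is waiting at the other end, or stick to %20 and %2B, which both decoders read the same way.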
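A short sketch of how the two decoder families treat the same raw text, using decodeURIComponent as the URI decoder and URLSearchParams as the form decoder. The parameter names are placeholders.

```javascript
const raw = "John+Doe"; // the characters exactly as they appear on the wire

// URI decoder: + has no special meaning and is left alone.
const uriDecoded = decodeURIComponent(raw);
console.log(uriDecoded); // John+Doe

// Form decoder: + means space.
const formDecoded = new URLSearchParams("name=" + raw).get("name");
console.log(formDecoded); // John Doe

// %20 and %2B are read the same way by both decoders.
console.log(decodeURIComponent("John%20Doe"));                          // John Doe
console.log(new URLSearchParams("name=John%20Doe").get("name"));       // John Doe
console.log(decodeURIComponent("C%2B%2B"));                             // C++
console.log(new URLSearchParams("lang=C%2B%2B").get("lang"));          // C++
```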
The safest rule: use encodeURIComponent when you build URLs programmatically and URLSearchParams when you build form-style query strings. Never mix the two.
You can check the encoding behavior of any string with the URL Encoder on Toolora, which shows the percent-encoded output character-by-character and lets you toggle between URI and form-encoding modes.
Four Mistakes That Silently Break APIs
Double-encoding. If a URL already contains %20 and you run it through encodeURIComponent again, the % sign itself gets encoded to %25, and you end up with %2520. The server receives a literal string %20 instead of a space. I have seen this kill entire redirect chains in OAuth flows where the redirect_uri parameter passed through two encoding layers.
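The double-encoding failure is easy to reproduce in a few lines. This is only a demonstration of the symptom; the variable names are arbitrary.

```javascript
const value = "hello world";

const once = encodeURIComponent(value); // one layer, as intended
const twice = encodeURIComponent(once); // a second layer escapes the % itself

console.log(once);  // hello%20world
console.log(twice); // hello%2520world

// A server that decodes once (correctly) now sees the escape as text.
console.log(decodeURIComponent(twice)); // hello%20world
```

Spotting %25 followed by two hex digits in a logged URL is usually the fastest way to confirm that two encoding layers are stacked.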
Encoding the path separator. A URL path like /api/users/123 should never have its / characters encoded. If you call encodeURIComponent on the full path string, / becomes %2F and you typically get a 404, because the router treats %2F as a literal character in a path segment, not a separator. Encode each segment on its own and join the results with /.
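One way to avoid this is to encode each path segment separately and add the separators afterwards. The buildPath helper below is a hypothetical sketch, not a library function.

```javascript
// Hypothetical helper: encode segments individually, then join with "/".
function buildPath(...segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

console.log(buildPath("api", "users", "123"));
// /api/users/123

// A slash inside a segment is data, so it is escaped and stays in that segment.
console.log(buildPath("files", "reports/2024 Q1.pdf"));
// /files/reports%2F2024%20Q1.pdf

// The mistake: encoding the whole path escapes the separators too.
console.log(encodeURIComponent("/api/users/123"));
// %2Fapi%2Fusers%2F123
```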
Not encoding + in URI query strings. When you build a URL manually and a parameter value contains a +, you must encode it as %2B. Failing to do so means any decoder that treats + as a space will corrupt the value silently.
Assuming ASCII-only encoding. URLs in the wild now carry Arabic, Chinese, and emoji parameters. A parameter value like city=北京 must encode to city=%E5%8C%97%E4%BA%AC. If your server-side code reads the raw bytes without decoding, or decodes with the wrong charset (e.g., ISO-8859-1 instead of UTF-8), the value turns into garbage. RFC 3986 does not strictly mandate a charset for every scheme, but it recommends UTF-8 for new ones, and encodeURIComponent and URLSearchParams always produce UTF-8; everything else is a legacy problem.
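The charset mismatch can be simulated without a server. The sketch below turns the percent-encoded value back into raw bytes with a hypothetical helper, percentDecodeToBytes, then interprets those bytes as UTF-8 and as ISO-8859-1. It assumes well-formed input and does no validation.

```javascript
// Hypothetical helper: convert a percent-encoded ASCII string to raw bytes.
// Assumes every % is followed by two hex digits.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

const bytes = percentDecodeToBytes("%E5%8C%97%E4%BA%AC");

// Correct: interpret the six bytes as UTF-8.
const asUtf8 = new TextDecoder("utf-8").decode(bytes);
console.log(asUtf8); // 北京

// Wrong: ISO-8859-1 maps each byte to the code point with the same number,
// so six bytes become six unrelated characters.
const asLatin1 = String.fromCharCode(...bytes);
console.log(asLatin1.length); // 6
```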
Decoding: Reversing the Process
Decoding is simpler than encoding but has its own trap: you should decode exactly once. The decodeURIComponent function reverses encodeURIComponent. For complete URLs, use decodeURI, which leaves escapes of structural characters such as %2F and %3F intact. Neither function is safe to call on text that has already been decoded: a literal % that is not followed by two hex digits (as in "100%") throws a URIError, and a value that legitimately contains %20 as text is silently changed to a space. Likewise, calling decodeURIComponent on a whole path before routing turns a %2F that was meant as a literal slash inside a segment into a path separator and breaks routing.
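A defensive pattern for the receiving end is to decode once and treat malformed input as a reportable condition instead of letting the exception escape. The decodeOnce helper and its return shape are assumptions made for this sketch.

```javascript
// Hypothetical helper: decode exactly once; report malformed escapes
// instead of throwing.
function decodeOnce(raw) {
  try {
    return { ok: true, value: decodeURIComponent(raw) };
  } catch (err) {
    if (err instanceof URIError) {
      return { ok: false, value: raw }; // hand back the input untouched
    }
    throw err;
  }
}

console.log(decodeOnce("caf%C3%A9"));   // { ok: true, value: 'café' }
console.log(decodeOnce("100%"));        // { ok: false, value: '100%' }

// Decoding a second time is not harmless: the first pass yields "50% off",
// and the second pass fails on the bare percent sign.
const first = decodeOnce("50%25%20off");
console.log(first.value);                // 50% off
console.log(decodeOnce(first.value).ok); // false
```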
For complex query strings — especially ones with nested encoded parameters or mixed +/%20 encoding — use a dedicated parser rather than regex. The URL Query String Parser on Toolora handles both encoding modes and shows each parameter key/value pair decoded, which makes it easy to spot where an encoding step went wrong.
A Practical Encoding Checklist
- Use encodeURIComponent for individual query parameter keys and values.
- Use URLSearchParams for building form-style query strings (+ for spaces).
- Never encode the full URL; encode only the parts that contain user or dynamic data.
- Encode + as %2B when building URI (non-form) query strings manually.
- Decode exactly once on the receiving end; treat double-encoded values as an upstream bug.
- Validate that non-ASCII text is UTF-8 before encoding.
Percent-encoding is one of those topics where the spec is clear and the edge cases are messy. The mistakes above account for the majority of encoding bugs I have seen in production APIs and redirect chains. Getting the encode/decode boundary right — and using the right function for the right context — is usually enough to keep URLs working cleanly across every layer of the stack.
Made by Toolora · Updated 2026-06-30