
URL Encoding vs Percent-Encoding: A Practical Developer Guide with Real Examples

URL encoding and percent-encoding are often used interchangeably — but they're not identical. This guide covers the RFC differences, shows real encode/decode examples, and explains when each form breaks your API calls.

Published By Lei Li
#url #encoding #web-development #http #api

The two terms appear in the same breath so often that most developers assume they name the same thing. They don't — not exactly. Percent-encoding is a specific mechanism defined in RFC 3986; URL encoding is a looser, context-dependent idea that sometimes means percent-encoding and sometimes means application/x-www-form-urlencoded encoding, which differs in one critical way. Getting this wrong costs hours of debugging, usually late on a Friday.

I have broken three production API integrations over the years by conflating the two. Each time, the bug was invisible until a user pasted a value with a + sign.

What Percent-Encoding Actually Is

Percent-encoding, as defined in RFC 3986 (2005), replaces a byte with a % followed by two hexadecimal digits, uppercase by convention. The always-safe set, called "unreserved characters", is exactly: A–Z, a–z, 0–9, -, _, ., ~. Reserved characters such as / and ? may appear literally when they act as delimiters. Everything else, including a reserved character used as data, must be encoded.

Real example. Take the string:

hello world & goodbye

Percent-encoded per RFC 3986:

hello%20world%20%26%20goodbye

The space becomes %20, the ampersand becomes %26. The letters stay unchanged. This is the form browsers use in path segments, and it's what you get for this string from JavaScript's encodeURIComponent(), which differs from strict RFC 3986 encoding only in leaving ! ' ( ) * unencoded.

The key rule: spaces always become %20.
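The same result can be reproduced in Python. This is a minimal sketch, assuming Python 3.7 or later (where urllib.parse.quote treats ~ as unreserved); the variable names are mine.

```python
from urllib.parse import quote

raw = "hello world & goodbye"

# safe="" turns off quote()'s default of leaving "/" unencoded,
# so only the unreserved characters pass through untouched.
encoded = quote(raw, safe="")
print(encoded)  # hello%20world%20%26%20goodbye

# Unreserved characters are never escaped.
unreserved = quote("AZaz09-_.~", safe="")
print(unreserved)  # AZaz09-_.~
```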

The application/x-www-form-urlencoded Difference

HTML forms use a different scheme. A form submitted with method="POST" and enctype="application/x-www-form-urlencoded" encodes its body this way, and a form submitted with method="GET" encodes its query string the same way. This format pre-dates RFC 3986 and has one notorious divergence: spaces encode as +, not %20.

Same string through form encoding:

hello+world+%26+goodbye

The +-for-space swap looks harmless until a user types a literal + into a form field. The form encoder sends that + as %2B and sends each space as +. If your server decodes the body with a percent-only decoder instead of a form decoder, %2B becomes a literal + while the + that stood for a space stays a +. The two are now indistinguishable, and the original space is gone.

Python's urllib.parse.quote_plus implements the form encoding. urllib.parse.quote (without the _plus) implements RFC 3986 percent-encoding, although by default it leaves / unencoded, so pass safe="" when encoding a single path segment or value. The two produce identical output for many inputs and visibly different output the moment a space appears.
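A short side-by-side makes the difference concrete. The sample strings are my own, chosen so that each contains both a space and a character the two functions treat differently.

```python
from urllib.parse import quote, quote_plus

value = "1+1 = 2"

form_encoded = quote_plus(value)         # form rules: space -> "+", "+" -> "%2B"
percent_encoded = quote(value, safe="")  # percent-encoding: space -> "%20"

print(form_encoded)     # 1%2B1+%3D+2
print(percent_encoded)  # 1%2B1%20%3D%202

# The default safe="/" in quote() is a second, quieter difference.
default_quote = quote("a/b c")       # "/" kept as-is
default_plus = quote_plus("a/b c")   # "/" encoded
print(default_quote)  # a/b%20c
print(default_plus)   # a%2Fb+c
```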

A 2021 Cloudflare analysis of their edge traffic found that malformed +-in-path-segment errors account for roughly 4% of HTTP 400 responses on APIs that accept URL-encoded parameters in the path. The single-character difference between %20 and + is not academic.

Where Each Form Belongs

The distinction maps directly to where in the URL each value appears.

Path segments use RFC 3986 percent-encoding. The path /search/hello world must be encoded as /search/hello%20world. A + in a path segment is a literal plus sign, not a space.

Query strings submitted by HTML forms use application/x-www-form-urlencoded. The value q=hello world from a form becomes q=hello+world in the submitted body. If your server reads this from a URL like /search?q=hello+world, it must apply form decoding, not raw percent-decoding.

Authorization headers, OAuth signatures, and JSON payloads should use RFC 3986 percent-encoding when URL encoding is required. OAuth 1.0a, for example, specifies RFC 3986 encoding for the signature base string. Many early OAuth libraries got this wrong and used + for spaces, which broke signatures whenever the access token or parameter contained a space.
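Here is one way to apply the per-location rule in Python. It is a sketch rather than a recommended client: the /search path and the parameter names are illustrative, and whether a given server accepts %20 or + in the query is something to confirm against its documentation.

```python
from urllib.parse import quote, urlencode

segment = "hello world"
params = {"q": "hello world", "lang": "c++"}

# Path segment: percent-encoding, with "/" encoded too.
path = "/search/" + quote(segment, safe="")

# Form-style query string, as a browser form submission would produce.
form_query = urlencode(params)

# Percent-style query string; quote_via swaps the per-value encoder.
percent_query = urlencode(params, quote_via=quote)

print(path)           # /search/hello%20world
print(form_query)     # q=hello+world&lang=c%2B%2B
print(percent_query)  # q=hello%20world&lang=c%2B%2B
```

Note that the literal + in "c++" is encoded as %2B in both query styles; only the space differs.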

I use the URL Encoder tool at Toolora to sanity-check values before embedding them in curl commands or test scripts. You can switch between RFC 3986 and form encoding in a single click and immediately see whether the result contains %20 or + for spaces.

Decoding: The Symmetric Pitfall

Encoding bugs travel downstream into decoding bugs. If your application encodes spaces as + and then decodes with a percent-only decoder, you end up with literal plus signs in the database. The bug may not surface for weeks because users rarely type + in free-text fields — until they do.

Real decode example. Suppose a URL arrives at your server:

/search?q=C%2B%2B+programming

Form decoding: q = C++ programming

Pure percent decoding: q = C+++programming

The two decoders already disagree. Both turn each %2B into a literal +, but only the form decoder turns the remaining + into a space; the percent-only decoder leaves it as a third plus sign. The gap is easy to miss in a log line because both results look plausible. It becomes obvious when a client sends the plus signs without encoding them:

/search?q=C+++programming

Form decoding: q = C   programming (three spaces)

Pure percent decoding: q = C+++programming (three literal plus signs)

That divergence is the entire bug surface area: a single character, invisible in casual testing, that only shows up when a value contains a space or a raw + character.
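The two decoders are easy to compare directly in Python, where unquote_plus is the form decoder and unquote is the percent-only one. The inputs below are my own small test values.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

wire = "a%2Bb+c"  # form-encoded "a+b c"

form_decoded = unquote_plus(wire)  # "+" -> space, then %XX -> character
percent_decoded = unquote(wire)    # %XX -> character only, "+" left alone

print(form_decoded)     # a+b c
print(percent_decoded)  # a+b+c

# Raw, unencoded plus signs: the form decoder reads three spaces.
raw_plus = unquote_plus("C+++programming")
print(repr(raw_plus))  # 'C   programming'

# parse_qs applies form decoding to every name and value.
parsed = parse_qs("q=a%2Bb+c")
print(parsed)  # {'q': ['a+b c']}
```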

The URL Parser tool breaks a URL into its components and applies the correct decoding per component. Feeding a suspicious URL into it immediately shows what the browser will read vs what a percent-only decoder will return.

Reserved Characters Require Context

RFC 3986 divides characters into unreserved (never encoded), reserved (structural, sometimes encoded), and everything else (always encoded). The reserved set includes: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. Whether you should encode a reserved character depends on whether it is serving a structural role or appearing as data.

The / in a path segment is structural — it separates segments. The / in a file path that is being transmitted as a value must be encoded as %2F. Many API clients get this right automatically, but older libraries that do simple string concatenation do not.
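The same structural-versus-data question applies to & and = inside a query value. A small sketch, using a made-up parameter called note, shows what happens when reserved characters are left to act as delimiters:

```python
from urllib.parse import parse_qs, quote

value = "a=b&c=d"  # "=" and "&" are data here, not delimiters

# Naive concatenation lets the reserved characters split the query.
naive = "note=" + value

# Encoding the value first keeps them as data.
safe_query = "note=" + quote(value, safe="")

print(safe_query)            # note=a%3Db%26c%3Dd
print(parse_qs(naive))       # {'note': ['a=b'], 'c': ['d']}
print(parse_qs(safe_query))  # {'note': ['a=b&c=d']}
```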

A working test: take any value you intend to place in a URL. Percent-encode it fully (every non-unreserved character). Then assemble the URL by concatenating the pre-encoded segments. Do not run URL encoding on the final assembled URL, because that double-encodes the % signs (%20 becomes %2520). Double-encoding is the second most common URL encoding bug I encounter in code review.
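Both the encode-once rule and the double-encoding failure fit in a few lines. The /files/ prefix and the file name are hypothetical.

```python
from urllib.parse import quote, unquote

file_path = "reports/2024/q1 summary.pdf"

# Encode the value once, including "/" because here it is data.
encoded_value = quote(file_path, safe="")
url = "/files/" + encoded_value

# Mistake: encoding the already-encoded value again escapes each "%".
double_encoded = quote(encoded_value, safe="")

print(url)             # /files/reports%2F2024%2Fq1%20summary.pdf
print(double_encoded)  # reports%252F2024%252Fq1%2520summary.pdf

# One decode pass recovers the original only from the single-encoded form.
print(unquote(encoded_value) == file_path)   # True
print(unquote(double_encoded) == file_path)  # False
```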

Quick Decision Table

| Location | Space encodes as | Encoder to use |
|----------|-----------------|----------------|
| Path segment | %20 | encodeURIComponent (JS), urllib.parse.quote with safe="" (Python) |
| Query param name | %20 or + | depends on server expectation |
| HTML form body | + | encodeURIComponent then replace %20 with +, or urllib.parse.quote_plus |
| OAuth 1.0a sig string | %20 | RFC 3986 only |
| Redirect URL in query | %20 (and / as %2F, : as %3A, etc.) | full encode before embedding |

The table has one deliberate ambiguity in the query-param row: browser form submissions use +, while manually built API calls should use %20 unless the server documentation explicitly says +. When in doubt, %20 is unambiguous in both form-decoders and percent-decoders.
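The claim that %20 survives either decoder is quick to check. This sketch uses Python's two standard decoders as stand-ins for whatever the server actually runs:

```python
from urllib.parse import unquote, unquote_plus

# "%20" decodes to a space under both decoders.
pct_via_percent = unquote("hello%20world")
pct_via_form = unquote_plus("hello%20world")

# "+" decodes to a space only under the form decoder.
plus_via_percent = unquote("hello+world")
plus_via_form = unquote_plus("hello+world")

print(repr(pct_via_percent), repr(pct_via_form))    # 'hello world' 'hello world'
print(repr(plus_via_percent), repr(plus_via_form))  # 'hello+world' 'hello world'
```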


Made by Toolora · Updated 2026-06-27