URL Percent Encoding by Component: Why the Same Character Encodes Differently in Paths, Queries, and Fragments
RFC 3986 gives path segments, query strings, and fragments three different sets of allowed characters. This guide walks through exactly which characters must be percent-encoded in each URL component — with real input/output examples and the bugs you get when you apply the wrong rule.
The most common mistake I see developers make with percent encoding is treating a URL like a uniform string where encoding rules are the same everywhere. They are not. RFC 3986 — the standard that governs URLs — grants each URL component its own grammar, which means a / in a path segment means something entirely different from a / in a query value, and encoding one correctly does not encode the other correctly.
I tracked down a bug in a search API where results for the query AI & ML worked in the browser but failed in curl. The problem turned out to be a single character — & — encoded correctly for the path but not for the query string. Understanding why took me deeper into RFC 3986 than I expected.
The Three Character Buckets in RFC 3986
Before getting into per-component rules, you need the vocabulary. RFC 3986 sorts characters into three categories:
Unreserved characters — these are always safe to use literally anywhere in a URL without encoding: A–Z, a–z, 0–9, hyphen -, period ., underscore _, and tilde ~. If you only ever transmit these characters, you will never have an encoding problem.
Reserved characters — these carry structural meaning in a URL and must be encoded when they appear as data rather than structure: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. The key insight is that reserved characters are only safe to appear literally when they are performing their structural role. The same character appearing as data content must become %XX.
Everything else — must always be percent-encoded. This includes spaces (%20), Unicode characters (UTF-8 byte sequence, each byte encoded separately), and control characters.
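The three buckets can be sketched as a small lookup. This is only an illustration of the vocabulary: the character sets are copied from RFC 3986 sections 2.2 and 2.3, while the names UNRESERVED, RESERVED, and classify are made up for this example.

```python
import string

# Character sets as listed in RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def classify(ch: str) -> str:
    """Return the RFC 3986 bucket a single character falls into."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "other"  # must always be percent-encoded


for ch in ["a", "~", "/", "&", " ", "\u00e9"]:
    print(repr(ch), classify(ch))
```

Whether a reserved character actually needs encoding still depends on the component it lands in, which is what the following sections cover.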
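The percent-encoding itself is straightforward: % followed by exactly two hexadecimal digits representing the byte value. RFC 3986 treats the digits as case-insensitive but recommends producing uppercase, so %c3 and %C3 are equivalent and encoders should emit %C3. A space is byte 0x20, so it encodes to %20. The character é is bytes 0xC3 0xA9 in UTF-8, so it encodes to %C3%A9.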
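To make the byte-level mechanics concrete, here is a minimal sketch that encodes every UTF-8 byte by hand and compares the result with Python's urllib.parse.quote. The helper name percent_encode_all is hypothetical; a real encoder would leave unreserved characters alone.

```python
from urllib.parse import quote


def percent_encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte of text, with no exceptions."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


print(percent_encode_all(" "))        # %20
print(percent_encode_all("\u00e9"))   # %C3%A9  (é is two bytes in UTF-8)
print(quote("\u00e9", safe=""))       # %C3%A9  (the standard library agrees)
```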
Path Segments: Slashes Are the Separator
A URL path like /blog/2026/url-guide has three segments: blog, 2026, and url-guide. The / character is the segment delimiter — it must appear literally to create that structure.
The characters allowed literally inside a path segment (excluding the delimiter /) are: unreserved characters plus ! $ & ' ( ) * + , ; = : @. Everything else must be encoded.
The crucial implication: if your data contains a slash, that slash must be encoded as %2F. A file path like docs/api/v2.md passed as a single path parameter must be encoded to docs%2Fapi%2Fv2.md or the router will split it into three separate segments.
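A minimal sketch of segment encoding in Python, assuming the value should occupy exactly one path segment. SEGMENT_SAFE and encode_segment are illustrative names; the important detail is that urllib.parse.quote treats / as safe by default, so the safe set has to be passed explicitly.

```python
from urllib.parse import quote

# Characters RFC 3986 allows literally in a path segment, besides unreserved ones.
SEGMENT_SAFE = "!$&'()*+,;=:@"


def encode_segment(value: str) -> str:
    """Encode value for use as a single path segment; '/' becomes %2F."""
    return quote(value, safe=SEGMENT_SAFE)


print(encode_segment("docs/api/v2.md"))              # docs%2Fapi%2Fv2.md
print("/files/" + encode_segment("docs/api/v2.md"))  # /files/docs%2Fapi%2Fv2.md
```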
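However, many web frameworks and proxies decode %2F before or during routing, so an encoded slash does not always survive to your handler. Express.js exposes a caseSensitive routing option but has no built-in equivalent of Apache's AllowEncodedSlashes directive, so treating encoded slashes specially takes custom middleware or a separate package. nginx matches locations against a normalized URI in which %XX sequences have already been decoded; its merge_slashes directive only controls whether adjacent slashes are collapsed and does not change how %2F is handled. This means path-embedded slashes are a framework-specific problem, not just a URL encoding question.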
A safe strategy: avoid slashes in path parameter values. Use a different delimiter or encode the whole value in a slash-free format — base64url is a common choice for binary IDs.
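As a rough sketch of the base64url approach, the helpers below turn arbitrary bytes into a token that uses only letters, digits, hyphen, and underscore. The names to_path_token and from_path_token are hypothetical, and stripping the = padding is a design choice of this example rather than a requirement.

```python
import base64


def to_path_token(raw: bytes) -> str:
    """Encode raw bytes as unpadded base64url (A-Z, a-z, 0-9, '-', '_')."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def from_path_token(token: str) -> bytes:
    """Reverse to_path_token, restoring the stripped '=' padding first."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


token = to_path_token(b"docs/api/v2.md")
print(token)                   # contains no '/', '+', or '='
print(from_path_token(token))  # b'docs/api/v2.md'
```

Because every character in the token is unreserved, it passes through routers and proxies unchanged regardless of how they treat %2F.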
Query Strings: =, &, and the + Sign Problem
The query string starts after ? and before the optional #. Within it, = separates keys from values, and & separates key-value pairs. Those two characters — = and & — must be encoded when they appear as data.
RFC 3986 defines the query component as allowing: unreserved characters plus ! $ & ' ( ) * + , ; = : @ / ?. Wait — that includes & and =! The catch is that while RFC 3986 permits them, the application/x-www-form-urlencoded format (used by HTML forms and most APIs) reserves & and = as delimiters within the query. In practice, always encode & as %26 and = as %3D when they appear inside a query value.
The + sign is where things get genuinely complicated. RFC 3986 does not define + as an encoding for space. That convention comes from the older HTML form encoding spec, where + means space and %2B means a literal plus. Modern APIs that follow RFC 3986 should use %20 for space. But most web frameworks accept both conventions because form submissions use +. The result: if you send C++ (the language name) in a query parameter without encoding, some decoders give you the letter C followed by two spaces.
The safe rule: encode + as %2B when you want a literal plus sign. Use %20 for space, not +, in non-form contexts.
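The difference between the two decoding conventions is easy to reproduce in Python, where unquote_plus follows the form convention and unquote follows RFC 3986. This is a small demonstration, not a statement about how any particular framework decodes.

```python
from urllib.parse import quote, unquote, unquote_plus

raw = "C++"

# A form-style decoder turns '+' into a space; an RFC 3986-style one does not.
print(repr(unquote_plus(raw)))  # 'C  '
print(repr(unquote(raw)))       # 'C++'

# Encoding the plus signs removes the ambiguity for both decoders.
encoded = quote(raw, safe="")
print(encoded)                  # C%2B%2B
print(unquote_plus(encoded), unquote(encoded))  # C++ C++
```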
I verified this with a real example using Toolora's URL Encoder. Input:
name=C++ & Python&version=3.x
The correct encoding for use as a query string, following the form-encoding convention, is:
name=C%2B%2B+%26+Python&version=3.x
Broken down: ++ → %2B%2B (literal plus signs encoded), spaces → + (form-encoding convention), & → %26 (ampersand encoded as data, not as delimiter). The outer & between name=... and version=... remains literal — it is the delimiter.
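The same result can be reproduced with Python's urllib.parse.urlencode, which uses the form convention by default. Passing quote_via=quote switches spaces to %20 instead. The params dictionary simply restates the example input.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"name": "C++ & Python", "version": "3.x"}

# Default: form convention, spaces become '+'.
form_style = urlencode(params)
print(form_style)  # name=C%2B%2B+%26+Python&version=3.x

# RFC 3986 style: spaces become %20.
rfc_style = urlencode(params, quote_via=quote)
print(rfc_style)   # name=C%2B%2B%20%26%20Python&version=3.x

# Both forms decode back to the same values.
print(parse_qs(form_style))
print(parse_qs(rfc_style))
```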
Fragment Identifiers: Client-Side Only
The fragment (everything after #) identifies a specific section within the resource. RFC 3986 allows fragments to contain: unreserved characters plus ! $ & ' ( ) * + , ; = : @ / ?. The # character itself marks the start of the fragment and must be encoded as %23 inside a fragment value.
A practical difference: browsers never send the fragment to the server. It is processed client-side only. This matters significantly for single-page applications that use hash routing — the server sees /app regardless of whether the user is at /app#profile or /app#settings. If you are building a shareable link for a page-internal anchor, the fragment is handled by the browser and does not require server-side URL decoding.
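A short sketch of both points, using placeholder example.com URLs. The helper request_target is a hypothetical approximation of what an HTTP client puts on the request line, and FRAGMENT_SAFE is the allowed set listed above.

```python
from urllib.parse import quote, urlsplit

# Characters RFC 3986 allows literally in a fragment, besides unreserved ones.
FRAGMENT_SAFE = "!$&'()*+,;=:@/?"


def request_target(url: str) -> str:
    """Return the path and query a client would send; the fragment is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


print(request_target("https://example.com/app#profile"))   # /app
print(request_target("https://example.com/app#settings"))  # /app

# Encoding a fragment value: '#' and space are encoded, '/' stays literal.
print("/app#" + quote("step #2/intro", safe=FRAGMENT_SAFE))  # /app#step%20%232/intro
```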
Browsers are also lenient about malformed percent-encoding. Under the WHATWG URL Standard, which browser URL parsers implement, a % that is not followed by two hexadecimal digits is flagged as a validation error, but parsing does not fail: the sequence is kept as-is and, in a fragment, passed through to client JavaScript. Stricter RFC 3986 validators and server-side parsers may reject or mis-decode the same sequence in a path or query, so malformed encoding there is more likely to surface as a bug. For fragments the tolerance costs little: fragments are UI state, not server routing state.
The Double-Encoding Trap
Double encoding occurs when you encode an already-encoded string. hello world → hello%20world is correct. Encoding that result again gives hello%2520world: the % itself was encoded to %25, so after one round of decoding the backend sees the literal text hello%20world (the characters %, 2, 0) instead of a space.
I debugged a redirect service where URLs were stored in a database (already percent-encoded), then fed to a redirect generator that encoded them again. Every link was broken with %25-prefixed sequences.
Fix: always decode before you encode. Toolora's URL Parser shows you the encoded and decoded form of each URL component side by side, which makes it easy to spot double-encoded values in a shared link or stored URL.
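The trap and the decode-then-encode fix can be sketched in a few lines of Python. The helper encode_once is a hypothetical name, and it assumes the input is either raw or encoded exactly once; data that legitimately contains the literal text %20 would be altered by the decode step.

```python
from urllib.parse import quote, unquote

once = quote("hello world", safe="")
twice = quote(once, safe="")
print(once)            # hello%20world
print(twice)           # hello%2520world
print(unquote(twice))  # hello%20world  (one decode does not recover the space)


def encode_once(value: str) -> str:
    """Decode first, then encode, so already-encoded input is not encoded twice."""
    return quote(unquote(value), safe="")


print(encode_once("hello world"))    # hello%20world
print(encode_once("hello%20world"))  # hello%20world
```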
Quick Reference
| Component | Literal / allowed? | Space as +? | Notes |
|-----------|---------------------|---------------|-------|
| Path segment | No, use %2F | No, use %20 | Framework may decode %2F anyway |
| Query value | Yes (allowed by RFC 3986, avoid in practice) | Form encoding only | Encode &, =, + as data |
| Fragment | Yes | No, use %20 | Never sent to server |
| Hostname | No | No | Use Punycode for internationalized domains |
The reliable way to avoid these edge cases is to always use a URL-aware library (Python's urllib.parse.urlencode, JavaScript's URLSearchParams, Go's url.QueryEscape) rather than string concatenation, and to test with inputs that contain +, &, /, %, and non-ASCII characters before shipping.
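As one possible shape for this, the sketch below assembles a URL component by component with Python's urllib.parse instead of concatenating raw strings. The function build_url, the example.com host, and the parameter names are all hypothetical, and the hostname is assumed to be plain ASCII already (no Punycode conversion is attempted).

```python
from urllib.parse import quote, urlencode, urlunsplit


def build_url(host, segments, params, fragment=""):
    """Encode each component with its own rule, then join them once."""
    # Each segment is encoded with no safe characters, so '/' becomes %2F.
    path = "/" + "/".join(quote(seg, safe="") for seg in segments)
    # quote_via=quote emits %20 for spaces rather than the form-style '+'.
    query = urlencode(params, quote_via=quote)
    frag = quote(fragment, safe="")
    return urlunsplit(("https", host, path, query, frag))


url = build_url(
    "example.com",
    ["files", "docs/api/v2.md"],
    {"q": "AI & ML", "lang": "C++"},
    "section 2",
)
print(url)
# https://example.com/files/docs%2Fapi%2Fv2.md?q=AI%20%26%20ML&lang=C%2B%2B#section%202
```

Encoding more than RFC 3986 strictly requires (for example, no safe characters at all in segments and fragments) is a deliberately conservative choice here; it costs a few extra bytes and avoids relying on how each consumer treats reserved characters.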
Made by Toolora · Updated 2026-06-26