
URL Percent-Encoding Complete Guide: Which Characters to Encode and Real-World Gotchas

Learn exactly which characters need percent-encoding in URLs, the difference between reserved and unreserved characters, and the bugs that catch even experienced developers.

#url-encoding #web-development #http #developer-tools


Every URL you type or construct in code goes through a quiet gatekeeping process: percent-encoding. Get it wrong and you end up with a 400 Bad Request, a broken OAuth callback, or — worst of all — silently incorrect data that reaches your server looking fine but meaning something else entirely.

This guide explains the mechanics of percent-encoding from first principles, shows you exactly which characters need encoding and why, and walks through the edge cases that trip up experienced developers.

The 66 Characters That Never Need Encoding

RFC 3986 (the 2005 standard that governs how URLs work) divides ASCII characters into two camps: unreserved and everything else. The 66 unreserved characters are:

  • 26 uppercase letters: A–Z
  • 26 lowercase letters: a–z
  • 10 digits: 0–9
  • Four symbols: - . _ ~

These 66 characters can appear anywhere in a URL component without encoding, and a conforming parser will pass them through unchanged. If your entire query string, path segment, or fragment consists only of these characters, you never need to touch it.

Everything outside this set, including characters you'd think are harmless such as a space or an ampersand, must be percent-encoded when it appears as data. Each byte of the character's UTF-8 encoding becomes a percent sign followed by two hex digits (RFC 3986 recommends uppercase). A space becomes %20, an ampersand becomes %26, and a euro sign (U+20AC) becomes %E2%82%AC, one triplet for each of its three UTF-8 bytes.
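To make the rule concrete, here is a minimal hand-rolled encoder. It is a sketch for checking your understanding, not a replacement for the built-ins: percentEncodeUtf8 is a hypothetical helper that keeps the 66 unreserved characters and turns every other UTF-8 byte into an uppercase %XX triplet. It assumes a browser or Node.js 11+, where TextEncoder is a global.

```javascript
// Hypothetical helper (not a built-in): percent-encode a string per RFC 3986,
// leaving only the 66 unreserved characters untouched.
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function percentEncodeUtf8(str) {
  // Encode to UTF-8 first: percent-encoding operates on bytes, not characters.
  const bytes = new TextEncoder().encode(str);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    // Bytes >= 0x80 map to non-ASCII chars here, so they never match UNRESERVED.
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncodeUtf8("a b&c")); // a%20b%26c
console.log(percentEncodeUtf8("€"));     // %E2%82%AC
```

In real code the built-in functions are the better choice; a helper like this is mainly useful for verifying their output byte by byte.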

The 18 Reserved Characters: Syntax Versus Data

RFC 3986 further defines 18 reserved characters that have structural meaning in URLs:

: / ? # [ ] @   (gen-delims)
! $ & ' ( ) * + , ; =  (sub-delims)

Here is the critical distinction most tutorials gloss over: reserved characters do not need to be encoded when they are serving their syntactic role, but they must be encoded when they appear as data.

Consider a URL like:

https://api.example.com/search?q=salt+&+pepper&lang=en

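The ? separates the query string from the path, and each & separates one parameter from the next; in those roles they should not be encoded. But if your search term literally is salt+&+pepper, the URL above is broken: a parser splits the query at the inner & and sees q=salt+, a stray +pepper parameter, and lang=en. The & inside the value must become %26: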

https://api.example.com/search?q=salt%2B%26%2Bpepper&lang=en

(And notice the + signs are also encoded as %2B here — more on the + trap below.)
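You can see the difference by handing both URLs to the WHATWG URL parser that browsers and Node.js share. This is a sketch for a browser console or Node.js; the query is read through URLSearchParams, which applies form-style decoding, so an unencoded + comes back as a space.

```javascript
// Raw value: the inner & splits the query, and + is decoded as a space.
const raw = new URL("https://api.example.com/search?q=salt+&+pepper&lang=en");
console.log(JSON.stringify([...raw.searchParams]));
// [["q","salt "],[" pepper",""],["lang","en"]]

// Encoded value: the whole string survives as one parameter.
const value = "salt+&+pepper";
const encoded = new URL(
  "https://api.example.com/search?q=" + encodeURIComponent(value) + "&lang=en"
);
console.log(encoded.href);
// https://api.example.com/search?q=salt%2B%26%2Bpepper&lang=en
console.log(encoded.searchParams.get("q")); // salt+&+pepper
```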

Three Functions, Three Different Rules

I ran one string through the three standard browser APIs to see how differently they encode it:

const input = "hello world & café/path?q=1";

encodeURI(input)
// "hello%20world%20&%20caf%C3%A9/path?q=1"

encodeURIComponent(input)
// "hello%20world%20%26%20caf%C3%A9%2Fpath%3Fq%3D1"

new URLSearchParams({ q: input }).toString()
// "q=hello+world+%26+caf%C3%A9%2Fpath%3Fq%3D1"

Three calls, three different outputs — all from the same string.

  • encodeURI is designed for a complete URL. It leaves reserved characters (&, /, ?, #, etc.) unencoded because they may be acting as URL structure. Use this only when you are encoding a full URL that is already assembled correctly.
  • encodeURIComponent encodes everything except the 66 unreserved characters plus five it inherits from the older RFC 2396: ! ' ( ) *. Use this for individual query-parameter values, path segments, or fragment identifiers: anything that should be treated as pure data.
  • URLSearchParams follows the application/x-www-form-urlencoded specification, which encodes spaces as + instead of %20. That is correct for HTML form submissions, but a server that decodes the query with strict RFC 3986 rules will read each + as a literal plus sign rather than a space.
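If a server or signing scheme expects strict RFC 3986 output, one common workaround, along the lines of the one in MDN's encodeURIComponent documentation, is to post-process encodeURIComponent so the five characters it leaves alone (! ' ( ) *) are encoded too. The name encodeRFC3986Component is just a label for this sketch.

```javascript
// Sketch: strict RFC 3986 component encoding.
// encodeURIComponent leaves ! ' ( ) * unencoded (they are sub-delims in RFC 3986),
// so encode those five characters as well.
function encodeRFC3986Component(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (a) test!*"));     // it's%20(a)%20test!*
console.log(encodeRFC3986Component("it's (a) test!*")); // it%27s%20%28a%29%20test%21%2A
```

OAuth 1.0 request signing (RFC 5849) is one place where this strictness matters, since it requires every character outside the unreserved set to be encoded.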

The practical rule: for query parameter values, always use encodeURIComponent. For entire assembled URLs, encodeURI. For HTML form data, URLSearchParams. Never mix them on the same string.
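Applied to a whole URL, that rule looks something like the hypothetical buildUrl helper below. It encodes each path segment and each query key and value exactly once with encodeURIComponent, and leaves the structural / ? & = characters it inserts itself unencoded. It assumes the base has no trailing slash and that query values are plain strings.

```javascript
// Hypothetical helper: assemble a URL from raw (unencoded) parts.
// Every piece of data is encoded exactly once; the delimiters we add stay literal.
function buildUrl(base, segments, query = {}) {
  const path = segments.map((segment) => encodeURIComponent(segment)).join("/");
  const qs = Object.entries(query)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return base + "/" + path + (qs ? "?" + qs : "");
}

console.log(
  buildUrl("https://api.example.com", ["files", "project notes"], {
    q: "salt & pepper",
    lang: "en",
  })
);
// https://api.example.com/files/project%20notes?q=salt%20%26%20pepper&lang=en
```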

Use Toolora's URL Encoder to test any string through all three modes side-by-side with instant output — no setup required.

Real-World Gotchas That Break Production Systems

The + space trap. In application/x-www-form-urlencoded (HTML forms), a + represents a space. In RFC 3986 URLs, + is just a literal plus sign. If you build a query string with URLSearchParams, every space becomes +; a server that decodes the query as form data turns those back into spaces, but one that applies strict RFC 3986 decoding hands your handler literal plus signs instead. The reverse failure is just as common: a real + left unencoded in a value, or even in a path segment, is silently turned into a space by any form decoder. The fix: encode spaces as %20 and literal plus signs as %2B unless you are specifically constructing form POST bodies.
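Both behaviors are easy to see side by side. In this sketch, URLSearchParams stands in for a form-style decoder and decodeURIComponent for plain RFC 3986 percent-decoding; your framework's decoder may behave like either one.

```javascript
// The same raw query, decoded two ways.
const rawQuery = "q=a+b";
console.log(new URLSearchParams(rawQuery).get("q")); // "a b"  (form decoding: + is a space)
console.log(decodeURIComponent(rawQuery.slice(2)));   // "a+b"  (RFC 3986 decoding: + is literal)

// Encoding with %20 and %2B removes the ambiguity: both decoders agree.
const safe = "q=" + encodeURIComponent("a+b c"); // "q=a%2Bb%20c"
console.log(new URLSearchParams(safe).get("q")); // "a+b c"
console.log(decodeURIComponent(safe.slice(2)));  // "a+b c"
```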

Double-encoding. This one burned me on a file-hosting API. My path was /files/project%20notes/readme.md, already encoded. I then ran it through an encoder a second time: encodeURI turns %20 into %2520 (encodeURIComponent does the same, and also turns every / into %2F). The server decoded %2520 to %20 and looked for a folder literally named project%20notes rather than project notes. Rule: encode once, at the boundary where raw text enters the URL, and never re-encode an already-encoded value.
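The failure is easy to reproduce in a browser console or Node.js. This sketch uses encodeURI for the second pass and then shows the single-pass alternative, which starts from the raw segments instead of an already-encoded path.

```javascript
const alreadyEncoded = "/files/project%20notes/readme.md";

// Encoding a second time escapes the % itself: %20 becomes %2520.
const twice = encodeURI(alreadyEncoded);
console.log(twice); // /files/project%2520notes/readme.md

// One round of decoding (what the server does) leaves a literal "%20" in the name.
console.log(decodeURIComponent("project%2520notes")); // project%20notes

// Encoding once, from the raw segments, avoids the problem.
const once = "/" + ["files", "project notes", "readme.md"].map((s) => encodeURIComponent(s)).join("/");
console.log(once); // /files/project%20notes/readme.md
```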

UTF-8 multi-byte characters. Percent-encoding works on bytes, not Unicode code points. The emoji 🎸 is U+1F3B8, encoded in UTF-8 as four bytes: F0 9F 8E B8. Its percent-encoded form is %F0%9F%8E%B8. If you naively take the code point value and write %1F3B8, you will produce invalid percent-encoding that parsers reject or misinterpret. Modern runtimes handle this correctly in their encode functions, but if you are assembling encodings by hand — for example in a server-side template — validate the byte-level output.
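The gap between code points and bytes takes only a few lines to show. TextEncoder, available in modern browsers and Node.js 11+, exposes the UTF-8 bytes directly; the variable names are just for illustration.

```javascript
const guitar = "🎸"; // U+1F3B8

// Wrong: percent-encoding the code point value.
const naive = "%" + guitar.codePointAt(0).toString(16).toUpperCase(); // "%1F3B8"
// A decoder reads "%1F" as one byte (a control character) followed by literal "3B8".
console.log(decodeURIComponent(naive) === "\u001F3B8"); // true

// Right: percent-encode each UTF-8 byte.
const utf8 = Array.from(new TextEncoder().encode(guitar), (b) =>
  "%" + b.toString(16).toUpperCase().padStart(2, "0")
).join("");
console.log(utf8);                       // %F0%9F%8E%B8
console.log(encodeURIComponent(guitar)); // %F0%9F%8E%B8
```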

The # that starts a fragment. A # in a URL starts the fragment, and the fragment is never sent to the server. This means https://example.com/search?q=C%23 (C# the programming language, with the # encoded) sends q=C%23, which the server decodes to C#. But https://example.com/search?q=C#sharp sends only q=C, and sharp becomes the fragment, silently truncating your query along with any parameters that follow it. Encode # as %23 whenever it appears in a query value.
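The WHATWG URL parser in browsers and Node.js makes the split visible: the fragment lands in hash, and only the part in search would be sent with the request.

```javascript
// Unencoded #: everything after it is parsed as the fragment.
const broken = new URL("https://example.com/search?q=C#sharp");
console.log(broken.search);                // ?q=C
console.log(broken.hash);                  // #sharp
console.log(broken.searchParams.get("q")); // C

// Encoded #: the whole value stays in the query.
const fixed = new URL("https://example.com/search?q=" + encodeURIComponent("C#"));
console.log(fixed.href);                  // https://example.com/search?q=C%23
console.log(fixed.searchParams.get("q")); // C#
```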

Emoji and non-ASCII in paths. A URI as defined by RFC 3986 may contain only ASCII characters, so non-ASCII text has to be converted to UTF-8 bytes and then percent-encoded. Internationalized Resource Identifiers (IRIs, RFC 3987) allow Unicode directly and define that mapping, while internationalized domain names are handled separately, by converting each label to ASCII Punycode (IDNA). Many frameworks automatically percent-encode non-ASCII path segments, but if you are constructing URLs manually, do not assume that copying a Chinese or Arabic character into a path will round-trip correctly through all proxies and caches. Encode them explicitly.
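Where a WHATWG URL parser is available, it shows both conversions at once. This sketch reflects what the parser in browsers and Node.js (built with its default ICU support) produces; proxies and older server libraries may not apply the same normalization, which is why explicit encoding remains the safer habit.

```javascript
// The WHATWG URL parser converts non-ASCII input into plain-ASCII URL form.
const url = new URL("https://münchen.de/café/🎸");
console.log(url.hostname); // xn--mnchen-3ya.de (IDNA/Punycode, not percent-encoding)
console.log(url.pathname); // /caf%C3%A9/%F0%9F%8E%B8 (UTF-8 bytes, percent-encoded)
console.log(url.href);     // https://xn--mnchen-3ya.de/caf%C3%A9/%F0%9F%8E%B8
```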

Checking HTML-Related Encoding Too

Percent-encoding and HTML entity encoding often get confused. They solve different problems: percent-encoding makes a string safe to appear in a URL; HTML entity encoding makes a string safe to appear in HTML markup. The character & needs to be %26 in a URL query value but &amp; in an HTML attribute.

When building an href attribute in HTML, you may need both: percent-encode each dynamic value as you assemble the URL, then HTML-encode the finished URL when embedding it in the attribute. A tool like Toolora's HTML Entity Encoder handles the HTML side of that equation.
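A minimal sketch of the two layers, applied in order. escapeHtmlAttr is a hypothetical helper covering the five characters that matter inside a quoted attribute; in real templates, prefer your framework's built-in escaping.

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted HTML attribute.
function escapeHtmlAttr(str) {
  return str
    .replace(/&/g, "&amp;") // must run first, or it would re-escape the entities below
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Step 1: percent-encode the data value while assembling the URL.
const href = "https://example.com/search?q=" + encodeURIComponent("salt & pepper") + "&lang=en";
// Step 2: HTML-escape the finished URL for the attribute.
const link = `<a href="${escapeHtmlAttr(href)}">Search</a>`;
console.log(link);
// <a href="https://example.com/search?q=salt%20%26%20pepper&amp;lang=en">Search</a>
```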

Practical Checklist Before You Ship

  1. Identify every place your code concatenates strings into a URL. Each dynamic value that flows into a path segment or query value must pass through encodeURIComponent before concatenation.
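  2. Check what encoding your HTTP library applies. axios (through its params option) and request (through its qs option) serialize query objects with their own rules, while fetch has no query-object option and leaves encoding to the URL or URLSearchParams you pass it. Handing an already-encoded value to a library that encodes for you produces double-encoding. Read the docs.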
  3. Validate round-trip fidelity. Encode a value, transmit it, decode it server-side, compare with the original. Do this with: a string containing +, a string containing %, a string containing non-ASCII characters, and an empty string.
  4. Audit redirect and callback URLs. OAuth and webhook callback URLs that contain your own query parameters are a notorious source of double-encoding bugs.
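The round-trip check in item 3 can be partly automated. This sketch simulates the server with the URL parser in a browser or Node.js, so it checks your encoding step rather than your real server's decoder; roundTrips is a hypothetical name, and the test strings are the ones listed above.

```javascript
// Round-trip check: encode, "transmit" via a URL, decode, compare.
const cases = ["a+b", "100%", "café 🎸", ""];

function roundTrips(value) {
  const url = "https://example.com/search?q=" + encodeURIComponent(value);
  return new URL(url).searchParams.get("q") === value;
}

for (const value of cases) {
  console.log(JSON.stringify(value), roundTrips(value)); // true for every case
}

// Without encoding, the + case fails: it decodes as a space.
console.log(new URL("https://example.com/search?q=a+b").searchParams.get("q")); // "a b"
```

To test the real stack, replace the URL parser with an actual request to your server and compare the value your handler receives.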

RFC 3986's rules, published in 2005, are precise and consistent. The bugs almost never come from the spec being unclear; they come from inconsistent application of it across the many layers of an HTTP request.


Made by Toolora · Updated 2026-06-29