Base64 Encoding Explained: Use Cases, Padding, and the URL-Safe Variant
A practical deep-dive into Base64 encoding for developers — how the alphabet works, why padding exists, when to use the URL-safe variant, and the mistakes that cause silent bugs.
Base64 is one of those encodings you encounter on day one (in HTTP Basic Auth headers, in embedded images, in JWTs) and still misuse years later, when you forget about the URL-safe variant and your API silently mangles the payload. This article walks through how the encoding actually works, what the = padding signs are doing, and why the standard alphabet and the URL-safe alphabet are not interchangeable.
Why Base64 Exists
The short answer: many transport channels were designed for printable ASCII text, not arbitrary binary bytes.
Email, for instance, was specified in the early 1970s when networks could not reliably carry bytes with the high bit set (values 128–255). MIME standardized Base64 so that binary attachments (images, PDFs, executables) could travel through those channels unchanged. HTTP cookies have similar restrictions: only a limited set of ASCII characters is safe in a cookie value; raw binary would either break the parser or require percent-encoding, which is verbose.
Base64 solves this by mapping every 3 bytes of binary input into 4 printable ASCII characters drawn from a 64-character alphabet: A–Z (0–25), a–z (26–51), 0–9 (52–61), + (62), and / (63). Six bits per output character × 4 = 24 bits = 3 bytes of input. The math is clean.
The trade-off is size. Because 3 bytes become 4 characters, Base64 inflates data by a factor of 4/3, roughly 33.3% overhead (marginally more when padding rounds the output up to a multiple of 4 characters). A 750 KB JPEG embedded inline in a CSS background-image balloons to around 1 MB before the browser parses a single style rule. That is why Base64 embedding is appropriate for small assets (favicons, tiny icons) but a genuine performance mistake for hero images.
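The overhead is easy to check. The sketch below uses Python's standard base64 module; encoded_length is a helper name I made up for illustration, and the all-zero payload is just a stand-in for a real 750 KB file.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper)."""
    # Every started group of 3 input bytes produces 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


payload = bytes(750_000)  # stand-in for a 750 KB file
encoded = base64.b64encode(payload)

print(len(payload))                     # 750000
print(len(encoded))                     # 1000000
print(encoded_length(len(payload)))     # 1000000
```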
How the Encoding Works — A Real Example
Take the string Hello! (6 bytes).
Input bytes (decimal): 72 101 108 108 111 33
Input bytes (binary): 01001000 01100101 01101100 01101100 01101111 00100001
Group consecutive bits into 6-bit chunks:
010010 | 000110 | 010101 | 101100 | 011011 | 000110 | 111100 | 100001
18 6 21 44 27 6 60 33
S G V s b G 8 h
Result: SGVsbG8h
You can verify this in the Base64 encoder on Toolora in a few seconds. Paste Hello! in encode mode and the output matches exactly — no trailing = because 6 bytes divides evenly into 3-byte groups (6 ÷ 3 = 2, no remainder).
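The same bit-grouping can be reproduced in a few lines of Python. This is a teaching sketch, not how real libraries do it: encode_full_groups is a name invented here, and it deliberately handles only inputs whose length is a multiple of 3, so padding never comes up.

```python
import base64

# Standard alphabet: A-Z (0-25), a-z (26-51), 0-9 (52-61), + (62), / (63)
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_full_groups(data: bytes) -> str:
    """Encode input made of complete 3-byte groups (illustrative only)."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles complete 3-byte groups")
    # Concatenate all bytes as one long bit string, then cut it into 6-bit chunks.
    bits = "".join(f"{byte:08b}" for byte in data)
    chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Each 6-bit value is an index into the 64-character alphabet.
    return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)


result = encode_full_groups(b"Hello!")
print(result)  # SGVsbG8h
print(result == base64.b64encode(b"Hello!").decode("ascii"))  # True
```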
What = Padding Actually Does
When the input length is not a multiple of 3, the encoder pads the output to the next multiple of 4 characters using = signs. There are only two cases:
1 byte remainder → 2 output chars + ==
Input: H (1 byte = 8 bits)
Bits: 01001000
6-bit groups: 010010 | 00[pad: 0000]
Values: 18 0 → S A = =
Output: SA==
The second group holds only the 2 remaining bits of the byte, extended with 4 zero bits: 00 + 0000 = 000000 = 0 = A. Two = signs then fill the 3rd and 4th character positions, giving SA==.
2 byte remainder → 3 output chars + =
Input: Hi (2 bytes = 16 bits)
Bits: 01001000 01101001
6-bit groups: 010010 | 000110 | 1001[pad: 00]
Values: 18 6 36 → S G k =
Output: SGk=
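Both cases, plus the no-padding case, can be checked against Python's standard library. The sample inputs are my own choice for illustration.

```python
import base64

samples = [b"H", b"Hi", b"Hel"]
encoded = {raw: base64.b64encode(raw).decode("ascii") for raw in samples}

for raw, text in encoded.items():
    padding = text.count("=")
    print(f"{len(raw)} byte(s) -> {text} ({padding} padding char(s))")

# 1 byte(s) -> SA== (2 padding char(s))
# 2 byte(s) -> SGk= (1 padding char(s))
# 3 byte(s) -> SGVs (0 padding char(s))
```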
The = signs are structural, not data. They mark how many bytes the final group really holds. Some implementations omit trailing = (the JWT specification requires this) and reconstruct padding before decoding, which is fine as long as both sides agree. Problems arise when one side strips padding and the other expects it.
I ran into this by accident in a project where I was storing Base64-encoded symmetric keys in a PostgreSQL text column. The encode side stripped padding, the decode side (a different library in a different service) did not re-add it, and the result was a cryptic invalid base64 error that took 40 minutes to trace because the key looked correct when printed.
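One defensive habit is to restore padding on the decode side before handing the string to a decoder. A minimal sketch, assuming the input is otherwise valid standard Base64; restore_padding and decode_unpadded are hypothetical helper names, not part of any library.

```python
import base64


def restore_padding(unpadded: str) -> str:
    """Append the '=' characters needed to reach a multiple of 4 (hypothetical helper)."""
    return unpadded + "=" * (-len(unpadded) % 4)


def decode_unpadded(value: str) -> bytes:
    """Decode standard Base64 whether or not trailing padding was stripped."""
    return base64.b64decode(restore_padding(value))


print(restore_padding("SA"))        # SA==
print(restore_padding("SGk"))       # SGk=
print(decode_unpadded("SA"))        # b'H'
print(decode_unpadded("SGk="))      # b'Hi'
```

Already-padded input passes through unchanged, so the same function is safe to use when you do not know which convention the other side follows.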
URL-Safe Base64 and When to Reach for It
The standard Base64 alphabet contains + and /. Both characters carry special meaning in URLs:
+ in a query string means a literal space (in application/x-www-form-urlencoded encoding)
/ is a path separator in URLs
If you embed standard Base64 output in a URL (as a query parameter or a path segment) or in a cookie, any + or / in the output can corrupt the value. The browser or server parser may misread it before your application code ever sees it.
The URL-safe variant (defined in RFC 4648 §5) substitutes:
+ → -
/ → _
Padding = is commonly omitted too, since = separates keys from values in query strings.
A concrete example:
Input bytes: 0xFB 0xFF (two bytes that produce both problem characters)
Standard Base64: +/8=
URL-safe Base64: -_8
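The same two bytes can be run through Python's standard base64 module to see both alphabets side by side. Note that urlsafe_b64encode keeps the = padding, so stripping it is a separate step.

```python
import base64

raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
url_safe_unpadded = url_safe.rstrip("=")

print(standard)           # +/8=
print(url_safe)           # -_8=
print(url_safe_unpadded)  # -_8
```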
JWTs use URL-safe Base64 exclusively: all three segments (header, payload, signature) are URL-safe Base64 without padding, joined by dots (.). If you ever copy a JWT signature from a standard Base64 library without switching alphabet and stripping padding, the signature will not verify even though the underlying bytes are identical.
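To make the segment format concrete, here is a rough sketch that assembles a JWT-shaped string and decodes it again. Everything in it is illustrative: b64url_encode and b64url_decode are hypothetical helpers, the claims are invented, and the "signature" is two placeholder bytes rather than a real HMAC, so nothing here verifies anything.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with padding stripped (hypothetical helper)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore padding, then decode with the URL-safe alphabet."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


# Invented example values; the signature is a placeholder, not a real MAC.
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "user-1"}
fake_signature = bytes([0xFB, 0xFF])

token = ".".join([
    b64url_encode(json.dumps(header, separators=(",", ":")).encode("utf-8")),
    b64url_encode(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    b64url_encode(fake_signature),
])

header_seg, payload_seg, signature_seg = token.split(".")
decoded_header = json.loads(b64url_decode(header_seg))
decoded_payload = json.loads(b64url_decode(payload_seg))
decoded_signature = b64url_decode(signature_seg)

print(signature_seg)    # -_8
print(decoded_header)   # {'alg': 'HS256', 'typ': 'JWT'}
```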
Toolora's Base64url encoder for JWT-safe strings handles exactly this case: it switches to the -_ alphabet, strips padding, and lets you decode incoming JWTs without manually massaging the character set.
Common Mistakes and How to Avoid Them
Mixing alphabets silently. The standard and URL-safe alphabets differ by two characters. A strict decoder rejects characters outside its alphabet, but a lenient one may silently skip them, so you get corrupted data rather than an error. Always document which alphabet a field uses.
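Python's standard library shows both behaviors. Its documentation describes b64decode as discarding non-alphabet characters by default and raising binascii.Error when validate=True is passed. The input string below is contrived so that the lenient path succeeds; treat it as a sketch of the failure mode rather than a typical payload.

```python
import base64
import binascii

# URL-safe Base64 for the bytes FB EF BE followed by b"Hel" (contrived input).
url_safe_value = "----SGVs"

correct = base64.urlsafe_b64decode(url_safe_value)

# Default (validate=False): the four '-' characters are skipped, no error raised.
lenient = base64.b64decode(url_safe_value)

try:
    base64.b64decode(url_safe_value, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(correct)          # b'\xfb\xef\xbeHel'
print(lenient)          # b'Hel'
print(strict_rejected)  # True
```

Three bytes vanished without any exception in the lenient case, which is exactly the kind of bug that survives code review.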
Ignoring MIME line wrapping. RFC 2045 (MIME) limits encoded lines to at most 76 characters, separated by CRLF. This is appropriate for email but breaks binary comparisons, JWT parsing, and anything that expects a single contiguous string. HTTP-oriented use cases should use RFC 4648 output without line breaks. Some Base64 library functions wrap lines by default; check the docs before assuming.
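Python happens to ship both behaviors in the same module, which makes the difference easy to see. One caveat: base64.encodebytes separates lines with a bare \n rather than CRLF, so it is MIME-style wrapping rather than a byte-exact RFC 2045 encoder. The 100-byte input is arbitrary.

```python
import base64

data = bytes(100)  # arbitrary input, long enough to need more than one line

wrapped = base64.encodebytes(data)    # newline after every 76 output characters
single_line = base64.b64encode(data)  # one contiguous string

line_lengths = [len(line) for line in wrapped.splitlines()]

print(line_lengths)                                  # [76, 60]
print(b"\n" in single_line)                          # False
print(wrapped.replace(b"\n", b"") == single_line)    # True
```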
Encoding already-text data. Base64 is for moving binary data through text-only channels. If you find yourself Base64-encoding a plain JSON string to "hide" it or avoid escaping, you are adding 33% overhead and losing readability without getting any security benefit. JSON escaping exists for exactly this purpose.
Expecting compression. Base64 expands data. It never shrinks it. If you Base64-encode an already-compressed payload (gzip, WebP, AVIF) and then wonder why the CDN cached response is large, this is why.
For quick encode/decode tests without leaving the browser, the Base64 encoder on Toolora runs entirely client-side — nothing is uploaded, and the URL state updates as you type so you can share a specific input with a teammate via a link.
Base64 is a decades-old encoding that still trips up experienced developers because its two variants are not visually distinguishable at a glance, padding rules vary by context, and the size overhead is easy to underestimate. Knowing the math (3 bytes in, 4 characters out, two replaceable characters, up to two optional = padding characters) makes the edge cases predictable rather than mysterious.
Made by Toolora · Updated 2026-06-19