Base64 Encoding Explained: How It Works, Why It Grows Your Data 33%, and When to Use URL-Safe

A practical guide to Base64 encoding: how 3 bytes become 4 characters, why output grows ~33%, the URL-safe variant from RFC 4648, and when Base64 is the wrong tool.

Published By 李雷
#base64 #encoding #web-development #data-urls #url-safe

Base64 is one of those things you copy and paste a hundred times before you ever stop to ask what it actually does. A Kubernetes Secret hands you cGFzc3dvcmQxMjM=, a JWT shows up as three dot-separated blobs, a CSS file embeds a font as data:font/woff2;base64,.... They all share the same trick: turning arbitrary bytes into a string that survives systems built for plain text. This guide walks through the mechanics, the famous 33% size penalty, the URL-safe variant, and the cases where reaching for Base64 is a mistake.

If you want to follow along, open the Base64 Encoder & Decoder in another tab and type the examples in as you read. The tool runs entirely client-side, so nothing you paste leaves your machine.

What Base64 Actually Does

Base64 is a binary-to-text encoding. The problem it solves is old and stubborn: a lot of transports were designed to carry text, not raw bytes. Email bodies, JSON string fields, URL query parameters, HTTP headers, and XML attributes all choke on control characters, null bytes, or anything outside a safe printable range. Base64 takes any byte sequence and re-expresses it using only 64 "safe" characters — A-Z, a-z, 0-9, plus + and /, with = reserved for padding. The full standard alphabet lives in RFC 4648, the same RFC that defines the URL-safe variant we'll get to shortly.

The key idea is that 64 is a power of two: 2^6 = 64. So each Base64 character carries exactly 6 bits of information. Bytes carry 8 bits each. The whole scheme is just a clean way to regroup an 8-bit stream into 6-bit chunks.
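
To make the 6-bits-per-character idea concrete, here is a small TypeScript sketch of the standard alphabet as a lookup table. The names `ALPHABET`, `charForValue`, and `valueForChar` are illustrative, not from any library; the only point is that a character's position in the 64-character string is the 6-bit value it stands for.

```typescript
// Standard Base64 alphabet (RFC 4648): a character's position is its 6-bit value.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + // values 0-25
  "abcdefghijklmnopqrstuvwxyz" + // values 26-51
  "0123456789" + // values 52-61
  "+/"; // values 62 and 63

// 64 = 2 ** 6, so one character holds exactly 6 bits.
const BITS_PER_CHAR = 6;
console.log(ALPHABET.length === 2 ** BITS_PER_CHAR); // true

// Map a 6-bit value (0..63) to its character.
function charForValue(value: number): string {
  if (value < 0 || value > 63 || value !== Math.floor(value)) {
    throw new RangeError("not a 6-bit value: " + value);
  }
  return ALPHABET.charAt(value);
}

// Map a single Base64 character back to its 6-bit value.
function valueForChar(ch: string): number {
  const value = ALPHABET.indexOf(ch);
  if (ch.length !== 1 || value < 0) {
    throw new RangeError("not a Base64 character: " + ch);
  }
  return value;
}

console.log(charForValue(25), valueForChar("v")); // Z 47
```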

How 3 Bytes Become 4 Characters

Here is the core of the algorithm, and it explains every weird thing you've ever noticed about Base64 output.

Take 3 input bytes. That's 24 bits. Twenty-four divides evenly by both 8 (bytes) and 6 (Base64 characters): 24 / 6 = 4. So every group of 3 bytes encodes into exactly 4 characters. Walk through the word foo:

  • f = 0x66 = 01100110
  • o = 0x6F = 01101111
  • o = 0x6F = 01101111

Concatenate: 011001100110111101101111. Re-slice into 6-bit groups: 011001 100110 111101 101111 = 25, 38, 61, 47. Map each index into the alphabet and you get Zm9v. Decode Zm9v in the Base64 Encoder & Decoder and you get foo straight back.
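
The regrouping of foo can be written as a few shifts and masks. This is a minimal TypeScript sketch, assuming the caller always passes exactly three bytes; `encodeGroup` is a hypothetical name used only for illustration.

```typescript
// Standard RFC 4648 alphabet: position in this string = 6-bit value.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes (24 bits) into four Base64 characters.
function encodeGroup(b0: number, b1: number, b2: number): string {
  // Pack the three bytes into one 24-bit integer, first byte in the high bits.
  const n = (b0 << 16) | (b1 << 8) | b2;
  // Re-slice into four 6-bit values, most significant first.
  const i0 = (n >> 18) & 0x3f;
  const i1 = (n >> 12) & 0x3f;
  const i2 = (n >> 6) & 0x3f;
  const i3 = n & 0x3f;
  return ALPHABET.charAt(i0) + ALPHABET.charAt(i1) +
    ALPHABET.charAt(i2) + ALPHABET.charAt(i3);
}

// "foo" = 0x66 0x6F 0x6F -> values 25, 38, 61, 47
console.log(encodeGroup(0x66, 0x6f, 0x6f)); // Zm9v
```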

When the input isn't a clean multiple of 3, the encoder pads with =. One leftover byte produces two real characters plus ==; two leftover bytes produce three characters plus a single =. That's why:

  • f (1 byte) -> Zg==
  • fo (2 bytes) -> Zm8=
  • foo (3 bytes) -> Zm9v

The padding isn't decoration. It tells the decoder how many real bytes the final group actually held, so it doesn't hand you a phantom trailing byte.
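
Putting the full groups and the padding rule together gives a complete, if unoptimized, encoder. This is an illustrative TypeScript sketch; `encodeBase64` and `asciiBytes` are hypothetical helpers, not a library API, and real code would normally lean on the platform's built-in encoder instead.

```typescript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes as standard Base64, padding the final group with "=".
function encodeBase64(bytes: Uint8Array): string {
  let out = "";
  let i = 0;
  // Full groups: 3 bytes (24 bits) -> 4 characters.
  for (; i + 2 < bytes.length; i += 3) {
    const n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out += ALPHABET.charAt((n >> 18) & 63) + ALPHABET.charAt((n >> 12) & 63) +
      ALPHABET.charAt((n >> 6) & 63) + ALPHABET.charAt(n & 63);
  }
  const leftover = bytes.length - i;
  if (leftover === 1) {
    // 8 real bits, zero-filled to 12 -> 2 characters, then "==".
    const n = bytes[i] << 16;
    out += ALPHABET.charAt((n >> 18) & 63) + ALPHABET.charAt((n >> 12) & 63) + "==";
  } else if (leftover === 2) {
    // 16 real bits, zero-filled to 18 -> 3 characters, then "=".
    const n = (bytes[i] << 16) | (bytes[i + 1] << 8);
    out += ALPHABET.charAt((n >> 18) & 63) + ALPHABET.charAt((n >> 12) & 63) +
      ALPHABET.charAt((n >> 6) & 63) + "=";
  }
  return out;
}

// Demo helper: ASCII string -> bytes, one byte per character.
function asciiBytes(s: string): Uint8Array {
  const bytes = new Uint8Array(s.length);
  for (let k = 0; k < s.length; k++) bytes[k] = s.charCodeAt(k);
  return bytes;
}

for (const word of ["f", "fo", "foo"]) {
  console.log(word, "->", encodeBase64(asciiBytes(word))); // Zg==, Zm8=, Zm9v
}
```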

Why the Output Grows ~33%

This is the cost nobody warns you about until a payload blows up. Every 3 bytes of input become 4 bytes of output: 4/3 ≈ 1.333, so Base64 inflates size by roughly 33% (before counting any line-wrap newlines, which add a little more). A 9 MB image becomes about 12 MB of text. Inline a few of those as Data URLs and your "optimized" HTML page is suddenly heavier than the separate files it replaced.
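
The size cost is easy to compute exactly: padded Base64 output is 4 × ⌈n / 3⌉ characters for n input bytes. The TypeScript sketch below does that arithmetic. The line-wrapping variant assumes MIME-style lines of at most 76 characters separated by CRLF, with no trailing newline, which is one common convention rather than a universal one; both function names are hypothetical.

```typescript
// Exact length of padded standard Base64 for n input bytes:
// every started 3-byte group becomes 4 characters.
function base64Length(n: number): number {
  return 4 * Math.ceil(n / 3);
}

// Assumed convention: lines of lineLength characters joined by a newline
// of newlineLength characters (MIME uses 76 and CRLF, i.e. 2).
function wrappedBase64Length(n: number, lineLength = 76, newlineLength = 2): number {
  const chars = base64Length(n);
  const lines = Math.ceil(chars / lineLength);
  return chars + Math.max(0, lines - 1) * newlineLength;
}

const nineMB = 9_000_000;
console.log(base64Length(nineMB)); // 12000000
console.log(base64Length(nineMB) / nineMB); // about 1.333
console.log(wrappedBase64Length(nineMB)); // a little more, from the line breaks
```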

I learned this the unglamorous way: I once inlined a folder of icons as Base64 Data URLs to "save HTTP requests," then watched the gzipped HTML balloon and the first-paint time get worse, not better. Base64 text doesn't compress as well as the original binary, so gzip couldn't claw back what the encoding added. The lesson stuck — Base64 is for transport correctness, not for shrinking anything.

This is also why, for files past a few hundred kilobytes, you usually want to serve the asset normally and reference it by URL instead of inlining it. The Base64 Encoder & Decoder will happily encode a 5–10 MB file in your browser via FileReader, but the resulting text is the wrong thing to paste into an editor or commit to a repo.

URL-Safe Base64 (RFC 4648 §5)

Standard Base64's +, /, and = are landmines in a URL. The / looks like a path separator, + decodes to a space in query strings, and = is the key/value delimiter. Drop a standard Base64 string into a query parameter and you have to percent-encode it, or things break in subtle ways.
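
To see the + problem concretely, this TypeScript sketch pastes the standard Base64 string +/8= (the encoding of the bytes 0xFB 0xFF) into a query string, using the standard URLSearchParams and encodeURIComponent globals. It demonstrates the failure mode; it is not meant as a recommended way to build URLs.

```typescript
// Standard Base64 of the bytes 0xFB 0xFF: contains +, / and =.
const standard = "+/8=";

// Pasted raw into a query string, "+" comes back as a space.
const naive = new URLSearchParams("q=" + standard).get("q");
console.log(JSON.stringify(naive)); // " /8="

// Percent-encoding it first makes it survive the round trip.
const escaped = encodeURIComponent(standard);
console.log(escaped); // %2B%2F8%3D
console.log(new URLSearchParams("q=" + escaped).get("q") === standard); // true
```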

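RFC 4648 §5 defines the fix: URL-safe Base64 ("base64url") swaps + -> - and / -> _. The RFC keeps = padding as the default but lets a specification that uses base64url omit it, and most do, since the string length is enough to infer the padding back. Now the string drops straight into a URL path or query string with no escaping. This is exactly what JWTs use for all three of their dot-separated segments (header, payload, and signature), always without padding: split a token on its dots and the middle segment is base64url-encoded JSON.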
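
Converting between the two alphabets is a pair of character swaps plus padding bookkeeping. A minimal TypeScript sketch, assuming the input is already valid Base64 in the respective alphabet; `toBase64Url` and `fromBase64Url` are illustrative names. Padding is restored from the length modulo 4: two leftover characters need ==, three need =, and one leftover character never occurs in valid input.

```typescript
// Standard Base64 -> base64url: swap the URL-hostile characters, drop padding.
function toBase64Url(standard: string): string {
  return standard.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// base64url -> standard Base64: swap back and re-derive padding from the length.
function fromBase64Url(url: string): string {
  const standard = url.replace(/-/g, "+").replace(/_/g, "/");
  const remainder = standard.length % 4;
  if (remainder === 1) throw new Error("invalid base64url length");
  if (remainder === 0) return standard;
  return standard + (remainder === 2 ? "==" : "=");
}

console.log(toBase64Url("aGVsbG8td29ybGQ/")); // aGVsbG8td29ybGQ_
console.log(toBase64Url("Zg==")); // Zg
console.log(fromBase64Url("Zg")); // Zg==
```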

A nice detail worth knowing: decoding can be alphabet-agnostic. The two alphabets differ only in the characters for values 62 and 63, so a good decoder accepts both, and aGVsbG8td29ybGQ_ decodes fine whether or not you've flipped any "URL-safe" toggle; the toggle only changes what encoding produces. For heavy token work where you want a tool that splits whole JWTs into segments for you, reach for the dedicated Base64url encoder/decoder instead.
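
Because the alphabets only disagree about values 62 and 63, an alphabet-agnostic decoder can map both + and - to 62 and both / and _ to 63. The TypeScript sketch below does that with one lookup table and treats padding as optional. `decodeAnyBase64` and `bytesToAscii` are hypothetical names, and the validation is deliberately minimal: for example, it rejects whitespace but does not reject non-zero leftover bits in the final character, which strict decoders treat as an error.

```typescript
// One lookup table for BOTH alphabets.
const STANDARD_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const VALUE_OF: { [ch: string]: number | undefined } = {};
for (let v = 0; v < 64; v++) VALUE_OF[STANDARD_ALPHABET.charAt(v)] = v;
VALUE_OF["-"] = 62; // URL-safe stand-in for "+"
VALUE_OF["_"] = 63; // URL-safe stand-in for "/"

// Decode standard or URL-safe Base64, with or without "=" padding.
function decodeAnyBase64(input: string): Uint8Array {
  const s = input.replace(/=+$/, ""); // padding is optional; the length implies it
  if (s.length % 4 === 1) throw new Error("invalid Base64 length");
  const out = new Uint8Array(Math.floor((s.length * 6) / 8));
  let acc = 0; // bit accumulator
  let bits = 0; // number of pending bits in acc
  let pos = 0;
  for (let i = 0; i < s.length; i++) {
    const value = VALUE_OF[s.charAt(i)];
    if (value === undefined) throw new Error("invalid character: " + s.charAt(i));
    acc = (acc << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[pos++] = (acc >> bits) & 0xff; // emit the top 8 pending bits
      acc &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return out;
}

// Demo helper: bytes -> string, one character per byte (fine for ASCII).
function bytesToAscii(bytes: Uint8Array): string {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

console.log(bytesToAscii(decodeAnyBase64("aGVsbG8td29ybGQ_"))); // hello-world?
console.log(bytesToAscii(decodeAnyBase64("aGVsbG8td29ybGQ/"))); // hello-world?
```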

When NOT to Use Base64

The single most common Base64 mistake is treating it as security. It isn't. YWRtaW46czNjcmV0 decodes back to admin:s3cret for anyone with five seconds and any decoder — no password, no key. Base64 makes bytes transportable, not secret. If a value must stay confidential, encrypt it first with a real cipher (the AES text encryptor does AES with a passphrase), and only Base64 the already-encrypted bytes when your transport demands text.
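
Reversing it takes one standard call. This TypeScript snippet assumes the atob() global, which browsers provide and Node.js has exposed since version 16; no key or password appears anywhere.

```typescript
// A "credential" that was only Base64-encoded, never encrypted.
const encodedCredential = "YWRtaW46czNjcmV0";

// Anyone can reverse it with a built-in function.
const recovered = atob(encodedCredential);
console.log(recovered); // admin:s3cret
```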

A few other times to skip it:

  • Large binary assets over HTTP. You pay the 33% tax and lose separate browser caching, because inlined bytes can only be cached as part of the page that embeds them. Serve the file and link it.
  • When you really wanted hex. Debugging byte-level data is often clearer in hexadecimal; a Base64-to-hex converter bridges the two when you're staring at a blob and need to read raw bytes.
  • Storing structured data. Base64 is a transport wrapper, not a database format. Decode it back to its real shape before you reason about it.
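
If you are staring at a blob and want the raw bytes, a hex view is a short step away. A minimal TypeScript sketch, assuming standard (not URL-safe) Base64 input and the atob() global; `base64ToHex` is an illustrative name, not the converter tool mentioned above.

```typescript
// Decode standard Base64 and render each byte as two lowercase hex digits.
function base64ToHex(b64: string): string {
  const binary = atob(b64); // one character per byte, codes 0..255
  let hex = "";
  for (let i = 0; i < binary.length; i++) {
    const h = binary.charCodeAt(i).toString(16);
    hex += (h.length === 1 ? "0" : "") + h; // left-pad single digits
  }
  return hex;
}

console.log(base64ToHex("Zm9v")); // 666f6f
console.log(base64ToHex("+/8=")); // fbff
```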

One last gotcha for international text: JavaScript's built-in btoa() only accepts characters in the Latin-1 range (code points up to U+00FF), so passing Chinese text or an emoji straight in throws an InvalidCharacterError. The correct path is to run the text through new TextEncoder() first to get its UTF-8 bytes, then encode those, and to use TextDecoder on the way back. Any Base64 tool worth using handles this for you so 密码 or 🔐 round-trips instead of corrupting.
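
The TextEncoder route looks like this as a TypeScript sketch. It assumes the standard TextEncoder, TextDecoder, btoa, and atob globals (present in modern browsers and in Node.js 16+); `encodeUtf8Base64` and `decodeUtf8Base64` are hypothetical names. Building the intermediate "binary string" one byte at a time keeps the idea visible but is not the fastest option for multi-megabyte inputs.

```typescript
// Text -> UTF-8 bytes -> Base64.
function encodeUtf8Base64(text: string): string {
  const bytes = new TextEncoder().encode(text); // always UTF-8
  let binary = "";
  // btoa() expects a "binary string": one character (code 0..255) per byte.
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Base64 -> bytes -> text, decoding the bytes as UTF-8.
function decodeUtf8Base64(b64: string): string {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new TextDecoder().decode(bytes); // TextDecoder defaults to UTF-8
}

// Calling btoa("密码") directly would throw; going through bytes works.
const roundTripped = decodeUtf8Base64(encodeUtf8Base64("密码 🔐"));
console.log(roundTripped); // 密码 🔐
```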

Wrapping Up

Base64 is a 6-bits-per-character regrouping of an 8-bit byte stream: 3 bytes in, 4 characters out, padding to fill the gaps, and a ~33% size cost as the price of admission. Use the standard alphabet for files and storage, the URL-safe variant from RFC 4648 §5 for tokens and query strings, and never mistake either one for encryption. When you need to actually run a conversion — text, files, Data URLs, or a JWT segment — the Base64 Encoder & Decoder keeps it all local in your browser.


Made by Toolora · Updated 2026-06-13