A Complete Guide to Base64 Encoding: How It Works, Padding, URL-Safe Variants, and Common Mistakes
Understand Base64 encoding at the byte level, including padding, Base64url, hex conversion, image data URIs, and the mistakes that break tokens and files.
Base64 is not encryption, compression, hashing, or a way to make data private. It is a text representation for bytes. That simple definition explains most of the confusing parts: why output gets longer, why = appears at the end, why JWTs remove padding, and why a Base64 image can make an HTML file much heavier.
The practical goal is portability. Email systems, JSON fields, environment variables, URLs, HTML, and CSS are mostly text-shaped places. Base64 lets binary values ride through those places without pretending the bytes are human-readable text. If you need to test values while reading this guide, Toolora's Base64 Encoder/Decoder is the main scratchpad, and the Base64 to Hex Converter is useful when you need to inspect the exact bytes.
How Base64 Turns Bytes Into Text
Base64 works in groups of 3 bytes. Three bytes are 24 bits. The encoder splits those 24 bits into four 6-bit chunks. Each 6-bit chunk is a number from 0 to 63, and that number becomes one character from the Base64 alphabet.
Standard Base64 uses uppercase letters, lowercase letters, digits, +, and /. The last two are where URL trouble begins, but for normal text fields they are part of the standard alphabet.
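The 6-bit split is easier to trust once you see it done by hand. The following is a minimal Python sketch of that step for one full 3-byte group only; the helper name encode_group is hypothetical and exists only for illustration, and real code should use the standard library instead.

```python
# Standard Base64 alphabet (RFC 4648 section 4): index 0..63 -> character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (illustrative helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    # Pack the 3 bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Read four 6-bit chunks, most significant chunk first.
    indexes = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_group(b"Too"))  # prints VG9v
```

The bytes 54 6f 6f split into the indexes 21, 6, 61, and 47, which map to V, G, 9, and v.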
Here is a real input/output pair:
Input string:
Toolora:Base64?
UTF-8 bytes as hex:
54 6f 6f 6c 6f 72 61 3a 42 61 73 65 36 34 3f
Standard Base64:
VG9vbG9yYTpCYXNlNjQ/
That last / is not a path separator here. It is just alphabet character number 63. If you paste the same value into a file path, URL path, or query parameter without checking the rules of that context, it may be interpreted as syntax instead of data.
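To reproduce the pair above outside a web tool, a short Python sketch using only the standard library should be enough. The variable names are my own; the only assumption is that the input string is encoded as UTF-8 before Base64 is applied.

```python
import base64

text = "Toolora:Base64?"
raw = text.encode("utf-8")           # text -> bytes (15 bytes here)
hex_view = raw.hex(" ")              # one hex pair per byte, space separated
encoded = base64.b64encode(raw).decode("ascii")

print(hex_view)  # prints 54 6f 6f 6c 6f 72 61 3a 42 61 73 65 36 34 3f
print(encoded)   # prints VG9vbG9yYTpCYXNlNjQ/
```

Note that b64encode takes bytes and returns bytes. The explicit encode and decode calls keep the text-versus-bytes boundary visible.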
RFC 4648 section 4 defines the common Base64 alphabet and the 24-bit grouping model. That model also explains the size overhead: every 3 raw bytes become 4 encoded characters. In a local Node.js 24.14.0 benchmark on 2026-06-06, I encoded a deterministic 10 MiB buffer 25 times after warmup. The 10,485,760 input bytes became 13,981,016 Base64 characters, a +33.33% expansion; average encode time was 0.62 ms and average decode time was 1.01 ms for that buffer. Treat the timing as machine-specific, but the 4-for-3 size pattern is built into the format.
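The size figure can be checked with arithmetic alone, without running a benchmark. This is a small Python sketch; base64_length is a hypothetical helper name, and it counts only alphabet and padding characters, with no line breaks.

```python
def base64_length(n_bytes: int, padded: bool = True) -> int:
    """Number of Base64 characters produced for n_bytes of input."""
    full_groups, leftover = divmod(n_bytes, 3)
    if padded:
        # Any leftover bytes still occupy one full 4-character group.
        return 4 * (full_groups + (1 if leftover else 0))
    # Unpadded: 1 leftover byte -> 2 characters, 2 leftover bytes -> 3.
    return 4 * full_groups + (leftover + 1 if leftover else 0)


ten_mib = 10 * 1024 * 1024
print(ten_mib)                 # prints 10485760
print(base64_length(ten_mib))  # prints 13981016
```

The result matches the 13,981,016 characters reported for the 10 MiB buffer.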
Padding: Why = Shows Up at the End
Padding exists because real inputs are not always a neat multiple of 3 bytes. Base64 output is normally written in 4-character groups. When the final group has only 1 or 2 input bytes, the encoder adds = characters to fill that group out to 4 characters, which also tells the decoder how many bytes the final group really holds.
Two short examples make this concrete:
Input: Toolora
Bytes: 7
Base64: VG9vbG9yYQ==
Seven bytes means two full 3-byte groups plus one leftover byte. One leftover byte produces two real Base64 characters and two padding characters.
Input: Toolora!
Bytes: 8
Base64: VG9vbG9yYSE=
Eight bytes means two full groups plus two leftover bytes. Two leftover bytes produce three real Base64 characters and one padding character.
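The leftover rule reduces to one line of arithmetic. The Python sketch below compares that arithmetic with what the standard library actually emits; padding_count is a hypothetical helper, and the third input is my own addition to show the zero-padding case.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """How many '=' characters padded Base64 appends for n_bytes of input."""
    return (3 - n_bytes % 3) % 3


for text in ("Toolora", "Toolora!", "Toolora!!"):
    raw = text.encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    # The predicted count should match the '=' characters actually emitted.
    print(len(raw), encoded, padding_count(len(raw)), encoded.count("="))
```

Seven bytes give two padding characters, eight give one, and nine give none.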
The common mistake is trimming = everywhere because it "looks optional". Sometimes it is optional by convention. Sometimes the decoder restores it. Sometimes the receiver rejects the value because it expects strict RFC-style padding. If you control both ends, document the choice. If you do not control the receiver, keep the padding unless the protocol says otherwise.
Padding also helps spot truncation. A value ending in = or == is not automatically suspicious. A value whose length does not fit its variant may be: padded Base64 always has a length that is a multiple of 4, and unpadded Base64 can never have a length that leaves a remainder of 1 when divided by 4. When a pasted token fails, count characters before debugging the cryptography.
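Counting characters can be automated. In the Python sketch below, length_is_plausible is a hypothetical helper that checks only the length, not the alphabet: padded Base64 must have a length divisible by 4, and unpadded Base64 can never have a length with remainder 1, because a single leftover byte already produces two characters.

```python
def length_is_plausible(value: str, padded: bool) -> bool:
    """Cheap truncation check based on length alone (illustrative helper)."""
    remainder = len(value) % 4
    if padded:
        # Padded output always comes in complete 4-character groups.
        return remainder == 0
    # Unpadded output may end in a group of 2 or 3 characters, never 1.
    return remainder != 1


print(length_is_plausible("VG9vbG9yYQ==", padded=True))   # prints True
print(length_is_plausible("VG9vbG9yYQ=", padded=True))    # prints False
print(length_is_plausible("VG9vbG9yYQ", padded=False))    # prints True
print(length_is_plausible("VG9vb", padded=False))         # prints False
```

A passing check does not prove the value is intact. It only rules out the most common copy-and-paste truncations.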
URL-Safe Base64 Is a Variant, Not URL Encoding
URL-safe Base64, often called Base64url and defined in RFC 4648 section 5, changes the alphabet. It replaces + with - and / with _. Many Base64url systems also omit padding. JWT headers, payloads, and signatures use this shape.
This binary example shows the difference:
Input bytes as hex:
fb ff fe
Standard Base64:
+//+
URL-safe Base64:
-__-
Those strings represent the same three bytes. The second one is friendlier in URLs and filenames because it avoids + and /. That does not make it the same thing as percent encoding. Percent encoding turns unsafe URL bytes into %HH sequences. Base64url still represents arbitrary bytes using a Base64 alphabet.
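The three representations can be put side by side in a few lines. This is a Python sketch using the standard library's base64 and urllib.parse modules; the variable names are illustrative.

```python
import base64
from urllib.parse import quote

raw = bytes.fromhex("fbfffe")

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
# Percent encoding is a different layer: it escapes characters of a string.
percent_encoded = quote(standard, safe="")

print(standard)         # prints +//+
print(url_safe)         # prints -__-
print(percent_encoded)  # prints %2B%2F%2F%2B

# Both Base64 forms decode back to the same three bytes.
print(base64.urlsafe_b64decode(url_safe) == raw)  # prints True
```

Percent-encoding the standard form also survives a URL, but it triples the length of every escaped character, which is one reason protocols prefer the Base64url alphabet.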
I tested the examples above before writing this article because the easiest Base64 bug to miss is a context bug. A + inside application/x-www-form-urlencoded data may be read as a space. A / inside a path segment may split the path. A trailing = inside a query string can be harmless, but it can also confuse hand-written parsers. When I need a quick JWT-shaped check, I use Toolora's Base64url Encoder/Decoder for JWT-Safe Strings rather than manually swapping characters in a text editor.
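The form-encoding problem is easy to reproduce. The Python sketch below assumes a hypothetical query parameter named token and uses urllib.parse to play the role of the receiving server.

```python
from urllib.parse import parse_qs, urlencode

token = "+//+"  # standard Base64 for the bytes fb ff fe

# Naive: paste the Base64 value straight into a query string.
naive_query = "token=" + token
naive_result = parse_qs(naive_query)["token"][0]

# Safer: let the library escape the value for the query-string context.
escaped_query = urlencode({"token": token})
escaped_result = parse_qs(escaped_query)["token"][0]

print(repr(naive_result))    # prints ' // '
print(escaped_query)         # prints token=%2B%2F%2F%2B
print(repr(escaped_result))  # prints '+//+'
```

In the naive case both + characters come back as spaces, so the receiver decodes different bytes or fails outright.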
Hex, Images, and the Places Base64 Gets Misused
Base64 and hexadecimal are two views of the same raw bytes. Hex uses two characters per byte, compared with roughly 1.33 for Base64, so it is larger, but it is easier to inspect byte-by-byte. That is why hashes, AES keys, packet captures, DER certificates, and magic numbers often appear in hex.
For example, the Base64 string from the first section:
VG9vbG9yYTpCYXNlNjQ/
decodes to this hex:
546f6f6c6f72613a4261736536343f
Both values mean the same bytes. If you are comparing a webhook signature, checking whether a decoded blob begins with 89 50 4e 47 for PNG, or moving between crypto tools, convert views instead of decoding through a text string. The Base64 to Hex Converter keeps that operation byte-level, which avoids corrupting non-UTF-8 data.
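A byte-level conversion never needs to pass through a text string. The following Python sketch shows one way to do it; base64_to_hex and looks_like_png are hypothetical helper names, and the PNG check looks only at the four magic bytes mentioned above.

```python
import base64

PNG_MAGIC = bytes.fromhex("89504e47")


def base64_to_hex(value: str) -> str:
    """Decode standard Base64 to bytes, then render those bytes as hex."""
    return base64.b64decode(value, validate=True).hex()


def looks_like_png(value: str) -> bool:
    """True if the decoded bytes start with 89 50 4e 47."""
    return base64.b64decode(value, validate=True).startswith(PNG_MAGIC)


print(base64_to_hex("VG9vbG9yYTpCYXNlNjQ/"))
# prints 546f6f6c6f72613a4261736536343f
print(looks_like_png("VG9vbG9yYTpCYXNlNjQ/"))  # prints False
```

At no point are the decoded bytes interpreted as UTF-8, so the same functions work on signatures, keys, and image data.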
Images are another common Base64 use case. A data URI such as data:image/png;base64,... can be handy for a tiny icon, an offline HTML demo, or an email template. It is a poor default for large assets. The encoded text is about one-third larger than the file even before the data: prefix is counted, it cannot be cached as a separate file, and it makes the document heavier. If you are checking an actual file, use Toolora's Base64 Image Converter so you can see the data URI, preview, decoded file, and size overhead in one place.
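The overhead is simple to measure before committing to an inline image. In the Python sketch below, to_data_uri is a hypothetical helper, and 300,000 zero bytes stand in for a real PNG file; only the size matters for this measurement, not the content.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw bytes in a Base64 data URI (illustrative helper)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Placeholder bytes standing in for a 300 KB image file.
fake_image = bytes(300 * 1000)
uri = to_data_uri(fake_image)

prefix_length = len("data:image/png;base64,")
print(len(fake_image))           # prints 300000
print(len(uri) - prefix_length)  # prints 400000
```

With a real file, replace fake_image with the bytes read from disk in binary mode.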
The main rule is to decide what you are carrying: bytes, URL syntax, display text, or file data. Base64 is right for byte transport. Base64url is right when that byte transport must fit into URL-ish places. Hex is right when byte inspection matters. A normal file URL is still better for most images on a website.
Common Mistakes to Avoid
Do not treat Base64 as security. Anyone can decode it. If the value is secret, protect it with a real security control, such as encryption or access restrictions, before or after encoding; the encoding step itself adds no protection.
Do not decode arbitrary binary as UTF-8 text unless you know it is text. A PNG, encrypted token, compressed payload, or signature can contain bytes that are not valid characters. Keep it as bytes or inspect it as hex.
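Here is a small Python sketch of what goes wrong, using the three bytes fb ff fe from earlier in this guide. A strict UTF-8 decode fails, and a lenient decode silently replaces the bytes, so the round trip no longer returns the original data.

```python
import base64

raw = base64.b64decode("+//+")  # the bytes fb ff fe, not valid UTF-8

try:
    raw.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# A lenient decode "works" but swaps in replacement characters.
lossy_text = raw.decode("utf-8", errors="replace")
round_trip = lossy_text.encode("utf-8")

print(strict_decode_ok)   # prints False
print(round_trip == raw)  # prints False
print(raw.hex(" "))       # prints fb ff fe
```

The hex view on the last line is the safe way to look at the value, because it never leaves the byte domain.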
Do not mix standard Base64 and Base64url by habit. A decoder may accept both, but protocols usually specify one. JWTs expect Base64url without normal padding. PEM files and many API blobs expect standard Base64 with line wrapping or padding.
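One way to avoid mixing variants by habit is to give each one its own decoding function and call the one the protocol names. This is a Python sketch; both function names are hypothetical. Note that Python's urlsafe_b64decode is itself lenient about stray characters, so treat this as a starting point and not a strict validator.

```python
import base64
import binascii


def decode_standard(value: str) -> bytes:
    """Strict standard Base64: rejects characters outside the alphabet."""
    return base64.b64decode(value, validate=True)


def decode_base64url_unpadded(value: str) -> bytes:
    """Base64url as used in JWT-shaped segments: padding is restored first."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded)


print(decode_base64url_unpadded("-__-").hex())  # prints fbfffe
print(decode_base64url_unpadded("VG9vbG9yYQ"))  # prints b'Toolora'

try:
    decode_standard("-__-")  # Base64url characters in a standard decoder
except binascii.Error:
    print("rejected by the strict standard decoder")
```

Failing loudly on the wrong variant is usually better than decoding to unexpected bytes.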
Do not remove newlines blindly from formats that have structure around the Base64 body. PEM blocks, data URIs, and config files may include labels, prefixes, or surrounding syntax. Remove the wrapper only when the receiving tool expects raw Base64.
Do not inline big images because it feels convenient. For a 300 KB PNG, Base64 will push the encoded string toward 400 KB before any HTML or CSS wrapper. That may be acceptable for a self-contained demo, but it is usually the wrong tradeoff for a public page.
Base64 is predictable once you keep the byte model in mind. Three bytes become four characters, padding records leftovers, Base64url swaps the two URL-hostile characters, and every conversion should preserve bytes unless you deliberately turn those bytes into text.
Made by Toolora · Updated 2026-06-06