Quoted-Printable Explained: How Email Encodes Café as caf=C3=A9

How quoted-printable encoding keeps email text readable, escapes non-ASCII bytes as =XX, handles soft line breaks, and when MIME picks it over Base64.

Published By Li Lei
#quoted-printable #email #mime #encoding #rfc-2045

Open the raw source of almost any email written in English and you will see something odd in the body: an = sign at the end of a line, the occasional =3D, and stray tokens like =C3=A9 sitting in the middle of an otherwise normal sentence. That is quoted-printable, one of the two MIME content-transfer-encodings (Base64 is the other) that rewrite a message body so it can pass through mail systems never designed for anything past plain 7-bit ASCII. It is the quiet workhorse behind many of the accented names, Chinese characters, and emoji that survive the trip from your outbox to someone else's inbox.

This post walks through what quoted-printable actually does, why it escapes some bytes and leaves others alone, how soft line breaks keep long lines under control, and when an email client reaches for it instead of Base64.

The core idea: keep printable ASCII, escape the rest

Quoted-printable is defined in RFC 2045, and its design goal is narrow and specific. It assumes your data is mostly readable text with only a few bytes that cannot travel safely. So the rule is simple: any byte that is already a printable ASCII character is left exactly as it is, and any byte that is not gets escaped as =XX, where XX is the two-digit upper-case hexadecimal value of that byte.

That means the word the stays the. The digits 2026 stay 2026. A comma stays a comma. Only the bytes that would confuse an old mail relay become escape tokens: the high-bit bytes from 128 up, and control characters such as NUL or DEL (a tab is allowed through mid-line, and line breaks get their own rules). The = character itself is the one printable exception: because it starts every escape, a literal equals sign has to be encoded as =3D, otherwise a decoder would mistake it for the beginning of a token.

The payoff is readability. An English sentence with one accented word stays almost entirely legible in the raw source, and the encoded message grows only slightly. Compare that to Base64, which scrambles every single byte into a 64-character alphabet, and you can see why mail clients so often pick quoted-printable for text.
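The core rule fits in a few lines of Python. This is a minimal sketch rather than a full encoder: the function name escape_bytes is made up for illustration, and it deliberately ignores line folding and trailing whitespace.

```python
def escape_bytes(data: bytes) -> str:
    """Apply only the core quoted-printable rule to one line of bytes.

    Hypothetical helper for illustration: no soft line breaks and no
    trailing-whitespace handling.
    """
    out = []
    for byte in data:
        # Printable ASCII is 33..126; space and tab are also allowed mid-line.
        is_safe = 33 <= byte <= 126 or byte in (0x20, 0x09)
        if is_safe and byte != ord("="):
            out.append(chr(byte))        # safe ASCII passes through untouched
        else:
            out.append(f"={byte:02X}")   # everything else becomes =XX in upper-case hex
    return "".join(out)


print(escape_bytes(b"the 2026, ok"))   # the 2026, ok
print(escape_bytes(b"1+1=2"))          # 1+1=3D2
print(escape_bytes(b"\x00\x7f\xff"))   # =00=7F=FF
```

The only printable byte that takes the escape branch is the equals sign, which is why =3D shows up so often in raw mail.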

A worked example: encoding café

Take the single word café. The first three letters, c, a, f, are plain ASCII, so they pass through untouched. The interesting part is é.

In UTF-8, é is not one byte. It is two: 0xC3 and 0xA9. Quoted-printable works on bytes, not characters, so it escapes each of those two bytes separately. 0xC3 becomes =C3, 0xA9 becomes =A9, and the word comes out as:

caf=C3=A9

That string is exactly what you would find in the quoted-printable body of a raw .eml file. (Headers use a related but separate scheme, the encoded-word syntax, so a subject line looks a little different.) Run the same logic on the Chinese character 中, which is three UTF-8 bytes (E4 B8 AD), and you get three tokens: =E4=B8=AD. An emoji like 😀 is four bytes (F0 9F 98 80) and encodes to =F0=9F=98=80. This is the single most common confusion I see: people count one token per visible character, then panic when one Chinese glyph produces three =XX pairs. It is not a bug. It is the encoding faithfully escaping every byte, and decoding the same tokens rebuilds the original text without mojibake.
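To check the byte counts yourself, here is a short sketch using Python's standard quopri module. It assumes the text is UTF-8, as in the examples here; a different charset would give different bytes and therefore different tokens.

```python
import quopri

for text in ["café", "中", "😀"]:
    raw = text.encode("utf-8")                    # characters -> bytes
    encoded = quopri.encodestring(raw)            # bytes -> quoted-printable bytes
    decoded = quopri.decodestring(encoded).decode("utf-8")
    print(text, len(raw), encoded.decode("ascii"), decoded == text)

# café 5 caf=C3=A9 True
# 中 3 =E4=B8=AD True
# 😀 4 =F0=9F=98=80 True
```

The second column is the number of UTF-8 bytes, and the number of =XX tokens always matches the count of non-ASCII bytes, not the count of visible characters.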

You can paste any of these strings into the quoted-printable encoder and decoder and flip between encode and decode mode to watch the bytes resolve back to readable text.

Soft line breaks: the trailing equals sign

Here is the concrete detail that trips up most people reading raw email for the first time. Quoted-printable limits every encoded line to 76 characters, not counting the line ending. When a line of text would run longer than that, the encoder folds it by inserting a soft line break: an = as the very last character of the line (it counts toward the 76), followed by a CRLF.

That trailing = is a signal, not data. It tells the decoder, "this line break is artificial, I added it only to stay under 76 columns." So the decoder removes the = and the newline and stitches the two halves back into one logical line. A hard newline that you actually typed travels as a plain CRLF with no = in front of it, so the decoder keeps it as a real break.

This is the heart of why quoted-printable beats Base64 for human-readable mail. Printable ASCII stays as-is, other bytes become =XX hex, and an = at the end of a line is a soft break that vanishes on decode. Put those three rules together and a mostly-text email stays scannable in its raw form, line by line, instead of becoming an opaque wall of Base64. If you ever need the output as one unbroken string, most tools let you toggle the 76-column wrapping off so nothing folds.
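Folding and unfolding can be observed with Python's quopri module. The sketch below is illustrative only: the sample text is invented, and where exactly a given encoder chooses to fold is an implementation detail, so the code checks properties instead of exact output.

```python
import quopri

# One long logical line (199 bytes) with no newline typed by the author.
long_line = b" ".join([b"word"] * 40)
encoded = quopri.encodestring(long_line)
lines = encoded.splitlines()

print(len(lines) > 1)                                     # the encoder folded the line
print(all(len(line) <= 76 for line in lines))             # every folded line fits the limit
print(all(line.endswith(b"=") for line in lines[:-1]))    # folds are marked by a trailing =
print(quopri.decodestring(encoded) == long_line)          # decoding stitches it back together

# A soft break disappears on decode; a hard break is kept.
print(quopri.decodestring(b"one logical =\r\nline"))      # b'one logical line'
print(quopri.decodestring(b"two real\r\nlines"))          # b'two real\r\nlines'
```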

Why trailing whitespace gets escaped too

There is one more rule that looks strange until you know the reason. A space or a tab sitting at the end of a line gets escaped to =20 or =09, even though space and tab are perfectly printable ASCII characters everywhere else.

The reason is defensive. Many mail servers and gateways strip trailing whitespace from lines as a "cleanup" step, which would silently delete a space you actually meant to keep. By encoding a trailing space as =20, quoted-printable protects it: the escape token is not whitespace, so no server trims it, and the decoder restores the real space on the other end. If you hand-edit encoded output and remove those escapes, you reintroduce exactly the data loss the encoding was built to prevent.
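A small experiment makes the protection visible. In this sketch, strip_trailing_whitespace is a hypothetical stand-in for a gateway that trims line ends; real servers differ in what they trim and when.

```python
import quopri

original = b"keep this space \nand this tab\t\n"
encoded = quopri.encodestring(original)


def strip_trailing_whitespace(data: bytes) -> bytes:
    """Mimic a gateway that trims spaces and tabs from line ends (hypothetical)."""
    return b"\n".join(line.rstrip(b" \t") for line in data.split(b"\n"))


print(encoded)
# The escaped form survives the trim and decodes back to the original bytes.
print(quopri.decodestring(strip_trailing_whitespace(encoded)) == original)   # True
# The unencoded text loses its trailing space and tab.
print(strip_trailing_whitespace(original) == original)                       # False
```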

Quoted-printable versus Base64: which one and when

Both quoted-printable and Base64 are MIME content-transfer-encodings, and a single multipart message often uses both, one per part. They suit opposite kinds of data.

Quoted-printable shines when the content is largely ASCII with a sprinkle of non-ASCII: an English email with a few accented names, a German message, a note with one emoji. Almost every byte passes through untouched, the raw source stays readable, and the size overhead is tiny. Base64, by contrast, re-encodes everything into its 64-character alphabet, inflates the payload by roughly a third, and makes the content completely unreadable. That trade is exactly right for binary like images, PDFs, and attachments, where most bytes would need a three-character escape under quoted-printable anyway and the line-by-line readability buys you nothing.

The rule of thumb: text that is largely ASCII goes quoted-printable; binary or heavily non-ASCII data goes Base64. If you want to see the contrast directly, encode the same accented string in both this tool and the Base64 encoder and decoder and compare the output length and legibility side by side.
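The size trade-off is easy to measure. The sketch below uses an invented sample sentence and bytes(range(256)) as a rough stand-in for binary data; real attachments will give different numbers, and base64.b64encode here omits the line breaks a mail client would add, so treat the output as a comparison of trends only.

```python
import base64
import quopri

mostly_ascii = "Café au lait for Zoë, please.".encode("utf-8")
binary_like = bytes(range(256))      # every byte value once, as a stand-in for binary

for label, data in [("text", mostly_ascii), ("binary", binary_like)]:
    qp = quopri.encodestring(data)
    b64 = base64.b64encode(data)
    # Columns: label, raw size, quoted-printable size, Base64 size
    print(label, len(data), len(qp), len(b64))
```

On the text sample the quoted-printable output is the shorter of the two; on the binary sample Base64 wins, because half of the byte values alone are high-bit bytes that each cost three characters.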

Wrapping up

Quoted-printable is one of those formats that looks cryptic until you learn its three small rules: keep printable ASCII as-is, escape every other byte as =XX, and use a trailing = for soft line breaks. Once those click, raw email source stops being intimidating. You can read a .eml body, spot a double-encoded value, or hand-build a test message without guessing. The encoding is more than thirty years old, and it still quietly carries your accented names and emoji to their destination, one escaped byte at a time.


Made by Toolora · Updated 2026-06-13