
Base64, URL Encoding, and HTML Escaping: Which One Do You Actually Need?

Three different encoding schemes, three different jobs. Learn when to use Base64, percent-encoding, or HTML entity escaping — with real input/output examples.

#encoding #base64 #url-encoding #html-escaping #web-development


I've seen developers — including myself on a bad day — reach for Base64 when they needed percent-encoding, or stuff HTML entities where a URL escape was required. The three schemes look similar from a distance: they all turn "unsafe" characters into something safer. But they solve completely different problems, and mixing them up produces bugs that are painful to diagnose.

This is the comparison I wish I'd had when I first started building web applications.

What Each Scheme Is Actually Trying to Solve

The three encoding methods come from three different contexts, and the context defines everything.

Base64 was invented to carry arbitrary binary data through systems that only understand ASCII text — originally email (SMTP). It takes raw bytes and maps every 6 bits to one of 64 printable characters. It does not care about HTML, URLs, or any web standard. It is a byte-to-text transport layer.
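The 6-bit regrouping is easier to see in code. Here is a minimal Python sketch of a standard (RFC 4648) Base64 encoder. `b64_encode_manual` is a hypothetical name used only for illustration. In real code you would call the standard library's `base64.b64encode`, which the sketch is compared against.

```python
import base64

# Standard Base64 alphabet (RFC 4648): index 0..63 -> printable character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode_manual(data: bytes) -> str:
    """Encode bytes by regrouping each 3-byte (24-bit) chunk into four 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Zero-pad a short final chunk so it is always 24 bits wide.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Four 6-bit groups, most significant first; each one indexes ALPHABET.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 real byte -> 2 characters, 2 -> 3, 3 -> 4; the rest become "=" padding.
        used = len(chunk) + 1
        out.append("".join(chars[:used]) + "=" * (4 - used))
    return "".join(out)


sample = b'Hello <World> & "friends"!'
print(b64_encode_manual(sample))
print(base64.b64encode(sample).decode("ascii"))  # standard library, for comparison
```

Both lines print SGVsbG8gPFdvcmxkPiAmICJmcmllbmRzIiE=. The 26-byte input does not divide evenly by 3, so the output ends with a single = pad.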

Percent-encoding (commonly called URL encoding) is specified in RFC 3986, the current URI syntax standard, to protect characters that carry special meaning inside a URI. A space, an ampersand, or a # in a query-string value would break the URI structure if left raw. Percent-encoding replaces each unsafe byte with %XX, where XX is the byte's value in hexadecimal.
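Here is a minimal Python sketch of the byte-by-byte rule. It assumes UTF-8 input and treats only RFC 3986's unreserved characters (letters, digits, - . _ ~) as safe. `percent_encode` is a hypothetical helper; the standard library's `urllib.parse.quote(text, safe="")` applies the same rule. JavaScript's encodeURIComponent() is slightly more lenient, because it also leaves ! ' ( ) * unescaped.

```python
from urllib.parse import quote

# RFC 3986 section 2.3 "unreserved" characters: never percent-encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Escape every UTF-8 byte that is not an unreserved character as %XX."""
    out = []
    for byte in text.encode("utf-8"):  # escaping operates on bytes, not characters
        ch = chr(byte)
        # Bytes >= 0x80 map to non-ASCII chars here, so they are never "unreserved".
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


for value in ["fish & chips", "a/b?c=d#e", "café"]:
    print(percent_encode(value), quote(value, safe=""))
```

Note how the single character é becomes two escapes, %C3%A9, because its UTF-8 encoding is two bytes.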

HTML entity escaping is about protecting the HTML parser. The characters <, >, &, and " (plus ' inside single-quoted attributes) have special meaning in HTML markup. If you render user input directly into a template, an attacker can inject a <script> tag. Entity escaping turns < into &lt;, so the browser renders the angle bracket as text rather than interpreting it as a tag.
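The escaping itself is only a handful of string replacements. This Python sketch uses a hypothetical `escape_html` that produces the same output as the standard library's `html.escape(text, quote=True)`. The & has to be replaced first; otherwise the & at the start of each freshly inserted entity would be escaped a second time.

```python
import html

# "&" must come first: replacing it later would re-escape the "&" that
# begins every entity inserted by the other replacements.
_ENTITIES = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#x27;"),
]


def escape_html(text: str) -> str:
    """Replace the markup-significant characters with entities."""
    for raw, entity in _ENTITIES:
        text = text.replace(raw, entity)
    return text


user_input = "<script>alert('hi')</script>"
print(escape_html(user_input))
print(html.escape(user_input))  # standard library (quote=True by default)
```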

Same surface appearance — "replace characters with something else" — but three fundamentally separate jobs.

Real Input/Output Examples

Let me use one string to show what each scheme actually produces.

Input: Hello <World> & "friends"!

| Scheme | Output |
|--------|--------|
| Base64 | SGVsbG8gPFdvcmxkPiAmICJmcmllbmRzIiE= |
| URL encoding | Hello%20%3CWorld%3E%20%26%20%22friends%22%21 |
| HTML escaping | Hello &lt;World&gt; &amp; &quot;friends&quot;! |

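Notice what each does to spaces and angle brackets. Base64 ignores the semantics entirely; it just sees bytes. URL encoding converts the space to %20 and < to %3C. HTML escaping leaves the space alone (it's fine in HTML) but turns < into &lt;, because < is what opens a tag in an HTML context. One detail: JavaScript's encodeURIComponent() does not escape !, so it would leave the trailing ! as-is. The table shows the stricter form, which escapes everything outside the RFC 3986 unreserved set.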
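All three rows of the table can be reproduced with Python's standard library. `quote(..., safe="")` is the strict encoder that yields %21 for the exclamation mark. The variable names are only illustrative.

```python
import base64
import html
from urllib.parse import quote

text = 'Hello <World> & "friends"!'

as_base64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
as_url = quote(text, safe="")  # safe="" so "/" is escaped too (quote's default leaves it alone)
as_html = html.escape(text)    # quote=True by default, so " and ' are escaped as well

for label, value in [("Base64", as_base64), ("URL encoding", as_url), ("HTML escaping", as_html)]:
    print(f"{label:14} {value}")
```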

You can try each transformation on Toolora's Base64 encoder, URL encoder, and HTML entity encoder.

Overhead: How Much Do They Inflate Your Data?

This is where Base64 has a real cost that developers frequently overlook.

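Base64 encodes every 3 bytes as 4 characters, a 33% size increase. The exact output length is 4 * ceil(n/3) characters (RFC 4648), so a 1 MB binary file becomes about 1.33 MB of Base64 text. MIME email also wraps that text into lines of at most 76 characters with CRLF line endings (RFC 2045), which pushes the total to roughly 1.37 MB. Percent-encoding inflates only the bytes that need escaping. ASCII letters and digits are left untouched, while each escaped byte grows from 1 character to 3 (%XX), and a non-ASCII character is escaped byte by byte from its UTF-8 form (é becomes %C3%A9). Plain alphanumeric values barely grow; text with many spaces or symbols grows noticeably. HTML escaping is similarly selective: &amp; is 5 characters for 1, but most page content doesn't need escaping at all.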
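Here is a quick Python check of the size arithmetic. `base64_length` implements 4 * ceil(n/3) in integer math. `mime_base64_length` is a rough estimate that assumes a CRLF after every line of at most 76 characters. `percent_growth` compares percent-encoded length with the input's UTF-8 byte length. All three names are hypothetical.

```python
import base64
from urllib.parse import quote


def base64_length(n_bytes: int) -> int:
    """Padded Base64 length for n input bytes: 4 * ceil(n / 3), in integer math."""
    return 4 * ((n_bytes + 2) // 3)


def mime_base64_length(n_bytes: int, line_len: int = 76) -> int:
    """Rough MIME size, assuming a CRLF (2 bytes) after every line of at most 76 chars."""
    chars = base64_length(n_bytes)
    lines = (chars + line_len - 1) // line_len
    return chars + 2 * lines


def percent_growth(text: str) -> float:
    """Percent-encoded length divided by the UTF-8 byte length of the (non-empty) input."""
    return len(quote(text, safe="")) / len(text.encode("utf-8"))


# Sanity check against the standard library encoder.
assert all(base64_length(n) == len(base64.b64encode(bytes(n))) for n in range(50))

one_mb = 1024 * 1024
print(f"Base64:         {base64_length(one_mb) / one_mb:.3f}x")
print(f"MIME Base64:    {mime_base64_length(one_mb) / one_mb:.3f}x")
print(f"'report2024':   {percent_growth('report2024'):.2f}x")
print(f"'café au lait': {percent_growth('café au lait'):.2f}x")
```

For a 1 MiB input this prints ratios of about 1.333 (plain Base64) and 1.368 (MIME-wrapped). The alphanumeric sample does not grow at all. The short phrase with two spaces and an accented letter grows to roughly 1.6 times its byte length.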

This matters in practice. I once saw an API payload triple in size after a developer Base64-encoded a JSON body "just to be safe" before putting it in a URL query parameter. The right fix was a single encodeURIComponent() call, which escapes only the characters that conflict with URL syntax instead of re-encoding every byte.

When to Use Which Scheme

Use Base64 when:

  • Embedding binary data (images, files, certificates) in a text medium: email attachments, data: URIs, JSON fields.
  • The receiving system expects a Base64 string rather than raw bytes or percent-encoding, as HTTP Basic Auth headers do.
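Here are two of these cases sketched in Python, using hypothetical helper names. The first builds an HTTP Basic Authorization header, which RFC 7617 defines as the Base64 of user:password (UTF-8 is assumed here). The second builds a data: URI for raw bytes. The Aladdin / open sesame credentials are the example pair from RFC 7617 itself.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """HTTP Basic Authorization value: "Basic " + Base64 of "user:password"."""
    # Assumption: credentials are encoded as UTF-8 before Base64.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def data_uri(payload: bytes, mime_type: str) -> str:
    """Embed raw bytes in a data: URI using Base64."""
    return f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"


print(basic_auth_header("Aladdin", "open sesame"))
print(data_uri(b"\x89PNG", "image/png"))  # first four bytes of a PNG signature
```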

Use URL encoding when:

  • Placing a value inside a URL query string or path segment.
  • Sending form data with application/x-www-form-urlencoded.
  • Building any URI where user input might contain &, =, #, ?, or spaces.
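Here is a short Python sketch of the first two cases, built on a hypothetical example.com URL. `urlencode` produces form-style encoding, in which spaces become +. `quote(..., safe="")` suits a single path segment, where a raw / would otherwise be read as a separator.

```python
from urllib.parse import quote, urlencode

params = {"q": "fish & chips", "page": "2"}

# application/x-www-form-urlencoded style: spaces become "+", "&" becomes %26.
query = urlencode(params)

# One path segment: safe="" so a "/" inside the value cannot split the path.
segment = quote("reports/2024 Q1.pdf", safe="")

url = f"https://example.com/files/{segment}?{query}"
print(url)
```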

Use HTML escaping when:

  • Rendering user-supplied text into an HTML template (the primary XSS defense).
  • Inserting dynamic content into an HTML attribute value.
  • Output is going into an HTML context — not a URL, not a database, not JSON.
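Here is a minimal Python sketch of the first two cases, with a hypothetical `render_comment` template. The comment body is escaped as text. The author name is escaped for a double-quoted attribute, so an injected quote cannot break out of it.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Render user-supplied text into an HTML snippet (hypothetical template)."""
    # quote=True escapes " and ' too, so the value cannot close the attribute early.
    safe_author = html.escape(author, quote=True)
    # In element content, escaping < > & is what matters; escape() covers them.
    safe_body = html.escape(body)
    return f'<div class="comment" data-author="{safe_author}"><p>{safe_body}</p></div>'


print(render_comment('x" onmouseover="alert(1)', "<script>alert(1)</script>"))
```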

One clean rule of thumb: match the escaping scheme to the output context. If the output is HTML, HTML-escape. If the output goes into a URL, URL-encode. If the output needs to survive a text-only channel as binary, Base64 it.

Common Mistakes and Double-Encoding Traps

Mistake 1: Base64-encoding data that goes into a URL. Standard Base64 uses + and / as part of its alphabet. Both characters have special meanings in URLs (+ is decoded as a space in form data; / separates path segments). The result is silent data corruption. The fix is Base64url encoding (RFC 4648 §5), which replaces + with - and / with _. Alternatively, run encodeURIComponent() on the standard Base64 output.
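The corruption is easy to reproduce in Python. The three input bytes are arbitrary, picked only so that standard Base64 yields + and /. `parse_qs` stands in for whatever form-data parser runs on the receiving end.

```python
import base64
from urllib.parse import parse_qs, quote

payload = bytes([0xFB, 0xEF, 0xFF])  # chosen so the Base64 output is all "+" and "/"

standard = base64.b64encode(payload).decode("ascii")         # "++//"
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")  # "--__"

# Pasted raw into a query string: the parser turns each "+" into a space.
broken = parse_qs(f"token={standard}")["token"][0]
# Percent-encoded first: "+" travels as %2B and survives the round trip.
fixed = parse_qs(f"token={quote(standard, safe='')}")["token"][0]

print(repr(broken), repr(fixed), url_safe)
```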

Mistake 2: Relying on HTML escaping alone for a URL in an href attribute. The browser decodes entities in an attribute value before it uses the URL, so href="&lt;script&gt;" still hands <script> to the URL parser. HTML escaping protected the markup, not the URL. You need URL encoding for the pieces of the URL and HTML escaping for the attribute, applied in that order.
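Here is a sketch of that order in Python, using a hypothetical `link` helper and URL. It percent-encodes the query values first. It then HTML-escapes the finished URL as it goes into the attribute, which turns the & between parameters into &amp;. The final assert shows the browser-side effect described above: entities in an attribute are decoded before the URL is used.

```python
import html
from urllib.parse import urlencode


def link(base_url: str, params: dict[str, str], label: str) -> str:
    # Step 1: percent-encode each query value to build a valid URL.
    url = f"{base_url}?{urlencode(params)}"
    # Step 2: HTML-escape the finished URL for the attribute, and the label as text.
    return f'<a href="{html.escape(url)}">{html.escape(label)}</a>'


print(link("https://example.com/search", {"q": "<b>&", "lang": "en"}, "Search <b>"))

# Entity-decoding an attribute value (what the browser does) restores the raw text.
assert html.unescape("&lt;script&gt;") == "<script>"
```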

Mistake 3: Double-encoding. Applying encodeURIComponent() twice turns a single space into %2520 instead of %20. The % itself gets encoded on the second pass. This is particularly common when middleware and application code both try to "sanitize" input independently.
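Python's `quote` reproduces the effect of calling encodeURIComponent() twice. The first pass yields %20; the second pass escapes the % sign itself. A single decode of the doubly encoded value therefore still leaves %20 rather than a space.

```python
from urllib.parse import quote, unquote

once = quote(" ", safe="")    # "%20"
twice = quote(once, safe="")  # "%" becomes "%25", giving "%2520"

# One decode only undoes one layer: the result is still "%20", not a space.
assert unquote(twice) == once
print(once, twice, unquote(twice))
```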

I ran into the double-encoding trap while debugging a broken file download link where a filename contained parentheses. The filename was correctly percent-encoded by the browser, then re-encoded by a proxy — and the server couldn't match the path. Removing the proxy's encoding step fixed it immediately.

For more complex cases involving multiple escaping layers, Toolora's string escape converter can help you inspect what's happening at each layer.

Quick Reference

| Property | Base64 | URL encoding | HTML escaping |
|----------|--------|--------------|---------------|
| Purpose | Binary-to-text transport | URI safety | HTML injection prevention |
| Output charset | A–Z a–z 0–9 + / = | Original + %XX sequences | Original + named or numeric entities |
| Size overhead | ~33% always | Only for special chars | Only for < > & " ' |
| Reversible? | Yes | Yes | Yes |
| Safe in HTML? | Yes (plain text) | Often not (without escaping) | Yes |
| Safe in URLs? | Not standard (use Base64url) | Yes | No |
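The "Reversible?" row can be checked with the matching standard-library decoders in Python. This sketch round-trips the sample string used earlier in the article through all three schemes.

```python
import base64
import html
from urllib.parse import quote, unquote

original = 'Hello <World> & "friends"!'

# Each encoder is paired with its standard-library decoder.
round_trips = {
    "Base64": base64.b64decode(base64.b64encode(original.encode("utf-8"))).decode("utf-8"),
    "URL encoding": unquote(quote(original, safe="")),
    "HTML escaping": html.unescape(html.escape(original)),
}

for name, decoded in round_trips.items():
    print(f"{name:14} round-trips: {decoded == original}")
```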


Made by Toolora · Updated 2026-07-02