Six Encodings Developers Use Every Day: URL, Base64, Hex, Unicode Escapes, HTML Entities, and Percent Encoding

A practical hub for telling URL encoding, Base64, Hex, Unicode escapes, HTML entities, and percent encoding apart, with real outputs and conversion rules.

Published By Lei Li
#encoding #base64 #urls #unicode #developer-tools

Encoding bugs usually start with a vague instruction: "escape this value." That can mean six different things. A URL builder, a JSON parser, an HTML renderer, a shell command, a hex editor, and a JWT verifier all expect different shapes of text.

This guide is a hub for the encodings developers touch constantly: URL encoding, Base64, Hex, Unicode escapes, HTML entities, and percent encoding. If you want to test values while reading, open Toolora's URL Encoder / Decoder, Base64 Encoder/Decoder, Text to Hex Converter, Unicode Escape Converter, and HTML Entities Encoder. For byte-level Base64 and Hex swaps, use the Base64 to Hex Converter.

The Mental Model: Syntax, Bytes, and Characters

The fastest way to choose the right encoding is to ask what you are protecting.

URL encoding protects URL structure. If a value contains &, =, ?, #, spaces, Chinese text, or emoji, those characters can change how a browser or server parses the address. Component encoding turns one value into a safe URL component.

Percent encoding is the byte notation behind that process. A byte is written as %HH, where HH is a two-digit hexadecimal value. For example, a space is %20, & is %26, and the UTF-8 bytes for 中 are %E4%B8%AD. In practice, people often say "URL encoding" and "percent encoding" as if they were identical. The useful distinction is context: percent encoding is the representation; URL encoding is the job of applying it correctly to a URL part.
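To make the %HH notation concrete, here is a minimal sketch in JavaScript. The helper name percentEncodeAllBytes is hypothetical, and encoding every byte (even plain letters) is a deliberate simplification so the representation is visible; the built-in encodeURIComponent applies the same notation only where a URL component needs it.

```javascript
// Hypothetical helper: write every UTF-8 byte of `text` as %HH.
// Real URL encoders leave unreserved characters (letters, digits, - . _ ~) literal.
function percentEncodeAllBytes(text) {
  const utf8 = new TextEncoder().encode(text); // UTF-8 bytes of the string
  return Array.from(utf8, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

const space = percentEncodeAllBytes(" ");      // "%20"
const amp = percentEncodeAllBytes("&");        // "%26"
const zhong = percentEncodeAllBytes("\u4e2d"); // "%E4%B8%AD" (U+4E2D, three UTF-8 bytes)

// The built-in uses the same notation, but keeps safe characters literal.
const builtin = encodeURIComponent("a b&\u4e2d"); // "a%20b%26%E4%B8%AD"
console.log(space, amp, zhong, builtin);
```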

Base64 represents bytes as text. It is common in JSON payloads, data URLs, Basic Auth headers, certificates, and tokens. Standard Base64 uses +, /, and = padding, while Base64url uses - and _ and often drops padding.
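The alphabet difference is easiest to see with bytes picked to hit the last two symbols. This is a small sketch assuming Node.js, whose Buffer supports both the "base64" and "base64url" encodings; the two input bytes are arbitrary illustration values.

```javascript
// Two bytes chosen so the standard alphabet produces "+", "/", and "=" padding.
const rawBytes = Buffer.from([0xfb, 0xff]);

const standard = rawBytes.toString("base64");   // "+/8="
const urlSafe = rawBytes.toString("base64url"); // "-_8" (padding dropped)

// Both spellings decode back to the same two bytes.
const sameBytes = Buffer.from(urlSafe, "base64url").equals(rawBytes);
console.log(standard, urlSafe, sameBytes);
```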

Hex also represents bytes as text, but it uses two hex digits per byte. Hex is less compact than Base64, but it is easier to inspect one byte at a time. That is why hashes, signatures, packet dumps, color bytes, and binary fixtures often appear as hex.
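A short sketch of the same bytes in both notations, assuming Node.js and its global Buffer. The sample text is an illustration value; the point is that a Base64-to-Hex swap goes through raw bytes and never through a character decode.

```javascript
const word = "Zo\u00eb"; // "Zoë": ë is U+00EB, which takes two bytes in UTF-8
const wordBytes = Buffer.from(word, "utf8");

const hex = wordBytes.toString("hex");       // "5a6fc3ab"
const spacedHex = hex.match(/../g).join(" "); // "5a 6f c3 ab", one pair per byte
const b64 = wordBytes.toString("base64");    // "Wm/Dqw=="

// Byte-level swap: Base64 -> bytes -> hex.
const hexFromBase64 = Buffer.from(b64, "base64").toString("hex");
console.log(spacedHex, b64, hexFromBase64 === hex);
```

Four bytes cost eight hex characters and eight Base64 characters here only because of padding; on longer inputs Base64 is the shorter of the two.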

Unicode escapes represent characters or code points for a language parser. JavaScript and JSON use \uXXXX; modern JavaScript can also use \u{1F600} for full code points. Emoji above U+FFFF become surrogate pairs in classic \uXXXX form.
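The two escape forms can be sketched with a pair of hypothetical helpers (escapeCodeUnits and escapeCodePoints are illustration names, not library functions). The first walks UTF-16 code units, which is why an emoji becomes a surrogate pair; the second walks code points.

```javascript
// Classic form: one \uXXXX per UTF-16 code unit (the form JSON accepts).
function escapeCodeUnits(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "\\u" + text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Code point form: one \u{...} per code point (JavaScript source, not JSON).
function escapeCodePoints(text) {
  return Array.from(text, (ch) => "\\u{" + ch.codePointAt(0).toString(16) + "}").join("");
}

const grin = "\u{1F600}";               // the emoji U+1F600
const classic = escapeCodeUnits(grin);  // \ud83d\ude00 (surrogate pair)
const modern = escapeCodePoints(grin);  // \u{1f600}

// A JSON parser turns the classic form back into the original character.
const parsedBack = JSON.parse('"' + classic + '"');
console.log(classic, modern, parsedBack === grin);
```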

HTML entities protect HTML rendering. They turn characters such as &, <, >, ", and ' into forms such as &amp;, &lt;, &gt;, &quot;, and &#39;, so text displays literally instead of becoming markup.
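A minimal sketch of that replacement, covering only the five characters named above. The escapeHtml helper is hypothetical; a single-pass regex is used so that the & inside an entity it has just produced is not escaped a second time.

```javascript
// Hypothetical helper: map each dangerous character to its entity in one pass.
const HTML_ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const escapedMarkup = escapeHtml(`<a href="x">Tom & Jerry's</a>`);
// "&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;"
console.log(escapedMarkup);
```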

One Real String, Six Outputs

Here is the exact input I tested:

name=Zoë & tag=中文/emoji 😀

Its UTF-8 length is 33 bytes. The actual outputs below are intentionally different because each encoding answers a different question.

URL component encoding:

name%3DZo%C3%AB%20%26%20tag%3D%E4%B8%AD%E6%96%87%2Femoji%20%F0%9F%98%80

Full URL encoding, applied to https://toolora.dev/search?q= plus the input:

https://toolora.dev/search?q=name=Zo%C3%AB%20&%20tag=%E4%B8%AD%E6%96%87/emoji%20%F0%9F%98%80

Notice the bug risk in that full URL output: the literal & remains a URL separator. If the whole input is meant to be one query value, component encoding is the right operation.
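The difference shows up as soon as the URL is parsed again. This sketch assumes that "component encoding" and "full URL encoding" correspond to JavaScript's encodeURIComponent and encodeURI, which reproduce the two outputs above; it then reads the q parameter back with the standard URL class.

```javascript
// Same input as above, written with escapes so the source file's encoding cannot alter it:
// name=Zoë & tag=中文/emoji 😀
const input = "name=Zo\u00eb & tag=\u4e2d\u6587/emoji \u{1F600}";
const base = "https://toolora.dev/search?q=";

const asComponent = base + encodeURIComponent(input); // value encoded, then appended
const asFullUrl = encodeURI(base + input);            // whole URL encoded, separators kept

// Parse both URLs and read the q parameter back.
const qFromComponent = new URL(asComponent).searchParams.get("q");
const qFromFullUrl = new URL(asFullUrl).searchParams.get("q");

console.log(qFromComponent === input); // true: the value survives intact
console.log(JSON.stringify(qFromFullUrl)); // "name=Zoë " (cut off at the literal &)
```

With the full URL form, everything after the & is parsed as a second, unrelated parameter.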

Base64:

bmFtZT1ab8OrICYgdGFnPeS4reaWhy9lbW9qaSDwn5iA

Hex, shown as UTF-8 bytes:

6e 61 6d 65 3d 5a 6f c3 ab 20 26 20 74 61 67 3d e4 b8 ad e6 96 87 2f 65 6d 6f 6a 69 20 f0 9f 98 80

JavaScript-style Unicode escapes:

\u006e\u0061\u006d\u0065\u003d\u005a\u006f\u00eb\u0020\u0026\u0020\u0074\u0061\u0067\u003d\u4e2d\u6587\u002f\u0065\u006d\u006f\u006a\u0069\u0020\ud83d\ude00

HTML entity encoding for the dangerous HTML characters:

name=Zoë &amp; tag=中文/emoji 😀

I tested this example because it contains the usual troublemakers in one short value: an equals sign, a literal ampersand, a slash, a Latin accent, two CJK characters, and an emoji. It also shows why "convert to Unicode" is not a precise request. Hex shows UTF-8 bytes. Unicode escapes show code points or UTF-16 code units. HTML entities are for markup. Percent encoding is for URL bytes.
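The three ways of counting the same string can be checked directly. This is a sketch assuming Node.js; the variable names are illustrative, and the Base64 line is included only to confirm it matches the output listed above.

```javascript
// name=Zoë & tag=中文/emoji 😀, written with escapes to pin the exact code points.
const sample = "name=Zo\u00eb & tag=\u4e2d\u6587/emoji \u{1F600}";

const utf8Bytes = Buffer.byteLength(sample, "utf8"); // 33: what Hex, Base64, and %HH count
const utf16Units = sample.length;                    // 26: what \uXXXX escapes count
const codePoints = Array.from(sample).length;        // 25: what \u{...} escapes count

const sampleBase64 = Buffer.from(sample, "utf8").toString("base64");
console.log(utf8Bytes, utf16Units, codePoints, sampleBase64);
```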

Benchmark: Size and Speed Are Not the Same Thing

I ran a local Node.js v24.14.0 benchmark on arm64 with a deterministic 1,024-byte payload, where byte[i] = (i * 31 + 17) & 255. For size, I encoded the same raw bytes as standard Base64 and as byte-level percent encoding, leaving only RFC 3986 unreserved bytes literal.

| Encoding | Output length | Overhead vs 1,024 raw bytes | Benchmark source |
| --- | ---: | ---: | --- |
| Base64 | 1,368 characters | +33.6% | local Node benchmark, RFC 4648 grouping |
| Percent encoding | 2,544 characters | +148.4% | local Node benchmark, RFC 3986 unreserved set |

In the same run, encoding a 1 KiB ASCII buffer to Base64 completed about 8,757,079 operations per second, while encodeURIComponent on a 1 KiB ASCII string completed about 499,924 operations per second. That speed result is not a universal law; engines, input mix, and allocation behavior matter. The reliable takeaway is narrower: Base64 has predictable 3-byte to 4-character expansion, while percent encoding depends heavily on how many bytes are already URL-safe.
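The size figures can be re-derived from the payload definition alone. The sketch below is not the original benchmark script; it rebuilds the 1,024-byte payload from the stated formula and counts output characters, assuming that byte-level percent encoding means one character for an RFC 3986 unreserved byte and three (%HH) for anything else. It does not attempt to reproduce the speed numbers.

```javascript
// Deterministic payload from the description: byte[i] = (i * 31 + 17) & 255.
const payload = Buffer.alloc(1024);
for (let i = 0; i < payload.length; i++) payload[i] = (i * 31 + 17) & 255;

// RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~".
const isUnreserved = (b) => /[A-Za-z0-9\-._~]/.test(String.fromCharCode(b));

const base64Length = payload.toString("base64").length;
let percentLength = 0;
for (const b of payload) percentLength += isUnreserved(b) ? 1 : 3;

const overhead = (n) => "+" + ((n / payload.length - 1) * 100).toFixed(1) + "%";
console.log(base64Length, overhead(base64Length));   // 1368 +33.6%
console.log(percentLength, overhead(percentLength)); // 2544 +148.4%
```

Because 31 is odd, the formula cycles through all 256 byte values evenly, so only 66 of every 256 bytes are unreserved. That is close to a worst case for percent encoding, which is why its overhead here is so much higher than for typical query text.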

For mostly English query text, URL component encoding can stay compact because letters and digits remain literal. For arbitrary binary data, Base64 is usually the better transport format. For byte inspection, Hex wins even though it costs exactly two characters per byte.

How to Convert Without Corrupting Data

Start with the receiver, not the source. If the receiver expects a query parameter, use URL component encoding. If it expects an HTML text node, encode HTML entities. If it expects JSON with a byte payload, Base64 may be correct. If it expects a digest or raw bytes in developer tooling, Hex may be easier to compare.

Avoid double encoding. %20 is a space after one decode. %2520 is a percent sign followed by 20, which usually means someone encoded an already encoded value. The same problem appears in HTML as &amp;amp;: one decode gives &amp;, and a second decode gives &.
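A quick way to see the %2520 pattern is to encode a value twice on purpose. The sketch uses only the built-in encodeURIComponent and decodeURIComponent; the sample value is arbitrary.

```javascript
const encodedOnce = encodeURIComponent("a b");        // "a%20b"
const encodedTwice = encodeURIComponent(encodedOnce); // "a%2520b": the % itself became %25

const decodedOnce = decodeURIComponent(encodedTwice); // "a%20b", still one layer deep
const decodedTwice = decodeURIComponent(decodedOnce); // "a b"
console.log(encodedTwice, decodedOnce, decodedTwice);
```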

Do not use Base64 for secrecy. YWRtaW46czNjcmV0 is just admin:s3cret in another alphabet. It is fine for transport and formatting, but it is not encryption.
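Anyone can reverse it in one line with no key. A minimal sketch, assuming Node.js:

```javascript
const encodedCredentials = "YWRtaW46czNjcmV0";

// No key or secret is involved: decoding is a fixed alphabet lookup.
const revealed = Buffer.from(encodedCredentials, "base64").toString("utf8");
console.log(revealed); // admin:s3cret
```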

Keep byte and character views separate. The Chinese character 中 is Unicode code point U+4E2D, JavaScript escape \u4e2d, UTF-8 hex e4 b8 ad, and URL component bytes %E4%B8%AD. Those are all valid answers in different layers.
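All four views can be produced from the same one-character string. A sketch assuming Node.js; the variable names are illustrative.

```javascript
const han = "\u4e2d"; // the character 中

// Character layer: code point and JavaScript escape.
const codePoint = "U+" + han.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"); // U+4E2D
const jsEscape = "\\u" + han.charCodeAt(0).toString(16).padStart(4, "0");                // \u4e2d

// Byte layer: UTF-8 bytes shown as hex and as percent encoding.
const utf8Hex = Buffer.from(han, "utf8").toString("hex").match(/../g).join(" "); // e4 b8 ad
const urlBytes = encodeURIComponent(han);                                        // %E4%B8%AD

console.log(codePoint, jsEscape, utf8Hex, urlBytes);
```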

A Practical Routing Guide

Use URL component encoding when a value goes inside a query parameter, path segment, or fragment. Use full URL encoding only when you are preserving URL separators on purpose.

Use Base64 when bytes need to travel through text-only fields: JSON, environment variables, certificates, small data URLs, or token segments. Use Base64url when the format explicitly says URL-safe Base64, as JWT does.

Use Hex when you want to read or compare bytes: hashes, HMAC output, binary fixtures, protocol captures, and UTF-8 debugging.

Use Unicode escapes when a programming language, JSON fixture, CSS rule, or test snapshot needs an escaped character form. Use HTML entities when text is going into HTML and must render literally.

The safe conversion order is simple: decode until you have the real value, inspect what layer it is entering next, then encode exactly once for that layer.
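As a sketch of that order, consider a hypothetical flow where a value arrives percent-encoded in a query string and must be displayed as HTML text. The sample value and the entities table are illustration choices, not part of any particular framework.

```javascript
// Hypothetical input: a query value that arrived percent-encoded.
const fromQuery = "Tom%20%26%20Jerry%20%3Cb%3E";

// 1. Decode the layer it came from to recover the real value.
const realValue = decodeURIComponent(fromQuery); // "Tom & Jerry <b>"

// 2. The next layer is an HTML text node, so encode exactly once for HTML.
const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
const forHtml = realValue.replace(/[&<>"']/g, (c) => entities[c]); // "Tom &amp; Jerry &lt;b&gt;"

// 3. If the value later goes back into a URL, start from realValue, not from forHtml.
const forUrl = encodeURIComponent(realValue); // "Tom%20%26%20Jerry%20%3Cb%3E"
console.log(realValue, forHtml, forUrl);
```

Keeping realValue as the single source for every outgoing layer is what prevents mixed results such as &amp; ending up inside a URL.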


Made by Toolora · Updated 2026-06-06