HTML Character Entities Explained: Named Entities, Numeric Codes, and When to Use Them

Every web developer eventually pastes a < into an HTML template, watches it break the markup, and wonders which encoding form they should have used. The answer depends on three things: whether a named entity exists for the character, where in the document you're inserting it, and who (or what) will be reading the output.

This guide covers all three forms — named entities, decimal numeric references, and hexadecimal numeric references — and gives you a clear rule for picking the right one each time.

What Is an HTML Character Entity?

An HTML character entity is a text sequence that the browser substitutes with a specific character during parsing. Every entity starts with & and ends with ;. Between those delimiters you can write either a name (like amp) or a number (like #38 or #x26).

The three forms for a single ampersand look like this:

| Form | Syntax | Example |
|------|--------|---------|
| Named entity | &name; | &amp; |
| Decimal numeric reference | &#decimal; | &#38; |
| Hexadecimal numeric reference | &#xHEX; | &#x26; |

All three render as & in the browser. The difference is readability, browser support history, and how well each survives being processed by template engines or XML parsers.
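To check that a parser treats the three spellings as interchangeable, here is a minimal Python sketch. It uses html.unescape from the standard library, which follows the HTML5 decoding rules; the variable names are only illustrative.

```python
import html

# Named, decimal, and hexadecimal spellings of an ampersand.
forms = ["&amp;", "&#38;", "&#x26;"]

# Each reference decodes to the same single character.
decoded = [html.unescape(form) for form in forms]
print(decoded)  # ['&', '&', '&']
```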

The HTML5 specification defines exactly 2,231 named character references, everything from &amp; and &lt; to obscure math symbols like &weierp; (℘) and currency signs like &euro; (€). Any Unicode code point without a named entity can still be written as a numeric reference, and numeric references span the full Unicode range of over 1.1 million code points.
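Python ships a copy of the HTML5 named-reference table in html.entities.html5, which makes the lookup easy to try. The sketch below is illustrative only: lookup is a hypothetical helper, and the printed table size is whatever your Python version bundles, which should line up with the spec's count if the table mirrors it.

```python
from html.entities import html5
from typing import Optional


def lookup(name: str) -> Optional[str]:
    """Return the character(s) for a name such as 'euro;', or None if undefined."""
    return html5.get(name)


print(lookup("weierp;"))  # ℘
print(lookup("euro;"))    # €
print(len(html5))         # number of entries in Python's copy of the table

# A character with no named entity still has a numeric reference.
emoji = "\U0001F600"
has_name = emoji in html5.values()
numeric = f"&#x{ord(emoji):X};"
print(has_name, numeric)  # False &#x1F600;
```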

Named Entities: When Readability Wins

Named entities were introduced in HTML 2.0 and cover the characters developers need most often. The short list that matters daily:

&amp; → &
&lt; → <
&gt; → >
&quot; → "
&apos; → '
&nbsp; → non-breaking space

I use named entities whenever I'm writing HTML by hand. Scanning a template and seeing &mdash; is instantly meaningful: an em dash. Seeing &#8212; tells me nothing until I look it up.

The one trap with named entities: they are case-sensitive. &Agrave; (À) and &agrave; (à) are different characters. HTML5 keeps a few uppercase legacy aliases such as &LT; and &AMP;, but they are exceptions: &NBSP; is not a valid alias for &nbsp;. If you've ever copied an entity from a PDF or Word document and it silently failed, mismatched case is a likely cause.
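The case-sensitivity is easy to see with Python's html.unescape, which applies the HTML5 lookup rules. A small sketch, with the sample strings chosen only for illustration:

```python
import html

# Same letters, different case: two different characters.
upper = html.unescape("&Agrave;")
lower = html.unescape("&agrave;")
print(upper, lower)  # À à

# A miscased name is not recognized, so the text passes through unchanged.
miscased = html.unescape("&NBSP;")
print(miscased)  # &NBSP;
```

An unrecognized name is left in the output as literal text, which is why the failure tends to be silent rather than an error.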

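Most named entities also fail in XML-based contexts such as XHTML served as application/xhtml+xml, standalone SVG, and RSS. An XML parser recognizes only the five predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;) unless a DTD declares the rest, so something like &nbsp; is a parse error. In those contexts, numeric references are the safer choice, which brings us to the next section.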
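The XML behavior can be reproduced with Python's xml.etree.ElementTree, which is a strict XML parser. This is a minimal sketch; parses_as_xml is a hypothetical helper name, and the sample documents carry no DTD.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(doc: str) -> bool:
    """Return True if the string is well-formed XML, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml("<p>a&nbsp;b</p>"))  # False: nbsp is not defined in XML
print(parses_as_xml("<p>a&#160;b</p>"))  # True: numeric reference
print(parses_as_xml("<p>a&amp;b</p>"))   # True: one of the five XML entities
```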

Numeric References: The Universal Fallback

Numeric character references bypass the named-entity lookup table entirely. The browser maps the number directly to a Unicode code point. That means they work for every character ever defined, including emoji and CJK ideographs.

Decimal (&#N;): familiar to anyone who thinks in decimal. The check mark ✓ is &#10003;.

Hexadecimal (&#xN;): compact for characters whose Unicode code points are commonly written in hex. The same check mark in hex is &#x2713;. Developers working with Unicode charts, where code points appear as U+2713, often prefer hex because the number transfers directly without conversion.
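Both numeric forms are just the character's code point written in a different base. A short Python sketch of the conversion; the three function names are hypothetical helpers, not part of any library.

```python
import html


def to_decimal_ref(ch: str) -> str:
    """Decimal numeric reference for a single character."""
    return f"&#{ord(ch)};"


def to_hex_ref(ch: str) -> str:
    """Hexadecimal numeric reference for a single character."""
    return f"&#x{ord(ch):X};"


def ref_from_unicode_label(label: str) -> str:
    """Turn a chart label such as 'U+2713' into a hex reference."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a code point label: {label!r}")
    return f"&#x{label[2:]};"


check = "\u2713"  # the check mark
print(to_decimal_ref(check))             # &#10003;
print(to_hex_ref(check))                 # &#x2713;
print(ref_from_unicode_label("U+2713"))  # &#x2713;
```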

Real input/output example from encoding a French phrase:

| Input text | Encoded HTML output |
|-----------|---------------------|
| Café & résumé | Caf&eacute; &amp; r&eacute;sum&eacute; |

The é character (Unicode U+00E9, decimal 233) can also be written &#233;. Whether you use the named entity or the decimal reference, the browser renders identical output. The difference only matters in the source file you maintain six months from now.
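One way to produce this kind of output is to look each character up in a name table and fall back to a decimal reference. The sketch below assumes Python's html.entities.codepoint2name, which covers the HTML 4 names only (so it has no entry for the apostrophe); encode_named is a hypothetical helper.

```python
import html
from html.entities import codepoint2name


def encode_named(text: str) -> str:
    """Encode using named entities where a name is known, decimal otherwise."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            out.append(f"&{name};")      # e.g. é becomes &eacute;
        elif ord(ch) < 128:
            out.append(ch)               # plain ASCII passes through
        else:
            out.append(f"&#{ord(ch)};")  # no known name: decimal fallback
    return "".join(out)


encoded = encode_named("Café & résumé")
print(encoded)  # Caf&eacute; &amp; r&eacute;sum&eacute;
```

Decoding the result with html.unescape returns the original phrase, which is a cheap sanity check for any encoder of this shape.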

When to Use Which Form: A Decision Tree

The choice is not arbitrary. Here is the rule I apply on every project:

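Must-escape characters (<, >, &, " inside attributes, ' inside single-quoted attributes): use the named entities &lt;, &gt;, &amp;, and &quot;. They are short, readable, and supported since HTML 2.0. For the apostrophe, &apos; is valid in HTML5 and XML but was not part of HTML 4, so &#39; is the safer spelling when legacy parsers matter.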

Characters with a well-known named entity (em dash, copyright symbol, non-breaking space): use the named entity. &copy; beats &#169; for every reader of your source code.

Arbitrary Unicode characters you want to keep in source (emoji, rare symbols): if your file is saved as UTF-8, you can paste the character directly and skip the entity entirely. Browsers handle UTF-8 source files natively. Only encode when your toolchain strips or mangles non-ASCII bytes.

XHTML, SVG, or RSS feeds: skip named entities entirely except for the five defined in XML (&amp;, &lt;, &gt;, &quot;, &apos;). Use decimal or hex for everything else.

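JavaScript strings and template literals: HTML entities do not apply. Use Unicode escape sequences (\u00E9) or paste the literal character. JSX is a partial exception: entities written in literal JSX text are decoded, but entities inside a string expression in curly braces are not.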
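The rules above can be condensed into a small function. This is a rough sketch under stated assumptions, not a complete escaper: choose_form and its context labels ("html", "xml", "js") are hypothetical, "well-known named entity" is approximated by Python's HTML 4 name table, and utf8_safe stands for "my toolchain preserves non-ASCII bytes".

```python
from html.entities import codepoint2name

# The five entities predefined by XML, which are also the must-escape set.
XML_PREDEFINED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}


def choose_form(ch: str, context: str = "html", utf8_safe: bool = True) -> str:
    """Pick an output form for one character in a given (hypothetical) context."""
    cp = ord(ch)
    if context == "js":
        # HTML entities are not decoded in JavaScript strings.
        if cp < 128:
            return ch
        return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}"
    if ch in XML_PREDEFINED:
        return XML_PREDEFINED[ch]        # must-escape characters
    if cp < 128:
        return ch                        # other ASCII needs no encoding
    if context == "xml":
        return f"&#{cp};"                # XML: numeric for everything else
    name = codepoint2name.get(cp)
    if name is not None:
        return f"&{name};"               # readable named entity
    return ch if utf8_safe else f"&#{cp};"


print(choose_form("\u00a9"))              # &copy;
print(choose_form("\u00a9", "xml"))       # &#169;
print(choose_form("\u00e9", "js"))        # \u00E9
```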

One practical benchmark: GitHub's HTML sanitizer, used for rendering README files and issue comments, accepts named entities defined in HTML5. It will correctly render &rarr; (→) in Markdown cells. If you're targeting that environment specifically, named entities are fine.

Tools That Do the Encoding for You

Manual encoding works for a handful of characters. For full documents, forms output, or user-generated content that must be safely embedded in HTML, you want a tool that encodes the entire string in one pass.

The HTML Entity Encoder / Decoder on Toolora handles all three forms — named, decimal, and hex — and lets you switch between them without retyping anything. Paste in a string like Tom & Jerry's "big" adventure, select your encoding mode, and the output appears instantly:

Named mode: Tom &amp; Jerry&apos;s &quot;big&quot; adventure
Decimal mode: Tom &#38; Jerry&#39;s &#34;big&#34; adventure
Hex mode: Tom &#x26; Jerry&#x27;s &#x22;big&#x22; adventure

I reached for this when auditing a legacy CMS export where the original developer had mixed all three forms inconsistently in the same file. Being able to round-trip through decode → re-encode in one mode saved about two hours of regex cleanup.
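The decode then re-encode round trip can be approximated in a few lines of Python. This is a sketch under assumptions, not a description of how the Toolora tool works internally: encode and normalize are hypothetical helpers, the input is assumed to be text content rather than full markup (tags would be escaped too), and in named mode any non-ASCII character falls back to a decimal reference.

```python
import html

# Names for the five structurally significant characters.
MUST_ESCAPE = {"&": "amp", "<": "lt", ">": "gt", '"': "quot", "'": "apos"}


def encode(text: str, mode: str = "named") -> str:
    """Encode must-escape and non-ASCII characters in one consistent form."""
    out = []
    for ch in text:
        if ch not in MUST_ESCAPE and ord(ch) < 128:
            out.append(ch)
        elif mode == "named" and ch in MUST_ESCAPE:
            out.append(f"&{MUST_ESCAPE[ch]};")
        elif mode == "hex":
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)


def normalize(fragment: str, mode: str = "named") -> str:
    """Decode every reference, then re-encode in a single mode."""
    return encode(html.unescape(fragment), mode)


mixed = "Tom &amp; Jerry&#39;s &#x22;big&#x22; adventure"
print(normalize(mixed, "named"))    # Tom &amp; Jerry&apos;s &quot;big&quot; adventure
print(normalize(mixed, "decimal"))  # Tom &#38; Jerry&#39;s &#34;big&#34; adventure
print(normalize(mixed, "hex"))      # Tom &#x26; Jerry&#x27;s &#x22;big&#x22; adventure
```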

For simpler cases where you just need to encode a snippet of HTML-unsafe content, the HTML Entities Encoder is the faster single-purpose option.

The UTF-8 Case for Skipping Entities Altogether

There is a valid argument that for most production HTML you should encode only the five characters that are structurally significant (<, >, &, ", ') and let everything else be raw UTF-8. The argument holds when:

Your HTTP response includes Content-Type: text/html; charset=UTF-8
Your HTML file's <meta charset="UTF-8"> tag appears before any non-ASCII content
Your build pipeline doesn't corrupt multi-byte sequences

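Under those conditions, pasting the em dash (—) directly is at least as safe as writing &mdash;. The entity depends on the parser treating the & as the start of a reference, which won't happen inside a <script> or <style> element or an XML CDATA section. The trade-off reverses if the document is accidentally served as Latin-1: the ASCII-only entity survives, while the raw UTF-8 bytes render as mojibake.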

The practical takeaway: treat named and numeric entities as the encoding of last resort, reserved for characters that break the parser structure, and as a readability tool for the handful of characters (&nbsp;, &mdash;, &copy;) that developers read frequently enough to recognize on sight.
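Python's standard html.escape happens to implement this minimal policy: it rewrites only the five structural characters and leaves everything else as raw text. A short sketch, with a made-up sample string:

```python
import html

raw = 'Caf\u00e9 \u2713 <b>Tom & Jerry\'s "big" adventure</b>'

# quote=True (the default) also escapes double and single quotes.
escaped = html.escape(raw)
print(escaped)
# Café ✓ &lt;b&gt;Tom &amp; Jerry&#x27;s &quot;big&quot; adventure&lt;/b&gt;
```

Note that html.escape writes the apostrophe as the hex reference &#x27; rather than a named entity, and that the é and ✓ pass through untouched, so the output is only safe to serve when the UTF-8 conditions above hold.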

Summary

Named entities (&amp;, &lt;, &copy;): use for must-escape characters and commonly recognized symbols. Case-sensitive and, apart from the five XML predefined entities, HTML-only.
Decimal numeric references (&#38;, &#169;): use in XML/XHTML/SVG and when no named entity exists. Maps directly to Unicode code points.
Hex numeric references (&#x26;, &#xA9;): same reach as decimal, but matches how Unicode charts present code points (U+0026 → &#x26;).
Raw UTF-8: the right choice for non-structural characters when your encoding pipeline is reliable.

When you're unsure, paste the content into the HTML Entity Encoder / Decoder and compare all three outputs before choosing.

Made by Toolora · Updated 2026-07-01