What is a code point and how is it different from a byte?

A code point is a Unicode number assigned to a character — for example, the letter 'A' is U+0041 and the snowman is U+2603. A byte is a unit of storage. In UTF-8 encoding, code points from U+0000 to U+007F take 1 byte each, U+0080–U+07FF take 2 bytes, U+0800–U+FFFF take 3 bytes, and U+10000–U+10FFFF take 4 bytes. This tool's UTF-8 column shows the exact hex bytes so you can calculate storage costs precisely.
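
The byte ranges above can be sketched in a few lines of JavaScript. This is only an illustration of the arithmetic, not the tool's own code; the function names `utf8ByteLength` and `utf8Hex` are made up for this example.

```javascript
// Number of UTF-8 bytes needed for one code point, following the ranges above.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4; // U+10000 to U+10FFFF
}

// Hex bytes of a string's UTF-8 encoding, using the built-in TextEncoder.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (byte) =>
    byte.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

console.log(utf8ByteLength(0x41));   // 1 (the letter 'A')
console.log(utf8ByteLength(0x2603)); // 3 (the snowman)
console.log(utf8Hex("\u2603"));      // E2 98 83
```

Summing `utf8ByteLength` over every code point in a string gives the same total as `new TextEncoder().encode(text).length`.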

Why does one emoji show as multiple rows?

Some emoji are built by joining simpler ones with U+200D ZERO WIDTH JOINER, or by appending a skin-tone modifier (U+1F3FB–U+1F3FF). For example, 👋🏽 is two code points: U+1F44B (WAVING HAND SIGN) + U+1F3FD (EMOJI MODIFIER FITZPATRICK TYPE-4, the medium skin tone). The inspector shows one row per code point because text is stored as a sequence of code points, even when the text renderer combines them into one visible glyph.
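
To see the same split outside the tool, iterating a JavaScript string yields whole code points. The helper name `listCodePoints` is hypothetical and the sketch only mirrors the idea of one row per code point.

```javascript
// String iteration yields whole code points, so surrogate pairs stay together.
function listCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Waving hand + medium skin tone modifier: one glyph, two code points.
console.log(listCodePoints("\u{1F44B}\u{1F3FD}")); // [ 'U+1F44B', 'U+1F3FD' ]

// Man + ZERO WIDTH JOINER + woman: three code points in one ZWJ chain.
console.log(listCodePoints("\u{1F468}\u200D\u{1F469}"));
```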

How do I copy just one field from a row?

Click any data cell in the table to copy its content to your clipboard. A brief highlight confirms the copy. To copy the entire table as TSV (tab-separated, for Excel or Google Sheets), click the 'Copy as TSV' button. To copy as JSON (an array of objects), click 'Copy as JSON'.
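
If you want to reproduce the two export formats in your own script, a minimal sketch could look like the following. The row shape and the column names (`char`, `codePoint`, `utf8`) are assumptions made for illustration; the tool's actual field names may differ.

```javascript
// Hypothetical row objects; the real export may use other keys.
const rows = [
  { char: "A", codePoint: "U+0041", utf8: "41" },
  { char: "\u2603", codePoint: "U+2603", utf8: "E2 98 83" },
];

// Build tab-separated text: one header line, then one line per row.
function toTsv(items, columns) {
  // Tabs and line breaks inside a cell would break the grid, so replace them.
  const clean = (value) => String(value).replace(/[\t\r\n]/g, " ");
  const header = columns.join("\t");
  const lines = items.map((row) => columns.map((c) => clean(row[c])).join("\t"));
  return [header, ...lines].join("\n");
}

const tsv = toTsv(rows, ["char", "codePoint", "utf8"]);
const json = JSON.stringify(rows, null, 2); // an array of objects
console.log(tsv);
console.log(json);
```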

What does the 'Cat' column mean?

Cat stands for Unicode general category, a two-letter code that describes the character type. Common ones: Lu = uppercase letter, Ll = lowercase letter, Nd = decimal digit, Po = other punctuation, Sm = math symbol, So = other symbol, Cc = control character, Mn = non-spacing combining mark, Zs = space separator. The full list is in the Unicode standard, but these cover most everyday characters.
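
JavaScript regular expressions can test these categories directly through Unicode property escapes (`\p{...}` with the `u` flag). The sketch below checks only the nine categories listed above; `generalCategory` is a hypothetical helper, not how the tool looks up its Cat column.

```javascript
// The categories named above; the full Unicode list is longer.
const CATEGORIES = ["Lu", "Ll", "Nd", "Po", "Sm", "So", "Cc", "Mn", "Zs"];

// Return the first listed category that matches a single character, or null.
function generalCategory(ch) {
  for (const cat of CATEGORIES) {
    if (new RegExp(`^\\p{${cat}}$`, "u").test(ch)) return cat;
  }
  return null; // a category outside the short list, e.g. Pd for "-"
}

console.log(generalCategory("A"));      // Lu
console.log(generalCategory("7"));      // Nd
console.log(generalCategory("\u0301")); // Mn (combining acute accent)
console.log(generalCategory(" "));      // Zs
```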

Is my text uploaded anywhere?

No. All processing happens in your browser tab. The text you paste stays on your machine. The only data that leaves is whatever you deliberately put in a shared URL — so avoid sharing links that contain passwords or private information.

Find exactly why a regex that "looks right" fails to match

You write `/^\w+$/` to validate a username and it keeps rejecting "café". Paste "café" here: if the text arrived in decomposed form, the table shows two rows for the final character, one for the 'e' (U+0065) and one for the combining acute accent (U+0301, COMBINING ACUTE ACCENT, category Mn). The regex engine sees two code points where you expected one glyph. The inspector gives you the exact JS escape (`\u0301`) and the full character name so you can decide whether to normalize to NFC (which turns the pair into the single code point U+00E9) or extend your regex to accept combining marks. Note that in JavaScript `\w` matches only ASCII letters, digits, and underscore, so even the precomposed é needs a Unicode-aware pattern such as `\p{L}` with the `u` flag.
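
A short sketch of both options, normalizing to NFC and widening the pattern. The replacement pattern is one reasonable choice for illustration, not a recommendation for every username policy.

```javascript
const decomposed = "cafe\u0301";              // 'e' followed by U+0301
const composed = decomposed.normalize("NFC"); // precomposed U+00E9

console.log([...decomposed].length); // 5 code points
console.log([...composed].length);   // 4 code points

// ASCII-only: \w is [A-Za-z0-9_] in JavaScript, so both forms fail.
const asciiWord = /^\w+$/;
console.log(asciiWord.test(decomposed)); // false
console.log(asciiWord.test(composed));   // false

// Unicode-aware: letters, combining marks, numbers, underscore.
const unicodeWord = /^[\p{L}\p{M}\p{N}_]+$/u;
console.log(unicodeWord.test(decomposed)); // true
console.log(unicodeWord.test(composed));   // true
```

Normalizing before validation also keeps stored usernames comparable, since the two forms look identical but are different strings.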

Understand emoji byte costs before sizing a database column

You need a VARCHAR column wide enough for user display names that may contain emoji. Paste a string like "Hi 👋🏽" and the inspector immediately shows that the wave emoji is two code points (U+1F44B + U+1F3FD skin-tone modifier) and their UTF-8 byte sequences consume 4 + 4 = 8 bytes. One row per code point makes the UTF-8 cost crystal-clear so you can calculate the exact `VARCHAR(N)` or `NVARCHAR` length needed without guessing.
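
The same totals can be computed in code. The sketch below compares three ways of measuring "Hi 👋🏽"; which one a given `VARCHAR(N)` or `NVARCHAR(N)` limit counts (bytes, characters, or UTF-16 code units) depends on the database and column type, so treat that mapping as something to confirm in your database's documentation.

```javascript
// "Hi " followed by waving hand + medium skin tone modifier.
const displayName = "Hi \u{1F44B}\u{1F3FD}";

const utf8Bytes = new TextEncoder().encode(displayName).length;
const codePoints = [...displayName].length;
const utf16Units = displayName.length;

console.log(utf8Bytes);  // 11 = 3 ASCII bytes + 4 + 4 for the emoji
console.log(codePoints); // 5  = 3 ASCII + 2 emoji code points
console.log(utf16Units); // 7  = 3 ASCII + 2 surrogate pairs
```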

Audit a copy-pasted config snippet for invisible characters

Your YAML parser throws on a line that looks perfectly fine in your editor. Paste it into the inspector and scan the Name column: if you spot U+00A0 NO-BREAK SPACE, U+200B ZERO WIDTH SPACE, or U+FEFF BYTE ORDER MARK, you've found the culprit. The JS Escape column gives you the exact `\uXXXX` form to use in a `replace()` call to strip the offending character precisely.
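
Once the culprit is identified, the cleanup might look like this. The set of characters is limited to the three named above, and the choice to turn a no-break space into a regular space rather than delete it is an assumption that suits most config files; `findInvisible` and `cleanLine` are hypothetical names.

```javascript
const INVISIBLE = /[\u00A0\u200B\uFEFF]/g;

// Report where each invisible character sits and its \uXXXX escape.
function findInvisible(line) {
  return Array.from(line.matchAll(INVISIBLE), (m) => ({
    index: m.index,
    escape: "\\u" + m[0].charCodeAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

// Replace no-break spaces with plain spaces; drop zero-width characters.
function cleanLine(line) {
  return line.replace(/\u00A0/g, " ").replace(/[\u200B\uFEFF]/g, "");
}

const suspect = "key:\u00A0value\u200B";
console.log(findInvisible(suspect)); // positions 4 and 10
console.log(JSON.stringify(cleanLine(suspect))); // "key: value"
```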

Generate the right HTML entity or JS escape for a special character

You need to hardcode an em dash, a non-breaking hyphen, or a copyright symbol in source code. Type the character and the inspector's table immediately shows its named HTML entity (e.g. `&mdash;`), decimal entity (`&#8212;`), and JS escape (`\u2014`), all in one row. Click any cell to copy the exact form your code needs.
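
The numeric forms are simple to derive from the code point. The sketch below is an illustration with hypothetical helper names; in particular, writing characters above U+FFFF as `\u{...}` is an assumption, since the tool might show a surrogate pair instead. Named entities such as `&mdash;` need a lookup table and are not covered here.

```javascript
// Decimal numeric entity, e.g. &#8212; for the em dash.
function htmlDecimalEntity(ch) {
  return `&#${ch.codePointAt(0)};`;
}

// Hexadecimal numeric entity, e.g. &#x2014;
function htmlHexEntity(ch) {
  return `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`;
}

// JavaScript escape: \uXXXX in the BMP, \u{XXXXX} above it.
function jsEscape(ch) {
  const cp = ch.codePointAt(0);
  const hex = cp.toString(16).toUpperCase();
  return cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : `\\u{${hex}}`;
}

console.log(htmlDecimalEntity("\u2014")); // &#8212;
console.log(htmlHexEntity("\u2014"));     // &#x2014;
console.log(jsEscape("\u2014"));          // \u2014
console.log(jsEscape("\u00A9"));          // \u00A9 (copyright sign)
```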

Unicode Code Point Inspector — per-character table with UTF-8, UTF-16, HTML & JS

Paste text → get a per-code-point table: U+XXXX, UTF-8 bytes, UTF-16, HTML entity, JS escape, character name.

Runs locally
Category: Text
Best for: removing repetitive cleanup work from everyday writing and operations.

What this tool does

A free, browser-only tool that breaks any text into a per-code-point table. Paste or type text and instantly see every code point as a row: the character itself, its code point notation (U+1F600), decimal and hex values, official Unicode name, general category, UTF-8 byte sequence, UTF-16 code units, HTML entity, and JavaScript escape — all in one scannable view. The summary counts total code points and UTF-8 bytes. Export the whole table as TSV (paste straight into Excel or Google Sheets) or as JSON. Ideal for debugging encoding issues, sizing database columns, generating precise HTML entities or JS escapes for tricky glyphs, and understanding how emoji or CJK characters are stored byte-by-byte. Handles BMP characters, surrogate-pair emoji, combining marks, control codes, and invisible characters. 100% client-side — nothing is uploaded.
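
For the UTF-16 column, the only non-obvious step is splitting a code point above U+FFFF into a surrogate pair. A minimal sketch of that arithmetic follows; `utf16Units` is a hypothetical name, not the tool's implementation.

```javascript
// Return the UTF-16 code units for one code point.
function utf16Units(codePoint) {
  if (codePoint <= 0xffff) return [codePoint]; // BMP: a single unit
  const offset = codePoint - 0x10000;          // 20 bits remain
  const high = 0xd800 + (offset >> 10);        // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);       // bottom 10 bits
  return [high, low];
}

const toHex = (n) => n.toString(16).toUpperCase().padStart(4, "0");

console.log(utf16Units(0x1f600).map(toHex)); // [ 'D83D', 'DE00' ]
console.log(utf16Units(0x2603).map(toHex));  // [ '2603' ]
```

The result matches what JavaScript stores internally: `"\u{1F600}".charCodeAt(0)` is 0xD83D and `charCodeAt(1)` is 0xDE00.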

Tool details

Input: Text + Numbers; The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output: Live result + Copy; The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy: Browser-side processing; The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share: Shareable URL state; Key settings are encoded in the URL so another person can reopen the same setup.
Performance budget: Initial JS <= 30 KB; No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit: Text · Developer; Category and role tags drive related tools, internal links, and quick fit checks.

How to use

1. Input

Paste or drop your content into the tool panel.

2. Process

Click the button. All processing is local in your browser.

3. Copy / Download

Copy the result or download to disk in one click.

How Unicode Code Point Inspector fits into your work

Use it to clean, compare, reshape, or extract plain text before it goes into a document, CMS, spreadsheet, or prompt.

Text jobs

Removing repetitive cleanup work from everyday writing and operations.
Making text easier to compare, paste, publish, or feed into another tool.
Working with content locally when the text is private or unfinished.

Text checks

Scan for unintended whitespace, duplicate lines, and lost punctuation.
For long text, test the first few lines before applying the whole change.
Copy the final output only after checking the preview.

Common pitfalls

Counting characters with `.length` in JavaScript gives you UTF-16 code units, not characters. `'😀'.length` is 2, not 1. The code-point count shown in this tool's summary (`[...str].length`) is the correct number of characters for most use cases — though even that counts multi-code-point emoji sequences as more than one if they have skin tone modifiers or ZWJ chains.
A code point is not always the same as a rendered character. The combining sequence é (e + U+0301) is one visible glyph but two code points. Database column widths measured in bytes depend on UTF-8 byte length (up to 4 bytes per code point), not glyph count. This table's UTF-8 column shows exactly how many bytes each code point needs.
Numeric HTML entities are usually written in decimal (`&#128512;`), and named entities exist only for a small subset of Unicode. To check a named entity, paste the character it renders into this tool and confirm that the Name column shows the Unicode name you expect. For everything else, numeric entities (`&#NNNNN;` or `&#xHHHHH;`) are universally supported.
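
The three ways of counting mentioned in these pitfalls can be compared side by side. The sketch below uses `Intl.Segmenter` for the glyph-level count where the runtime provides it; `measure` is a hypothetical helper, and grapheme results can vary slightly between engine versions.

```javascript
function measure(text) {
  const result = {
    codeUnits: text.length,       // UTF-16 code units
    codePoints: [...text].length, // what a per-code-point table counts
    graphemes: null,              // user-perceived characters, if available
  };
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    result.graphemes = [...segmenter.segment(text)].length;
  }
  return result;
}

console.log(measure("\u{1F600}"));          // 2 code units, 1 code point
console.log(measure("\u{1F44B}\u{1F3FD}")); // 4 code units, 2 code points
console.log(measure("e\u0301"));            // 2 code units, 2 code points
```

For length limits shown to users, the grapheme count is usually the closest match to what they see; for storage, the UTF-8 byte count is the one that matters.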

Privacy

Every operation — code-point extraction, UTF-8 / UTF-16 byte math, HTML entity and JS escape generation, character name lookup — runs entirely in your browser. The text you paste is never uploaded or logged. The input is stored in the shareable URL query string, so a shared link re-opens the same view; avoid sharing links built from passwords, tokens, or private messages.
