Unicode Code Point Inspector — per-character table with UTF-8, UTF-16, HTML & JS

Paste text → get a per-code-point table: U+XXXX, UTF-8 bytes, UTF-16, HTML entity, JS escape, character name.

  • Runs locally
  • Category: Text
  • Best for: removing repetitive cleanup work from everyday writing and operations.

What this tool does

A free, browser-only tool that breaks any text into a per-code-point table. Paste or type text and instantly see every code point as a row: the character itself, its code point notation (U+1F600), decimal and hex values, official Unicode name, general category, UTF-8 byte sequence, UTF-16 code units, HTML entity, and JavaScript escape — all in one scannable view. The summary counts total code points and UTF-8 bytes. Export the whole table as TSV (paste straight into Excel or Google Sheets) or as JSON. Ideal for debugging encoding issues, sizing database columns, generating precise HTML entities or JS escapes for tricky glyphs, and understanding how emoji or CJK characters are stored byte-by-byte. Handles BMP characters, surrogate-pair emoji, combining marks, control codes, and invisible characters. 100% client-side — nothing is uploaded.
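
As a rough illustration of the columns described above, the following sketch derives most of them with standard JavaScript built-ins. It is not the tool's actual source: the function name `inspect` and the row field names are hypothetical, the `\u{...}` form for astral code points is an assumed output style, and the character name, general category, and named-entity lookups are left out because they need a Unicode data table.

```javascript
// Hypothetical helper: returns one row per code point of the input text.
function inspect(text) {
  const encoder = new TextEncoder(); // UTF-8 encoder built into browsers and Node.js
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, '0');

  // Array.from on a string walks code points, not UTF-16 code units,
  // so a surrogate-pair emoji arrives here as a single element.
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);

    // ch.length is 1 for BMP characters and 2 for astral ones.
    const utf16 = [];
    for (let i = 0; i < ch.length; i++) utf16.push(hex(ch.charCodeAt(i), 4));

    return {
      char: ch,
      codePoint: 'U+' + hex(cp, 4),
      decimal: cp,
      // Note: a lone surrogate is encoded as U+FFFD (EF BF BD) by TextEncoder.
      utf8: Array.from(encoder.encode(ch), (b) => hex(b, 2)).join(' '),
      utf16: utf16.join(' '),
      htmlEntity: '&#' + cp + ';', // decimal numeric reference only
      // Assumed style: \uXXXX inside the BMP, \u{XXXXX} above it.
      jsEscape: cp > 0xffff ? '\\u{' + hex(cp, 4) + '}' : '\\u' + hex(cp, 4),
    };
  });
}
```

For example, `inspect('😀')[0]` gives code point `U+1F600`, UTF-8 `F0 9F 98 80`, UTF-16 `D83D DE00`, HTML `&#128512;`, and JS escape `\u{1F600}`.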

Tool details

Input
Text + Numbers
The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output
Live result + Copy
The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy
Browser-side processing
The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share
Shareable URL state
Key settings are encoded in the URL so another person can reopen the same setup.
Performance budget
Initial JS <= 30 KB
No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit
Text · Developer
Category and role tags drive related tools, internal links, and quick fit checks.

How to use

  1. Input

    Paste or drop your content into the tool panel.

  2. Process

    The table updates as soon as you paste or type. All processing is local in your browser.

  3. Copy / Download

    Copy the result or download to disk in one click.

How Unicode Code Point Inspector fits into your work

Use it to clean, compare, reshape, or extract plain text before it goes into a document, CMS, spreadsheet, or prompt.

Text jobs

  • Removing repetitive cleanup work from everyday writing and operations.
  • Making text easier to compare, paste, publish, or feed into another tool.
  • Working with content locally when the text is private or unfinished.

Text checks

  • Scan for unintended whitespace, duplicate lines, and lost punctuation.
  • For long text, test the first few lines before applying the whole change.
  • Copy the final output only after checking the preview.

Good next steps

These links move the current task into a more complete workflow.

  1. Unicode Character Inspector: Inspect any text character-by-character: code points, UTF-8/UTF-16 bytes, HTML entities, JS escapes, names, and hidden zero-width / confusable glyphs.
  2. Unicode Escape Converter: Text ⇄ Unicode escapes (\uXXXX, \u{1F600}, &#128512;, CSS \1F600), with emoji and CJK handled correctly, browser-only.
  3. Unicode Normalizer: Normalize text to NFC, NFD, NFKC or NFKD, see code-point and byte counts shift, spot which characters changed, and copy in one click, all in your browser.

Real-world use cases

  • Find exactly why a regex that "looks right" fails to match

    You write `/^\w+$/` to validate a username and it keeps rejecting "café". Paste "café" here: if the text is decomposed, the table shows two rows for the final character, one for the 'e' (U+0065) and one for the combining acute accent (U+0301, COMBINING ACUTE ACCENT, category Mn). The regex engine sees two code points where you expected one glyph, and in JavaScript `\w` matches only ASCII letters, digits, and underscore, so neither U+0301 nor the precomposed é (U+00E9) passes. The inspector gives you the exact JS escape (`\u0301`) and the full character name so you can decide whether to normalize to NFC (é becomes the single code point U+00E9), extend your regex to accept non-ASCII letters and combining marks, or both.

  • Understand emoji byte costs before sizing a database column

    You need a VARCHAR column wide enough for user display names that may contain emoji. Paste a string like "Hi 👋🏽" and the inspector shows that the wave emoji is two code points (U+1F44B plus the skin-tone modifier U+1F3FD) whose UTF-8 byte sequences take 4 + 4 = 8 bytes; with the three ASCII characters before it, the whole string is 11 bytes. One row per code point makes the UTF-8 cost explicit, so you can work out the `VARCHAR(N)` or `NVARCHAR` length without guessing. Check whether your database counts N in bytes, characters, or UTF-16 code units before settling on a number.

  • Audit a copy-pasted config snippet for invisible characters

    Your YAML parser throws on a line that looks perfectly fine in your editor. Paste it into the inspector and scan the Name column: if you spot U+00A0 NO-BREAK SPACE, U+200B ZERO WIDTH SPACE, or U+FEFF ZERO WIDTH NO-BREAK SPACE (the byte order mark), you've found the culprit. The JS Escape column gives you the exact `\uXXXX` form to use in a `replace()` call to strip the offending character precisely.

  • Generate the right HTML entity or JS escape for a special character

    You need to hardcode an em dash, a non-breaking hyphen, or a copyright symbol in source code. Type the character and the inspector's table immediately shows its named HTML entity (e.g. `&mdash;`), decimal entity (`&#8212;`), and JS escape (`\u2014`), all in one row. Click any cell to copy the exact form your code needs.
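
Once the inspector has identified the problem code points, the fixes from the first and third use cases can be sketched in a few lines. This is a minimal example under stated assumptions: the names `stripInvisible`, `isValidUsername`, `INVISIBLE`, and `USERNAME` are hypothetical, the list of invisible characters is only the three mentioned above, and replacing NO-BREAK SPACE with a regular space (rather than deleting it) is a choice you may want to change.

```javascript
// Hypothetical list of invisible code points to clean up; extend it to suit your data.
const INVISIBLE = /[\u00A0\u200B\uFEFF]/g;

function stripInvisible(line) {
  // Assumption: NO-BREAK SPACE becomes a regular space; the zero-width ones are dropped.
  return line.replace(INVISIBLE, (ch) => (ch === '\u00A0' ? ' ' : ''));
}

// Unicode-aware username pattern: letters, combining marks, digits, underscore.
// The `u` flag is required for \p{...} property escapes.
const USERNAME = /^[\p{L}\p{M}\p{N}_]+$/u;

function isValidUsername(name) {
  // Normalizing to NFC first folds e + U+0301 into the single code point U+00E9.
  return USERNAME.test(name.normalize('NFC'));
}
```

With these definitions, `isValidUsername('cafe\u0301')` returns `true`, while the ASCII-only `/^\w+$/` rejects both the decomposed and the precomposed spelling. Whether to accept combining marks at all in usernames is a policy decision; the sketch only shows how to express it.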

Common pitfalls

  • Counting characters with `.length` in JavaScript gives you UTF-16 code units, not characters. `'😀'.length` is 2, not 1. The code-point count shown in this tool's summary (`[...str].length`) is the correct number of characters for most use cases — though even that counts multi-code-point emoji sequences as more than one if they have skin tone modifiers or ZWJ chains.

  • A code point is not always the same as a rendered character. A decomposed é (e + U+0301) is one visible glyph but two code points. Database column widths measured in bytes depend on UTF-8 byte length (1 to 4 bytes per code point), not glyph count. This table's UTF-8 column shows exactly how many bytes each code point needs.

  • Numeric HTML character references can be written in decimal (`&#128512;`) or hex (`&#x1F600;`), but named entities exist only for a small subset of Unicode. If the table's HTML entity cell shows a named form such as `&mdash;`, that name is safe to use. For everything else, numeric references (`&#NNNNN;` or `&#xHHHHH;`) are universally supported.
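
The different ways of "counting characters" discussed in these pitfalls can be compared side by side. The sketch below is illustrative only: `measure` is a hypothetical name, and the grapheme count relies on `Intl.Segmenter`, which is not available in every runtime, so it is computed only when present.

```javascript
// Hypothetical helper: reports the length of a string in several units.
function measure(str) {
  const result = {
    utf16Units: str.length,                           // what .length counts
    codePoints: [...str].length,                      // spread iterates by code point
    utf8Bytes: new TextEncoder().encode(str).length,  // storage size in UTF-8
  };

  // Grapheme clusters approximate "characters as the user sees them".
  // Guarded because Intl.Segmenter is missing in some older environments.
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
    result.graphemes = [...segmenter.segment(str)].length;
  }
  return result;
}
```

For the string "Hi 👋🏽", this reports 7 UTF-16 code units, 5 code points, and 11 UTF-8 bytes. Where `Intl.Segmenter` is supported, the grapheme count should treat the wave emoji and its skin-tone modifier as a single unit.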

Privacy

Every operation — code-point extraction, UTF-8 / UTF-16 byte math, HTML entity and JS escape generation, character name lookup — runs entirely in your browser. The text you paste is never uploaded or logged. The input is stored in the shareable URL query string, so a shared link re-opens the same view; avoid sharing links built from passwords, tokens, or private messages.

Made by Toolora · 100% client-side · Updated 2026-07-01