
Unicode Character Inspector — code points, bytes, entities & hidden glyphs

Inspect any text character-by-character: code points, UTF-8/UTF-16 bytes, HTML entities, JS escapes, names, and hidden zero-width / confusable glyphs.

  • Runs locally
  • Category: Text
  • Best for: removing repetitive cleanup work from everyday writing and operations.

What this tool does

A free, fully browser-side Unicode inspector. Paste or type any text and get a per-character breakdown: the glyph itself, its code point (U+XXXX), decimal value, UTF-8 bytes in hex, UTF-16 code units, the named and numeric HTML entities, the JavaScript escape, and the Unicode block plus general category. It splits text into true grapheme clusters, so a flag, a skin-toned thumbs-up, or a multi-person family emoji is counted as one character even though it spans several code points. The summary tallies graphemes, code points, and UTF-8 bytes at a glance. Most usefully, it flags invisible and dangerous characters — zero-width spaces and joiners, the BOM, no-break spaces, right-to-left overrides, and Latin / Cyrillic / Greek homoglyphs that spoof one script as another — so you can find the one character quietly breaking a parser, a username check, or a security boundary. Every field has a one-click copy button. Nothing is uploaded; all decoding happens locally.
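
The per-character fields described above can be derived with standard JavaScript built-ins. The following is a minimal sketch, not the tool's actual source: the function name `describeCodePoints` and the shape of the returned rows are hypothetical, and it works per code point rather than per grapheme cluster.

```javascript
// Hypothetical helper: one row of encoding details per code point.
function describeCodePoints(text) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const toHex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");
  const rows = [];

  // Iterating a string with for...of walks code points, not UTF-16 units.
  // A lone surrogate is yielded as-is and encodes to EF BF BD (U+FFFD).
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const hex = toHex(cp, 4);
    rows.push({
      glyph: ch,
      codePoint: `U+${hex}`,
      decimal: cp,
      utf8: Array.from(encoder.encode(ch), (byte) => toHex(byte, 2)),
      // ch.length is 1 for BMP characters and 2 for surrogate pairs.
      utf16: Array.from({ length: ch.length }, (_, i) => toHex(ch.charCodeAt(i), 4)),
      htmlEntity: `&#x${hex};`,
      // Astral code points need the braced \u{...} escape form.
      jsEscape: cp > 0xffff ? `\\u{${hex}}` : `\\u${hex}`,
    });
  }
  return rows;
}

// Usage: inspect an em dash (U+2014) and a G-clef (U+1D11E).
console.log(describeCodePoints("\u2014\u{1D11E}"));
```

Named entities such as `&mdash;`, character names, blocks, and general categories are not covered here, because they require lookup tables that JavaScript does not ship with.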

Tool details

Input
Text
The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output
Live result + Copy
The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy
Browser-side processing
The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share
Shareable URL state
Key settings are encoded in the URL so another person can reopen the same setup.
Performance budget
Initial JS <= 16 KB
No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit
Text · Developer
Category and role tags drive related tools, internal links, and quick fit checks.

How to use

  1. Input

    Paste or drop your content into the tool panel.

  2. Process

    Click the button. All processing is local in your browser.

  3. Copy / Download

    Copy the result or download to disk in one click.

How Unicode Character Inspector fits into your work

Use it to clean, compare, reshape, or extract plain text before it goes into a document, CMS, spreadsheet, or prompt.

Text jobs

  • Removing repetitive cleanup work from everyday writing and operations.
  • Making text easier to compare, paste, publish, or feed into another tool.
  • Working with content locally when the text is private or unfinished.

Text checks

  • Scan for unintended whitespace, duplicate lines, and lost punctuation.
  • For long text, test the first few lines before applying the whole change.
  • Copy the final output only after checking the preview.

Good next steps

These links move the current task into a more complete workflow.

  1. HTML Entities Encoder: encode/decode HTML entities (&amp; &lt; &gt; &quot; &#39; and all numeric refs), browser-only
  2. Text to Binary Converter: text to binary (and back), UTF-8 aware, 8/16/32 bit grouping, emoji safe
  3. Emoji Finder: Unicode 15.1 / 1500+ emojis with bilingual search, one-click copy, browser-only

Real-world use cases

  • Find the invisible character breaking a "duplicate" username check

    A signup form keeps rejecting "john" as taken, but the support team swears the account is free. Paste both strings here. One of them turns out to be `john` whose second character is actually U+043E CYRILLIC SMALL LETTER O, not the Latin `o` (U+006F): a homoglyph. The tool flags it red as a confusable, shows the code point, and you immediately know the original signup used a Cyrillic letter to squat the name. Without per-character code points you would have stared at two visually identical strings forever.

  • Debug a JSON parser that chokes on a "clean" pasted config

    Your YAML/JSON loader throws "unexpected token" on line 1, but the file looks perfect in the editor. Paste the first line here. The summary flags a U+FEFF BYTE ORDER MARK sitting before the opening brace, or a U+00A0 NO-BREAK SPACE where you expected a regular space. Both are invisible in most editors. The tool highlights them as zero-width / invisible and gives you the JS escape (`\uFEFF` or `\u00A0`) so you can strip the character with a precise `replace`, after which the parse error disappears.

  • Verify exactly how an emoji is encoded before storing it

    You are sizing a `VARCHAR` column and a single 👨‍👩‍👧‍👦 family emoji blows past your length limit. Paste it: the grapheme count says 1, the code-point count says 7, and the UTF-8 byte count says 25. The tool breaks the cluster into its parts — four person emoji joined by three U+200D ZERO WIDTH JOINER characters — so you understand why "one emoji" needs 25 bytes and can size the column (or your Twitter-style character budget) correctly.

  • Build a precise HTML entity or JS escape for a tricky glyph

    You need to hardcode an em dash, a non-breaking space, or a mathematical symbol into source without pasting the raw character (which a teammate's editor might mangle). Type or paste the glyph, and each row gives you the named HTML entity (`&mdash;`), the numeric forms (`&#8212;` / `&#x2014;`), and the JS escape (`\u2014`, or the braced form such as `\u{1F600}` for astral characters). One click copies the exact form your file needs.

  • Audit user-supplied text for spoofing before it hits production

    You are reviewing a display name, a domain label, or a coupon code submitted by a user. Paste it and scan the right column: any zero-width space, right-to-left override (U+202E, the classic filename-spoofing trick), or mixed-script confusable gets a colored flag. You get a per-character verdict instead of trusting a string that "looks normal" — which is exactly how homograph phishing and RTL-override attacks slip through naive validation.
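
The audit described in these use cases can be approximated in a few lines. This is a rough sketch under stated assumptions: the `INVISIBLE` table and the three scripts checked are illustrative choices, the name `auditText` is hypothetical, and a mixed-script result is only a heuristic. Thorough confusable detection relies on the Unicode confusables data (UTS #39), which is not reproduced here.

```javascript
// Illustrative subset of invisible or direction-changing characters.
// This is an assumption for the example, not the tool's actual table.
const INVISIBLE = new Map([
  [0x00a0, "NO-BREAK SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"], // legitimate inside emoji sequences
  [0x202e, "RIGHT-TO-LEFT OVERRIDE"],
  [0xfeff, "BYTE ORDER MARK"],
]);

// Unicode property escapes need the "u" flag. No "g" flag, so test() is stateless.
const SCRIPTS = {
  Latin: /\p{Script=Latin}/u,
  Cyrillic: /\p{Script=Cyrillic}/u,
  Greek: /\p{Script=Greek}/u,
};

// Hypothetical helper: report invisible characters and the scripts in use.
function auditText(text) {
  const findings = [];
  const scriptsSeen = new Set();
  let index = 0; // position counted in code points

  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (INVISIBLE.has(cp)) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      findings.push({ index, codePoint: `U+${hex}`, name: INVISIBLE.get(cp) });
    }
    for (const [name, pattern] of Object.entries(SCRIPTS)) {
      if (pattern.test(ch)) scriptsSeen.add(name);
    }
    index += 1;
  }

  return {
    findings,
    scripts: [...scriptsSeen],
    mixedScript: scriptsSeen.size > 1, // e.g. Latin letters plus one Cyrillic letter
  };
}

// Usage: "john" with a Cyrillic o (U+043E) in second position.
console.log(auditText("j\u043Ehn").mixedScript); // true
```

A mixed-script flag on its own is not proof of spoofing, since legitimate names can combine scripts. Treat it as a prompt to look at the individual code points.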

Common pitfalls

  • A grapheme is not a code point. The flag emoji 🇯🇵 is one thing you see but two code points (regional indicators J + P), and 👍🏽 is one emoji but two code points (thumbs-up + skin-tone modifier). If you slice a string by code point or by UTF-16 unit you can cut an emoji in half and corrupt it. Use the grapheme count this tool shows for anything user-facing.

  • `String.length` in JavaScript counts UTF-16 code units, not characters. `'𝄞'.length` is 2, not 1, because the G-clef lives above U+FFFF and needs a surrogate pair. Use `[...str].length` to count code points, use the grapheme count for what a reader perceives as one character, and never assume one index step equals one character.

  • Stripping spaces with `.replace(/ /g, '')` misses the invisible ones. U+00A0 no-break space, U+200B zero-width space, U+FEFF BOM, and U+3000 ideographic space all read as whitespace to a human but not to a naive regex. Copy the exact code point from this tool and target it specifically.
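
The three pitfalls above can be checked directly in code. This is a minimal sketch with hypothetical helper names (`countUnits`, `stripSpaces`). It assumes a runtime that provides `Intl.Segmenter` (recent browsers and Node.js 16 or later), and the set of characters stripped is just the list from the pitfall above, not an exhaustive one.

```javascript
// Hypothetical helper: the four different "lengths" of one string.
function countUnits(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: [...segmenter.segment(text)].length, // user-perceived characters
    codePoints: [...text].length,                   // Unicode scalar positions
    utf16Units: text.length,                        // what String.length reports
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// Hypothetical helper: remove the ordinary space plus the look-alikes
// named above (no-break space, zero-width space, ideographic space, BOM).
function stripSpaces(text) {
  return text.replace(/[ \u00A0\u200B\u3000\uFEFF]/g, "");
}

// Usage: the family emoji is four person emoji joined by three U+200D.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(countUnits(family));
// { graphemes: 1, codePoints: 7, utf16Units: 11, utf8Bytes: 25 }

console.log(stripSpaces("a\u00A0b\u200Bc d")); // "abcd"
```

Which count to use depends on the limit being enforced: byte counts for storage limits, grapheme counts for anything a person reads or types.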

Privacy

Every lookup (code-point decoding, UTF-8 / UTF-16 byte math, entity and escape generation, and the confusable / zero-width detection) runs as plain JavaScript in your browser tab. The tool itself never uploads or logs the text you inspect. The one caveat: your input is stored in the shareable URL query string so a "share link" reproduces the same view, which means anyone who receives the link can read that text, and the link can end up in browser history and in the logs of wherever it is pasted or opened. Do not share links built from passwords, tokens, or private messages. Inspect those locally and copy the fields you need instead.

Made by Toolora · 100% client-side · Updated 2026-06-14