What's the difference between a grapheme and a code point?

A grapheme (grapheme cluster) is one "user-perceived character" — what a person would call a single character. A code point is one Unicode scalar value (U+XXXX). They often match, but not always: 👍🏽 is one grapheme made of two code points (thumbs-up + skin-tone modifier), and é can be one code point (U+00E9) or two (e + combining accent U+0301). This tool counts both so you know which number to trust.
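
As a rough illustration of the two counts (plus the UTF-16 and UTF-8 sizes), here is a minimal sketch using the standard `Intl.Segmenter` and `TextEncoder` APIs. The helper name `countAll` is hypothetical, and this is not necessarily how the tool itself computes its summary.

```javascript
// Count one string four ways. countAll is a hypothetical helper name.
function countAll(text) {
  // Grapheme segmentation follows the Unicode rules built into the runtime.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: [...segmenter.segment(text)].length,
    codePoints: [...text].length, // string iteration walks code points
    utf16Units: text.length,
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

console.log(countAll("\u{1F44D}\u{1F3FD}")); // thumbs-up + skin-tone modifier
console.log(countAll("e\u0301"));            // e + combining acute accent
console.log(countAll("\u00E9"));             // precomposed é
```

The first two inputs report one grapheme but two code points; the precomposed é reports one of each.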

Why does one emoji count as several code points?

Many emoji are built by joining simpler ones. A family 👨‍👩‍👧‍👦 is four person emoji glued together with three U+200D ZERO WIDTH JOINER characters, and flags are pairs of regional-indicator letters. Your eyes see one symbol, but the encoding stores several code points (and even more UTF-8 bytes). The tool expands the cluster so you can see every piece.
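
To see the pieces for yourself, a cluster can be expanded into its code points in a few lines. This is a sketch only; `expandCluster` is a hypothetical name, and the emoji are written as escapes so the joiners are visible in source.

```javascript
// expandCluster is a hypothetical helper, not the tool's own code.
function expandCluster(cluster) {
  // Spreading a string iterates by code point, so surrogate pairs stay whole.
  return [...cluster].map((ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Family emoji: man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(expandCluster(family).join(" "));
// U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

// Japanese flag: regional indicators J and P.
console.log(expandCluster("\u{1F1EF}\u{1F1F5}").join(" "));
// U+1F1EF U+1F1F5
```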

How do I find a hidden zero-width or invisible character?

Paste the suspect text. Any zero-width space (U+200B), zero-width joiner (U+200D), BOM (U+FEFF), no-break space (U+00A0), or directional override gets a colored "invisible" or "control" flag in the per-character list, along with its code point and a JS escape such as `\u200B` that you can use to strip it. The summary also reports how many invisible characters were found.
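
Once you know which character is hiding, the same check can be scripted. The sketch below covers only the characters named above (with U+202D and U+202E standing in for "directional override"); the tool's real detection list is not documented here, so treat this character class and the helper names as assumptions.

```javascript
// Assumed list: NBSP, ZWSP, ZWJ, LRO, RLO, BOM. Not the tool's actual list.
const INVISIBLE = /[\u00A0\u200B\u200D\u202D\u202E\uFEFF]/gu;

// Report each hit with its UTF-16 offset, code point, and JS escape.
function findInvisible(text) {
  return [...text.matchAll(INVISIBLE)].map((match) => {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return { index: match.index, codePoint: "U+" + hex, jsEscape: "\\u" + hex };
  });
}

// Remove every listed character outright.
function stripInvisible(text) {
  return text.replace(INVISIBLE, "");
}

console.log(findInvisible("a\u200Bb\uFEFF"));
console.log(stripInvisible("a\u200Bb\uFEFF")); // "ab"
```

Blind stripping is not always what you want: removing U+200D breaks joined emoji apart, and a no-break space is usually better replaced with a regular space than deleted.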

What does the confusable / homoglyph flag mean?

Some characters from different scripts look identical — Latin "a" (U+0061) vs Cyrillic "а" (U+0430), or Greek "ο" vs Latin "o". Attackers use these to spoof domains, usernames, and brand names. When the tool detects a character that's a common homoglyph of a basic Latin letter, it flags it so you can catch script-mixing that the eye can't.
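
A very small version of this check can be written as a lookup table. The sketch below lists only a handful of well-known Cyrillic and Greek look-alikes; the table and the name `findConfusables` are illustrative assumptions, and a production check would lean on the full Unicode confusables data instead.

```javascript
// A tiny, hand-picked table of look-alikes for basic Latin letters.
const LOOKALIKES = new Map([
  ["\u0430", "a"], // CYRILLIC SMALL LETTER A
  ["\u0435", "e"], // CYRILLIC SMALL LETTER IE
  ["\u043E", "o"], // CYRILLIC SMALL LETTER O
  ["\u0440", "p"], // CYRILLIC SMALL LETTER ER
  ["\u0441", "c"], // CYRILLIC SMALL LETTER ES
  ["\u03BF", "o"], // GREEK SMALL LETTER OMICRON
]);

function findConfusables(text) {
  const hits = [];
  let index = 0; // position counted in code points
  for (const ch of text) {
    if (LOOKALIKES.has(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex, looksLike: LOOKALIKES.get(ch) });
    }
    index += 1;
  }
  return hits;
}

console.log(findConfusables("p\u0430ypal")); // flags the Cyrillic "а" at index 1
console.log(findConfusables("paypal"));      // []
```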

Is my text sent anywhere?

No. All decoding and detection runs in your browser with no network calls. The only data that leaves is whatever you deliberately put in a shared URL, so avoid sharing links built from secrets.

Unicode Character Inspector — code points, bytes, entities & hidden glyphs

Inspect any text character-by-character: code points, UTF-8/UTF-16 bytes, HTML entities, JS escapes, names, and hidden zero-width / confusable glyphs.

Runs locally
Category Text
Best for Removing repetitive cleanup work from everyday writing and operations.

What this tool does

A free, fully browser-side Unicode inspector. Paste or type any text and get a per-character breakdown: the glyph itself, its code point (U+XXXX), decimal value, UTF-8 bytes in hex, UTF-16 code units, the named and numeric HTML entities, the JavaScript escape, and the Unicode block plus general category. It splits text into true grapheme clusters, so a flag, a skin-toned thumbs-up, or a multi-person family emoji is counted as one character even though it spans several code points. The summary tallies graphemes, code points, and UTF-8 bytes at a glance. Most usefully, it flags invisible and dangerous characters — zero-width spaces and joiners, the BOM, no-break spaces, right-to-left overrides, and Latin / Cyrillic / Greek homoglyphs that spoof one script as another — so you can find the one character quietly breaking a parser, a username check, or a security boundary. Every field has a one-click copy button. Nothing is uploaded; all decoding happens locally.

Tool details

Input: Text; The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output: Live result + Copy; The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy: Browser-side processing; The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share: Shareable URL state; Key settings are encoded in the URL so another person can reopen the same setup.
Performance budget: Initial JS <= 16 KB; No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit: Text · Developer; Category and role tags drive related tools, internal links, and quick fit checks.

How to use

1. Input

Paste or drop your content into the tool panel.

2. Process

Click the button. All processing is local in your browser.

3. Copy / Download

Copy the result or download to disk in one click.

How Unicode Character Inspector fits into your work

Use it to clean, compare, reshape, or extract plain text before it goes into a document, CMS, spreadsheet, or prompt.

Text jobs

Removing repetitive cleanup work from everyday writing and operations.
Making text easier to compare, paste, publish, or feed into another tool.
Working with content locally when the text is private or unfinished.

Text checks

Scan for unintended whitespace, duplicate lines, and lost punctuation.
For long text, test the first few lines before applying the whole change.
Copy the final output only after checking the preview.

Real-world use cases

Find the invisible character breaking a "duplicate" username check
A signup form keeps rejecting "john" as taken, but the support team swears the account is free. Paste both strings here. One of them turns out to have U+043E CYRILLIC SMALL LETTER O as its second character instead of the Latin `o` (U+006F), a homoglyph. The tool flags it red as a confusable, shows the code point, and you immediately know the original signup used a Cyrillic letter to squat the name. Without per-character code points you would have stared at two visually identical strings forever.
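
The same comparison can be scripted when you have both strings in hand. This sketch walks two strings code point by code point and reports the first mismatch; `firstDifference` is a hypothetical helper, and the example assumes the look-alike is Cyrillic U+043E standing in for Latin o.

```javascript
// Format one code point as U+XXXX, or null when a string has run out.
function formatCodePoint(ch) {
  if (ch === undefined) return null;
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Return the first position (in code points) where the strings differ.
function firstDifference(a, b) {
  const left = [...a];
  const right = [...b];
  const length = Math.max(left.length, right.length);
  for (let i = 0; i < length; i++) {
    if (left[i] !== right[i]) {
      return { index: i, left: formatCodePoint(left[i]), right: formatCodePoint(right[i]) };
    }
  }
  return null; // identical code point sequences
}

console.log(firstDifference("john", "j\u043Ehn"));
// { index: 1, left: 'U+006F', right: 'U+043E' }
```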
Debug a JSON parser that chokes on a "clean" pasted config
Your JSON (or YAML) loader throws "unexpected token" on line 1, but the file looks perfect in the editor. Paste the first line here. The summary flags a U+FEFF BYTE ORDER MARK sitting before the opening brace, or a U+00A0 NO-BREAK SPACE where you expected a regular space. Both are invisible in most editors. The tool highlights them as zero-width / invisible and gives you the JS escape (`\uFEFF` or `\u00A0`) so you can strip the character with a precise `replace`, after which the parse error disappears.
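
A minimal sketch of that cleanup for the JSON case, assuming the only offenders are a leading BOM and stray no-break spaces. `parseConfig` is a hypothetical name.

```javascript
// parseConfig is a hypothetical helper for pasted JSON text.
function parseConfig(raw) {
  const cleaned = raw
    .replace(/^\uFEFF/, "")    // drop a byte order mark at the very start
    .replace(/\u00A0/g, " ");  // caution: also rewrites NBSP inside string values
  return JSON.parse(cleaned);
}

const pasted = '\uFEFF{"retries":\u00A03}';

try {
  JSON.parse(pasted);
} catch (error) {
  console.log("raw parse failed:", error.name); // SyntaxError
}

console.log(parseConfig(pasted)); // { retries: 3 }
```

If no-break spaces inside string values matter to you, keep only the first `replace` and fix the remaining ones by hand.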
Verify exactly how an emoji is encoded before storing it
You are sizing a `VARCHAR` column and a single 👨‍👩‍👧‍👦 family emoji blows past your length limit. Paste it: the grapheme count says 1, the code-point count says 7, and the UTF-8 byte count says 25. The tool breaks the cluster into its parts — four person emoji joined by three U+200D ZERO WIDTH JOINER characters — so you understand why "one emoji" needs 25 bytes and can size the column (or your Twitter-style character budget) correctly.
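
The 25-byte figure is easy to verify by encoding each code point separately. The sketch below does that arithmetic with `TextEncoder`; `utf8Breakdown` is a hypothetical helper name.

```javascript
// List the UTF-8 byte cost of every code point in a string.
function utf8Breakdown(text) {
  const encoder = new TextEncoder();
  return [...text].map((ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    bytes: encoder.encode(ch).length,
  }));
}

// Four person emoji (4 bytes each) joined by three ZWJs (3 bytes each).
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const rows = utf8Breakdown(familyEmoji);
const total = rows.reduce((sum, row) => sum + row.bytes, 0);

console.log(rows.map((row) => row.bytes).join(" + "), "=", total);
// 4 + 3 + 4 + 3 + 4 + 3 + 4 = 25
```

Whether a `VARCHAR(n)` limit counts bytes or characters depends on the database and its character set, so check which of these numbers your column actually enforces.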
Build a precise HTML entity or JS escape for a tricky glyph
You need to hardcode an em dash, a non-breaking space, or a mathematical symbol into source without pasting the raw character (which a teammate's editor might mangle). Type or paste the glyph, and each row gives you the named HTML entity (`&mdash;`), the numeric forms (`&#8212;` / `&#x2014;`), and the JS escape (`\u2014`, or `\u{1F600}` for astral characters). One click copies the exact form your file needs.
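
The numeric entity and the JS escape can both be derived from the code point alone, as this sketch shows. Named entities such as `&mdash;` need a lookup table and are left out; `escapesFor` is a hypothetical helper name.

```javascript
// Build the numeric HTML entities and the JS escape for one character.
function escapesFor(ch) {
  const codePoint = ch.codePointAt(0);
  const hex = codePoint.toString(16).toUpperCase();
  return {
    decimalEntity: `&#${codePoint};`,
    hexEntity: `&#x${hex};`,
    // Code points above U+FFFF need the braced \u{...} form.
    jsEscape: codePoint > 0xffff ? `\\u{${hex}}` : `\\u${hex.padStart(4, "0")}`,
  };
}

console.log(escapesFor("\u2014"));    // em dash
// { decimalEntity: '&#8212;', hexEntity: '&#x2014;', jsEscape: '\\u2014' }

console.log(escapesFor("\u{1F600}")); // astral character
// { decimalEntity: '&#128512;', hexEntity: '&#x1F600;', jsEscape: '\\u{1F600}' }
```

The doubled backslash in the printed `jsEscape` is just how the console quotes a string; the value itself contains a single backslash.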
Audit user-supplied text for spoofing before it hits production
You are reviewing a display name, a domain label, or a coupon code submitted by a user. Paste it and scan the right column: any zero-width space, right-to-left override (U+202E, the classic filename-spoofing trick), or mixed-script confusable gets a colored flag. You get a per-character verdict instead of trusting a string that "looks normal" — which is exactly how homograph phishing and RTL-override attacks slip through naive validation.
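
For a server-side gate, the same three signals can be approximated with Unicode property escapes. This is a sketch under stated assumptions: it looks only at Latin, Cyrillic, and Greek, treats any mix of them as suspicious, and uses hand-picked ranges for the bidi and zero-width controls. `auditLabel` is a hypothetical name.

```javascript
// Return a coarse verdict for one user-supplied string.
function auditLabel(text) {
  const scripts = new Set();
  for (const ch of text) {
    if (/\p{Script=Latin}/u.test(ch)) scripts.add("Latin");
    else if (/\p{Script=Cyrillic}/u.test(ch)) scripts.add("Cyrillic");
    else if (/\p{Script=Greek}/u.test(ch)) scripts.add("Greek");
  }
  return {
    scripts: [...scripts],
    mixedScripts: scripts.size > 1,
    // Embeddings, overrides, and isolates (U+202A..U+202E, U+2066..U+2069).
    hasBidiControl: /[\u202A-\u202E\u2066-\u2069]/u.test(text),
    // ZWSP, ZWNJ, ZWJ, and the BOM / zero-width no-break space.
    hasZeroWidth: /[\u200B-\u200D\uFEFF]/u.test(text),
  };
}

console.log(auditLabel("p\u0430ypal"));           // mixedScripts: true
console.log(auditLabel("invoice\u202Efdp.exe"));  // hasBidiControl: true
```

Legitimate names can mix scripts, so a result like this is better used to queue a manual review than to reject input outright.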

Common pitfalls

A grapheme is not a code point. The flag emoji 🇯🇵 is one thing you see but two code points (regional indicators J + P), and 👍🏽 is one emoji but two code points (thumbs-up + skin-tone modifier). If you slice a string by code point or by UTF-16 unit you can cut an emoji in half and corrupt it. Use the grapheme count this tool shows for anything user-facing.
The `length` property of a JavaScript string counts UTF-16 code units, not characters. `'𝄞'.length` is 2, not 1, because the G-clef lives above U+FFFF and needs a surrogate pair. `[...str].length` gives the code-point count, and the grapheme count (for example via `Intl.Segmenter`) gives user-perceived characters. Never assume one index step equals one character.
Stripping spaces with `.replace(/ /g, '')` misses the invisible ones. U+00A0 no-break space, U+200B zero-width space, U+FEFF BOM, and U+3000 ideographic space all read as whitespace to a human but not to a naive regex. Copy the exact code point from this tool and target it specifically.
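
The three pitfalls above can be reproduced, and worked around, in a few lines. This is a sketch using standard JavaScript only; `truncateGraphemes` and `stripAllSpaces` are hypothetical helper names.

```javascript
// Keep at most `max` user-perceived characters without splitting a cluster.
function truncateGraphemes(text, max) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count === max) break;
    result += segment;
    count += 1;
  }
  return result;
}

// \s already covers U+00A0, U+FEFF, and U+3000, but not U+200B.
function stripAllSpaces(text) {
  return text.replace(/[\s\u200B]/gu, "");
}

const flagAndBang = "\u{1F1EF}\u{1F1F5}!";
console.log(flagAndBang.slice(0, 2) === "\u{1F1EF}");              // true: half a flag
console.log(truncateGraphemes(flagAndBang, 1) === "\u{1F1EF}\u{1F1F5}"); // true: whole flag
console.log("\u{1D11E}".length, [..."\u{1D11E}"].length);          // 2 1
console.log(stripAllSpaces("a\u00A0b\u200Bc\u3000d e"));           // "abcde"
```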

Privacy

Every lookup (code-point decoding, UTF-8 / UTF-16 byte math, entity and escape generation, and the confusable / zero-width detection) runs as plain JavaScript in your browser tab. The text you inspect is never uploaded, logged, or sent to any server. The one caveat: your input is stored in the shareable URL query string so a "share link" reproduces the same view. Anyone who receives that link can read the text, and wherever you paste it, the destination's logs will record it. Do not share links built from passwords, tokens, or private messages; inspect those locally and copy the fields you need instead.
