What is a Unicode code point?

A code point is the unique number that the Unicode standard assigns to a character, written as U+ followed by four to six hexadecimal digits. For example, the Latin letter A is U+0041, the emoji 😀 is U+1F600, and the CJK ideograph 你 is U+4F60. Unicode 15.1 assigns 149,813 characters across 161 scripts, all within the range U+0000 to U+10FFFF (1,114,112 possible code points).
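As a minimal sketch of the same idea in Python, the built-in ord and chr convert between a character and its code point, and the standard unicodedata module supplies the official name. The helper name describe is hypothetical, chosen only for this illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    cp = ord(ch)  # the code point as an integer
    name = unicodedata.name(ch, "<unnamed>")  # fallback for unnamed code points
    return f"U+{cp:04X} {name}"


for ch in "A😀你":
    print(describe(ch))

# chr goes the other way: from a code point back to the character.
print(chr(0x4F60))
```

The names come from the Unicode data bundled with the Python interpreter, so very recent characters may show the fallback on older versions.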

How does UTF-8 encode a code point?

UTF-8 is a variable-width encoding. Code points U+0000–U+007F use 1 byte (identical to ASCII). U+0080–U+07FF use 2 bytes. U+0800–U+FFFF use 3 bytes, except the surrogate range U+D800–U+DFFF, which is not valid in UTF-8. U+10000–U+10FFFF use 4 bytes. The lead byte signals the length (0xxxxxxx / 110xxxxx / 1110xxxx / 11110xxx), and every continuation byte has the form 10xxxxxx, carrying 6 bits of the code point. For example, 你 (U+4F60) encodes as the 3-byte sequence E4 BD A0.
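The bit layout above can be sketched by hand in Python. This is an illustration of the packing rules rather than something to use in production, where str.encode("utf-8") already does the job; the function name utf8_encode is an assumption made for this example.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point to UTF-8 by packing bits manually (illustrative)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:  # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(utf8_encode(0x4F60).hex(" ").upper())  # E4 BD A0
```

Comparing the result with chr(cp).encode("utf-8") is a quick way to check the sketch against the built-in encoder.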

When does UTF-16 use a surrogate pair?

UTF-16 stores code points in 16-bit code units. Code points U+0000–U+FFFF (other than the surrogate range itself) fit in one unit. Code points U+10000–U+10FFFF (the supplementary planes, which include most emoji) require two units called a surrogate pair: a high surrogate (U+D800–U+DBFF) followed by a low surrogate (U+DC00–U+DFFF). To build the pair, subtract 0x10000 from the code point, split the 20-bit result into a 10-bit high part and a 10-bit low part, then add 0xD800 to the high part and 0xDC00 to the low part. For example, 😀 (U+1F600) becomes the surrogate pair 0xD83D 0xDE00.
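The surrogate pair arithmetic can be written out directly. The following Python sketch follows the steps described above; the name to_surrogate_pair is hypothetical.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```

For U+1F600 the offset is 0xF600, whose top 10 bits are 0x3D and bottom 10 bits are 0x200, which gives D83D and DE00.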

What is the Unicode General Category?

Every code point belongs to a General Category that describes its fundamental type. The two-letter codes group into broad classes: L for letters (Lu uppercase, Ll lowercase, Lt titlecase, Lm modifier, Lo other), M for marks, N for numbers (Nd decimal, Nl letter number, No other), P for punctuation, S for symbols (Sm math, Sc currency, So other), Z for separators, and C for other/control. Knowing the category lets you write robust regex patterns in engines that support Unicode property escapes. For instance, \p{Lu} matches uppercase letters in every script.
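A short Python sketch of category lookup follows. Python's built-in re module does not support \p{Lu}, so this example filters with unicodedata.category instead; the helper names category_of and uppercase_letters and the MAJOR_CLASSES table are assumptions made for illustration.

```python
import unicodedata

# First letter of the two-letter code -> broad class (labels chosen for this sketch).
MAJOR_CLASSES = {
    "L": "letter", "M": "mark", "N": "number", "P": "punctuation",
    "S": "symbol", "Z": "separator", "C": "other",
}


def category_of(ch: str) -> tuple:
    """Return (two-letter General Category, broad class) for one character."""
    code = unicodedata.category(ch)
    return code, MAJOR_CLASSES[code[0]]


def uppercase_letters(text: str) -> list:
    """Rough stand-in for the regex \\p{Lu}: keep characters in category Lu."""
    return [ch for ch in text if unicodedata.category(ch) == "Lu"]


for ch in "Aa5€ 😀":
    print(repr(ch), category_of(ch))

print(uppercase_letters("a\u00C9b\u03A9c你"))  # É and Ω are Lu; 你 is Lo
```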

How do I use a Unicode code point in HTML or CSS?

In HTML write &#x1F600; (hex) or &#128512; (decimal) for the numeric character reference, or a named entity like &amp; for ampersand. In CSS use the escape \1F600 (backslash then hex digits, no U+ prefix) inside a content property or selector. In JavaScript strings write \u{1F600} for supplementary characters (ES6+) or the explicit surrogate pair for older code. This tool generates all three forms for you automatically.
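The escape forms are simple string templates around the code point number. Here is a hedged Python sketch; the four function names are hypothetical and say nothing about how the tool itself is implemented.

```python
import html


def html_hex(cp: int) -> str:
    return f"&#x{cp:X};"  # hexadecimal numeric character reference


def html_dec(cp: int) -> str:
    return f"&#{cp};"  # decimal numeric character reference


def css_escape(cp: int) -> str:
    return f"\\{cp:X}"  # backslash followed by hex digits


def js_escape(cp: int) -> str:
    # \u{...} is needed above U+FFFF; \uXXXX is enough for the BMP.
    return f"\\u{{{cp:X}}}" if cp > 0xFFFF else f"\\u{cp:04X}"


cp = ord("😀")
print(html_hex(cp), html_dec(cp), css_escape(cp), js_escape(cp))

# Round trip through the standard library's HTML unescaper.
print(html.unescape(html_hex(cp)) == "😀")
```

In CSS, an escape that is followed directly by another hex digit needs a trailing space (or zero-padding to six digits) so the parser knows where the escape ends.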

Debugging why a character breaks a JSON or SQL query

A curly apostrophe (U+2019, RIGHT SINGLE QUOTATION MARK, UTF-8: E2 80 99) looks almost identical to an ASCII apostrophe (U+0027), but it does not delimit a string literal in SQL, so a query pasted from a word processor fails to parse. Curly double quotes (U+201C and U+201D) cause the same kind of failure in JSON, which requires U+0022. Paste the suspicious character into this tool, confirm the code point, then replace it with the correct ASCII equivalent, or write it as the HTML entity &rsquo; for HTML-safe rendering.
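The same check can be scripted. The Python sketch below locates the four common curly quotes in a string and swaps in ASCII equivalents; the mapping table and the names find_smart_quotes and straighten are assumptions for this example, and a real cleanup may need to cover more look-alike characters.

```python
import unicodedata

# Curly quote -> ASCII replacement (an illustrative, not exhaustive, table).
ASCII_EQUIVALENTS = {
    "\u2018": "'", "\u2019": "'",
    "\u201C": '"', "\u201D": '"',
}


def find_smart_quotes(text: str) -> list:
    """List (index, 'U+XXXX', name) for each curly quote found in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in ASCII_EQUIVALENTS
    ]


def straighten(text: str) -> str:
    """Replace curly quotes with their ASCII equivalents."""
    return text.translate(str.maketrans(ASCII_EQUIVALENTS))


sample = "it\u2019s"
print(find_smart_quotes(sample))
print(straighten(sample))
```

Replacing quotes fixes the parse error but does not make a query safe; parameterized queries remain the right way to pass user text to SQL.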

Understanding emoji encoding for mobile app development

Emoji like 😀 (U+1F600) live in Unicode's supplementary planes and need a 4-byte UTF-8 sequence (F0 9F 98 80) and a UTF-16 surrogate pair (D83D DE00). Platforms count them differently: a Swift String counts 😀 as one Character, while Kotlin's String.length and JavaScript's .length both count UTF-16 code units and report 2. Enter any emoji here to see the exact byte sequences and surrogate pair values you need for your target platform.
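To see why length checks disagree across platforms, it helps to measure one string in three units. In this Python sketch, len counts code points, while the UTF-16 and UTF-8 figures are derived by encoding; the function name lengths is hypothetical. The UTF-16 figure corresponds to what Kotlin and JavaScript report, and Swift's grapheme-based count has no direct equivalent here.

```python
def lengths(text: str) -> dict:
    """Measure a string in code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        "code_points": len(text),
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 2 bytes per unit
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(lengths("😀"))
print(lengths("A😀"))
```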

Verifying CJK character encoding in Chinese/Japanese/Korean text

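Most Chinese, Japanese, and Korean characters sit in the Basic Multilingual Plane (for example the CJK Unified Ideographs block, U+4E00–U+9FFF) and take 3 bytes each in UTF-8. Ideographs in the supplementary extensions (U+20000 and above) take 4 bytes. If a database column is declared latin1 instead of utf8mb4, these characters are corrupted when written or read back. Paste suspect characters here to see their exact UTF-8 encoding and confirm which character set (and matching collation) your table must use.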
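The corruption is easy to reproduce outside a database. The Python sketch below uses the latin-1 codec as a stand-in for a latin1 column, which is an assumption: a real database may map a few bytes differently. The helper names misread_as_latin1 and repair are hypothetical.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as latin-1 (mojibake)."""
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake: str) -> str:
    """Undo the misread, provided no bytes were lost along the way."""
    return mojibake.encode("latin-1").decode("utf-8")


original = "你好"
garbled = misread_as_latin1(original)
print(len(original), len(garbled))  # 2 characters become 6
print(repair(garbled) == original)

# A supplementary-plane ideograph (CJK Extension B) needs 4 bytes, not 3.
print(len(chr(0x20000).encode("utf-8")))
```

Each 3-byte character turns into three unrelated Latin characters, which is the usual signature of this misconfiguration.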

Unicode Code Point Explorer — UTF-8, UTF-16, Category & Script

Inspect any character — code point, UTF-8 bytes, UTF-16 encoding, Unicode category, script, and block — instant, browser-only

Runs locally
Category: Encoding & Crypto
Best for: Checking small payloads, tokens, hashes, and encoded values quickly.

What this tool does

Free online Unicode code point explorer. Paste any text or enter a code point like U+1F600 and instantly see the Unicode code point (U+XXXX), official character name, Unicode general category (Lu, Ll, Nd…), script (Latin, Han, Arabic…), Unicode block, UTF-8 byte sequence, UTF-16 encoding with surrogate pairs, HTML entity, JavaScript escape (\u or \u{...}), and CSS escape. Handles emoji, CJK ideographs, Arabic, Devanagari, and all 1.1 million Unicode code points. 100% client-side — nothing is sent to any server.

Tool details

Input: Text + Numbers; The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output: Live result + Copy; The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy: Browser-side processing; The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share: Shareable URL state; Key settings are encoded in the URL so another person can reopen the same setup.
Performance budget: Initial JS <= 28 KB; No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit: Encoding & Crypto · Developer; Category and role tags drive related tools, internal links, and quick fit checks.

How to use

1. Input: Paste or drop your content into the tool panel.
2. Process: Click the button. All processing is local in your browser.
3. Copy / Download: Copy the result or download to disk in one click.

How Unicode Code Point Explorer fits into your work

Use it for quick browser-side encoding, decoding, hashing, token checks, and share-safe transformations.

Encoding jobs

Checking small payloads, tokens, hashes, and encoded values quickly.
Preparing values for APIs, URLs, docs, or support tickets.
Avoiding account-based tools when the input might be sensitive.

Encoding checks

Do not paste live secrets unless you are comfortable with local browser handling.
Confirm whether the operation is reversible before sharing the result.
For hashes, compare the exact algorithm and casing expected by the receiver.

Common pitfalls

Confusing "Unicode code point" with "UTF-8 byte value". U+00E9 (é) is one code point but encodes as two UTF-8 bytes (0xC3 0xA9). Always check the byte sequence separately from the code point number.
Assuming every JavaScript string character is one code point. JS strings are UTF-16, so supplementary characters (U+10000+) have .length 2 (surrogate pair). Use for...of or Array.from to iterate by real code points.
Using the \uXXXX escape for supplementary-plane characters in JavaScript. \uXXXX takes exactly four hex digits, so it only reaches U+0000–U+FFFF. For emoji and other high code points, use the code point escape \u{1F600} (ES6+) or the explicit surrogate pair \uD83D\uDE00.
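The second pitfall comes down to turning UTF-16 code units back into code points. The Python sketch below does that recombination by hand for a list of 16-bit values, which is roughly what iterating by code point does in a UTF-16 based language; the function name code_points_from_utf16 is an assumption for this example.

```python
from typing import List


def code_points_from_utf16(units: List[int]) -> List[int]:
    """Combine surrogate pairs in a list of UTF-16 code units into code points."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Reverse of the encoding formula: rebuild the 20-bit offset.
            out.append(0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(unit)  # BMP unit, or a lone surrogate passed through as-is
            i += 1
    return out


units = [0x0041, 0xD83D, 0xDE00]  # "A😀" as UTF-16 code units
points = code_points_from_utf16(units)
print(len(units), len(points))  # 3 code units, 2 code points
print([f"U+{cp:04X}" for cp in points])
```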

Privacy

All analysis runs entirely in your browser using the built-in TextEncoder API and JavaScript's Unicode property escapes. The text you paste or any code point you enter is never sent to any server and is not stored anywhere. The URL state encodes your input in the query string for shareability — avoid sharing links if your input contains sensitive identifiers.
