How many bytes is one Chinese character in UTF-8?

A common Chinese character such as 中 is 3 bytes in UTF-8 (the bytes e4 b8 ad). Most CJK characters in everyday use fall in this 3-byte range (U+0800 to U+FFFF); rarer ideographs in the supplementary planes take 4 bytes. That is why a field capped at 10 bytes holds 10 Latin letters but only 3 Chinese characters. Paste 中文 into this tool and you will see 6 UTF-8 bytes for 2 characters.
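To check the byte values yourself, here is a minimal sketch that prints the UTF-8 bytes of a string as hex. The helper name utf8Hex is made up for illustration; only TextEncoder comes from the platform.

```javascript
// Return the UTF-8 bytes of a string as lowercase hex pairs.
// utf8Hex is a hypothetical helper name, not part of the tool.
function utf8Hex(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(utf8Hex("A"));  // 41
console.log(utf8Hex("中")); // e4 b8 ad
console.log(new TextEncoder().encode("中文").length); // 6
```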

How many bytes is an emoji?

Most emoji are 4 bytes in UTF-8 because they sit in the supplementary (astral) planes above U+FFFF. The grinning face 😀 (U+1F600) is f0 9f 98 80: four bytes, one code point. Combined emoji are larger. A flag is a pair of regional indicator code points (8 bytes), and a family emoji is several code points joined by zero-width joiners, so 👨‍👩‍👧 is 18 UTF-8 bytes (three 4-byte emoji plus two 3-byte joiners) even though it looks like one symbol on screen. Larger families take more.
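The breakdown of a combined emoji is easier to see in code. This sketch lists each code point and the total UTF-8 size; describe is a hypothetical helper, and the strings are written as escapes so the joiners stay visible.

```javascript
// List the code points of a string and its UTF-8 size.
// describe is a hypothetical helper name used only for this example.
function describe(str) {
  const codePoints = Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
  return { codePoints, utf8Bytes: new TextEncoder().encode(str).length };
}

const grin = "\u{1F600}";                                // grinning face
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // man ZWJ woman ZWJ girl
const flag = "\u{1F1EF}\u{1F1F5}";                        // regional indicators J + P

console.log(describe(grin));   // 1 code point, 4 bytes
console.log(describe(family)); // 5 code points, 18 bytes
console.log(describe(flag));   // 2 code points, 8 bytes
```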

What is the difference between bytes and characters?

A character is a human unit; a byte is a storage unit. In ASCII text they match (one letter is one byte), but the moment you use accents, CJK or emoji they diverge. The word café is 4 characters but 5 UTF-8 bytes, because é (U+00E9) needs 2 bytes. Size byte-limited storage and network fields by bytes, never by character count, or non-English text will overflow.
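The byte cost of each character follows directly from its code point value. As a sketch of that rule (not how the tool itself counts, which uses TextEncoder), the hypothetical utf8Length below adds 1, 2, 3 or 4 bytes per code point depending on its range.

```javascript
// Compute UTF-8 length from code point ranges.
// utf8Length is a hypothetical name; the ranges are the standard UTF-8 ones.
function utf8Length(str) {
  let bytes = 0;
  for (const ch of str) {              // iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;         // ASCII
    else if (cp < 0x800) bytes += 2;   // accented Latin, Greek, Cyrillic, ...
    else if (cp < 0x10000) bytes += 3; // rest of the BMP, including most CJK
    else bytes += 4;                   // supplementary planes, most emoji
  }
  return bytes;
}

console.log(utf8Length("cafe"));       // 4
console.log(utf8Length("caf\u00E9"));  // 5
```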

When does a database VARCHAR limit count bytes?

Databases store the encoded form on disk, so any length expressed in bytes is a byte budget, not a character budget. Oracle's VARCHAR2(n) counts bytes by default, SQL Server's varchar(n) is a byte count, and the same goes for many fixed-width buffers, row-size limits and index key limits. PostgreSQL and MySQL (4.1 and later) count VARCHAR(n) in characters, so check which semantics your engine uses. A name field of 20 bytes holds 20 ASCII letters, 6 Chinese characters, or 5 four-byte emoji. Use the UTF-8 byte number here to predict whether a value will fit before the insert fails.
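The 20-byte arithmetic above can be sketched as a pre-insert check. Both function names are hypothetical, and the check assumes the column stores UTF-8 and counts bytes.

```javascript
const encoder = new TextEncoder();

// True when the UTF-8 encoding of value fits in maxBytes.
// Hypothetical helper; assumes a byte-counted UTF-8 column.
function fitsInBytes(value, maxBytes) {
  return encoder.encode(value).length <= maxBytes;
}

// How many copies of one character fit in a byte budget.
function maxRepeats(ch, maxBytes) {
  return Math.floor(maxBytes / encoder.encode(ch).length);
}

console.log(maxRepeats("a", 20));          // 20
console.log(maxRepeats("中", 20));         // 6
console.log(maxRepeats("\u{1F600}", 20));  // 5
console.log(fitsInBytes("中".repeat(7), 20)); // false (21 bytes)
```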

Why is JavaScript .length not the number of characters?

String .length returns UTF-16 code units, not characters. Characters outside the Basic Multilingual Plane are stored as a surrogate pair of two code units, so '😀'.length is 2 even though it is one character. To count real characters use the code point count ([...str].length), and to count what a person sees use the grapheme count. This tool shows all three side by side so you can see exactly where they disagree.
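A short sketch of the three counts, assuming a runtime that provides Intl.Segmenter (recent browsers and Node.js do; older ones may not). The helper name lengths is made up for illustration.

```javascript
// Compare UTF-16 code units, code points and graphemes for one string.
// lengths is a hypothetical helper; Intl.Segmenter support is assumed.
function lengths(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf16Units: str.length,
    codePoints: [...str].length,
    graphemes: [...segmenter.segment(str)].length,
  };
}

console.log(lengths("\u{1F600}")); // { utf16Units: 2, codePoints: 1, graphemes: 1 }
console.log(lengths("e\u0301"));   // { utf16Units: 2, codePoints: 2, graphemes: 1 }
```

The second string is an e followed by a combining acute accent: two code points that a reader sees as one character, which is where code point and grapheme counts part ways.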

UTF-8 Byte Counter for Strings

Count UTF-8 bytes, UTF-16 code units, Unicode code points and characters in any string, right in your browser

Runs locally
Category Developer & DevOps
Best for Formatting, validating, shrinking, or inspecting code-adjacent text.

Text

UTF-8 bytes: TextEncoder · file / DB / socket size
UTF-16 code units: JavaScript .length
Unicode code points: [...str].length
Characters (graphemes): what a human counts
UTF-16 bytes: code units × 2
Lines: split on newlines

What this tool does

A free byte counter that tells you exactly how many bytes a string takes once it is encoded. Paste any text and read six numbers at once: UTF-8 bytes (what a file, a socket or a database stores), UTF-16 code units (JavaScript .length, and the unit Java and C# call a "char"), Unicode code points (so an astral character counts once), grapheme characters (what a human counts), UTF-16 bytes (code units × 2), and line count. The byte total uses the browser's own TextEncoder, so multibyte text is exact: a common Chinese character is 3 UTF-8 bytes, a plain emoji is 4. This is the tool to reach for when you are sizing a VARCHAR column, fitting a label into a fixed buffer, checking an SMS segment, or trimming text to a network packet limit. It runs fully in your browser with nothing uploaded, and the input round-trips through the URL so you can share a link that reopens the same text. One click copies every count.
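For reference, the same set of numbers can be reproduced in a few lines. This is a sketch under assumptions, not the tool's source: countAll is a hypothetical name, Intl.Segmenter support is assumed, and the line-splitting rule (CRLF, CR or LF) is a guess at what "split on newlines" means.

```javascript
// Compute the six counts for a piece of text.
// countAll is hypothetical; the tool's own implementation is not shown here.
function countAll(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Units: text.length,
    codePoints: [...text].length,
    graphemes: [...segmenter.segment(text)].length,
    utf16Bytes: text.length * 2,
    lines: text.split(/\r\n|\r|\n/).length, // assumed newline convention
  };
}

console.log(countAll("中文\n\u{1F600}"));
// utf8Bytes 11, utf16Units 5, codePoints 4, graphemes 4, utf16Bytes 10, lines 2
```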

Tool details

Input: Text; The page exposes text boxes, numeric controls, file pickers, or structured inputs depending on the tool.
Output: Live result + Copy; The result area focuses on usable output, with copy, download, or preview actions when supported.
Privacy: Browser-side processing; The main tool logic does not call an external API, so inputs normally stay in the current tab.
Save / share: Shareable URL state; Key settings are encoded in the URL so another person can reopen the same setup.
Performance budget: Initial JS <= 9 KB; No WASM budget is declared, keeping the tool quick to open on mobile.
Best fit: Developer & DevOps · Developer; Category and role tags drive related tools, internal links, and quick fit checks.

How to use

1. Input
Paste or drop your content into the tool panel.
2. Process
Click the button. All processing is local in your browser.
3. Copy / Download
Copy the result or download to disk in one click.

How UTF-8 Byte Counter fits into your work

Use it in the small gaps between coding, reviewing, debugging, and shipping.

Developer jobs

Formatting, validating, shrinking, or inspecting code-adjacent text.
Preparing snippets for documentation, tickets, commits, or handoff.
Checking a small payload quickly without switching tools.

Developer checks

Run irreversible transforms like minify or obfuscate on a copy.
Keep secrets out of pasted snippets unless the tool explicitly stays local.
Use your normal tests or linter before shipping transformed code.

Real-world use cases

Size a database column before it overflows
You are adding a display-name field and the column is VARCHAR with a byte limit. Paste a few worst-case names with accents and CJK, read the UTF-8 byte count, and pick a column width that will not reject real users at insert time.
Fit text into a fixed network or protocol buffer
A binary protocol gives you a fixed number of bytes for a string field. Paste your candidate value and check the UTF-8 byte total against the cap, so you trim by bytes rather than guessing by character count and corrupting a multibyte sequence at the boundary.
Check SMS and message length limits
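An SMS segment and many chat APIs are limited by encoded size, not by visible characters. One SMS segment carries 140 bytes: 160 characters in the 7-bit GSM alphabet, or 70 UTF-16 code units once any character (an emoji, for instance) forces UCS-2. Drop your message in, watch the UTF-16 code unit and byte counts, and know in advance whether it splits into a second billed segment once an emoji or two pushes it over.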
Debug why .length disagrees with your backend
Your frontend says a string is 8 long but the API rejects it as too big. Paste it here and compare UTF-16 length, code point count and UTF-8 bytes; the gap usually exposes a surrogate pair or a stack of multibyte characters that the byte-based backend counts differently.
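For the byte-capped cases above, trimming has to stop on a character boundary. A minimal sketch, with truncateToBytes as a hypothetical helper, keeps whole code points until the next one would exceed the budget.

```javascript
// Trim a string to at most maxBytes of UTF-8 without cutting a code point.
// truncateToBytes is a hypothetical helper name.
function truncateToBytes(str, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let out = "";
  for (const ch of str) {                // one whole code point per step
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;   // next character would overflow
    used += size;
    out += ch;
  }
  return out;
}

console.log(truncateToBytes("中文字", 7));      // 中文 (6 bytes)
console.log(truncateToBytes("ab\u{1F600}", 5)); // ab (the emoji needs 4 more)
```

This never emits a broken byte sequence, but it can still split a joined emoji or a letter from its combining accent. Iterating over Intl.Segmenter grapheme segments instead of code points would keep those together.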

Common pitfalls

Validating length with .length and assuming it equals characters. For an emoji or any astral character .length counts 2 per character, so a 140-unit limit rejects text a user thinks is well under 140 characters.
Sizing storage by character count instead of bytes. Ten Chinese characters look like 10 but take 30 UTF-8 bytes, so a 16-byte buffer that fits 16 Latin letters holds only 5 Chinese characters (15 bytes) and overflows on the sixth.
Treating one emoji as one code point. Many emoji are sequences joined by zero-width joiners or modifiers, so a single glyph on screen can be several code points and a dozen or more bytes.

Privacy

Every count runs as plain JavaScript inside your browser tab using the built-in TextEncoder. Your text is never uploaded and nothing is logged by the tool itself. The one thing to note: the input is encoded into the page URL so a share link reopens the same text. A link you paste into chat therefore carries that text in the query string, where it can end up in chat history, link previews and the access logs of any server that receives the request. For anything sensitive, copy the counts instead of sharing the URL.
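To see why a share link exposes the input, here is a sketch of a query-string round trip. The parameter name text and the example.com address are placeholders; the tool's real URL format is not documented here.

```javascript
// Put text into a query parameter and read it back.
// "text" is a hypothetical parameter name; example.com is a placeholder host.
function buildShareUrl(baseUrl, text) {
  const url = new URL(baseUrl);
  url.searchParams.set("text", text); // percent-encodes the UTF-8 bytes
  return url.toString();
}

function readSharedText(href) {
  return new URL(href).searchParams.get("text") ?? "";
}

const link = buildShareUrl("https://example.com/utf8-byte-counter", "中文");
console.log(link);
// https://example.com/utf8-byte-counter?text=%E4%B8%AD%E6%96%87
console.log(readSharedText(link)); // 中文
```

Anyone who can read the link can decode the original text, which is why sensitive input is better shared as counts.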
