
How to Normalize UUIDs and Microsoft GUIDs Into One Canonical Form

A UUID can be lowercase hyphenated, uppercase, stripped to 32 hex chars, or brace-wrapped as a GUID — all the same value, none string-equal. Here is how to pick one form.

Published By Li Lei
#uuid #guid #data-cleanup #developer-tools

A UUID is supposed to be a single, unambiguous identifier. In practice it is written four or five different ways, and two systems holding the same value will swear they hold different ones. I have watched a join return zero rows because one table stored 550e8400-e29b-41d4-a716-446655440000 and another stored 550E8400E29B41D4A716446655440000. Same 128 bits. Same record. Different strings. The database compares strings, so the match never happens.

This post is about that gap: lowercase versus uppercase hex, the hyphenated 8-4-4-4-12 layout versus the bare 32-character form versus the brace-wrapped {GUID} that Windows and .NET hand you, and why two correct systems can disagree about identity. Then it walks through normalizing a messy column into one canonical form with UUID Normalizer.

Four ways to write the same identifier

Every UUID, whatever its version, is 128 bits: 16 bytes, or 32 hex digits. Everything past that is presentation, and the presentation varies by ecosystem:

  • Lowercase, hyphenated: 550e8400-e29b-41d4-a716-446655440000. This is the canonical 8-4-4-4-12 form from RFC 4122 (since superseded by RFC 9562, which keeps the same format), and what most Postgres and Linux tooling emits.
  • Uppercase, hyphenated: 550E8400-E29B-41D4-A716-446655440000. Hex is case-insensitive as a number, but a string comparison treats e and E as different bytes.
  • Bare 32-char, no hyphens: 550e8400e29b41d4a716446655440000. Common in URL slugs, compact log lines, and storage that drops separators to save space.
  • Brace-wrapped GUID: {550e8400-e29b-41d4-a716-446655440000}. The Microsoft style — .NET's Guid.ToString("B"), the Windows registry, and COM all print the curly braces.

That is the trap: four spellings, one value, and no two of them string-equal. So the moment you join two systems or dedupe a pasted column, you have to pick one canonical form and rewrite everything to match it. Skip that step and a unique index on a text column lets duplicates through, your WHERE id = ... misses rows, and your "distinct" count is wrong.
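A quick way to see the trap is to compare the four spellings first as strings and then as parsed values. This minimal sketch uses Python's standard uuid module, whose UUID constructor accepts braces, hyphens or no hyphens, and either case:

```python
import uuid

spellings = [
    "550e8400-e29b-41d4-a716-446655440000",    # lowercase, hyphenated
    "550E8400-E29B-41D4-A716-446655440000",    # uppercase, hyphenated
    "550e8400e29b41d4a716446655440000",        # bare 32 hex chars
    "{550e8400-e29b-41d4-a716-446655440000}",  # brace-wrapped GUID
]

# As strings, every spelling is distinct, so a naive dedupe keeps all four.
distinct_strings = set(spellings)

# As parsed UUIDs they are the same 128-bit value, so they collapse to one.
distinct_values = {uuid.UUID(s) for s in spellings}

print(len(distinct_strings))             # 4
print(len(distinct_values))              # 1
print(str(next(iter(distinct_values))))  # 550e8400-e29b-41d4-a716-446655440000
```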

Why a database and an API disagree on the same UUID

The disagreement is rarely a bug in either system. It is a defaults mismatch.

A .NET service formats a Guid with the "B" specifier and ships it with braces; add a ToUpper() somewhere in the pipeline, or a value copied out of the registry, and it arrives as {550E8400-E29B-41D4-A716-446655440000}: uppercase, braces and all. The receiving Postgres column is typed uuid, which accepts that input, stores the 16 bytes, and prints the value back lowercase and hyphenated without braces. Now an analyst exports both sides to CSV, pastes them side by side, and the rows that should reconcile do not, because one column is upper-with-braces and the other is lower-plain. Both stores are internally consistent. They just never agreed on a wire format.
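Reconciling the two exports works once both columns are parsed instead of compared as text. A minimal sketch, with made-up sample data standing in for the API and Postgres CSV exports; canonical is a hypothetical helper built on Python's uuid module:

```python
import uuid

# Hypothetical exports: the API side is uppercase with braces,
# the Postgres side is lowercase and hyphenated.
api_ids = ["{550E8400-E29B-41D4-A716-446655440000}", "{6BA7B810-9DAD-11D1-80B4-00C04FD430C8}"]
db_ids = ["550e8400-e29b-41d4-a716-446655440000", "6ba7b811-9dad-11d1-80b4-00c04fd430c8"]

def canonical(s: str) -> str:
    """Parse any accepted spelling and return the lowercase 8-4-4-4-12 form."""
    return str(uuid.UUID(s))

# Comparing raw strings finds no overlap at all.
raw_matches = set(api_ids) & set(db_ids)

# Comparing canonical forms finds the ID both sides really share.
api_keys = {canonical(s) for s in api_ids}
db_keys = {canonical(s) for s in db_ids}
matches = api_keys & db_keys
only_in_api = api_keys - db_keys

print(raw_matches)  # set()
print(matches)      # {'550e8400-e29b-41d4-a716-446655440000'}
print(only_in_api)  # {'6ba7b810-9dad-11d1-80b4-00c04fd430c8'}
```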

The same thing happens with logs. A request ID gets logged uppercase by one middleware and lowercase by the next, and your grep for the trace finds half the lines. Nothing is corrupted — the text is just inconsistent, and inconsistent text does not deduplicate, sort, or join.
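Until the middleware agrees, the search itself can be made spelling-agnostic. A minimal sketch, assuming the trace ID is known and the log lines are already in memory; spelling_pattern and the sample lines are hypothetical. The pattern is case-insensitive and treats every hyphen as optional, so it hits the uppercase, lowercase, bare, and braced forms alike:

```python
import re
import uuid

def spelling_pattern(request_id: str) -> re.Pattern:
    """Compile a case-insensitive regex matching one UUID with or without hyphens."""
    h = uuid.UUID(request_id).hex  # 32 lowercase hex digits, whatever the input spelling
    groups = [h[:8], h[8:12], h[12:16], h[16:20], h[20:]]
    # Each hyphen is optional; braces need no special handling because search()
    # finds the digits inside them.
    return re.compile("-?".join(groups), re.IGNORECASE)

log_lines = [  # made-up sample lines
    "auth  req=550E8400-E29B-41D4-A716-446655440000 ok",
    "api   req=550e8400-e29b-41d4-a716-446655440000 200",
    "cache req=550e8400e29b41d4a716446655440000 hit",
    "api   req=6ba7b810-9dad-11d1-80b4-00c04fd430c8 200",
]
pattern = spelling_pattern("550e8400-e29b-41d4-a716-446655440000")
hits = [line for line in log_lines if pattern.search(line)]
print(len(hits))  # 3
```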

A worked example: mixed forms in, one form out

Say a support ticket and two exports left you with this column, every line a different spelling of overlapping values:

550e8400-e29b-41d4-a716-446655440000
{550E8400-E29B-41D4-A716-446655440000}
550e8400e29b41d4a716446655440000
6ba7b810-9dad-11d1-80b4-00c04fd430c8
6BA7B810-9DAD-11D1-80B4-00C04FD430C8
not-a-uuid-12345

Paste that into the normalizer with deduplicate and sort on. It lowercases the hex, strips the braces from the {…} form, and re-inserts the canonical 8-4-4-4-12 hyphens on the bare 32-character string. The first three lines collapse to one value; the next two collapse to another. The clean output:

550e8400-e29b-41d4-a716-446655440000
6ba7b810-9dad-11d1-80b4-00c04fd430c8
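The steps described above (lowercase, drop braces, re-insert hyphens, then dedupe and sort) fit in a few lines. This is not UUID Normalizer's code: normalize_one and normalize_column are hypothetical names, and the sketch assumes hyphens may sit anywhere (it simply removes them) and that invalid rows are dropped:

```python
import re
from typing import Iterable, List, Optional

HEX32 = re.compile(r"[0-9a-f]{32}")

def normalize_one(raw: str) -> Optional[str]:
    """Return the lowercase 8-4-4-4-12 form of one line, or None if it cannot be coerced."""
    s = raw.strip().lower()                  # lowercase the hex
    if s.startswith("{") and s.endswith("}"):
        s = s[1:-1]                          # drop the {GUID} braces
    digits = s.replace("-", "")              # lenient: hyphens may sit anywhere
    if not HEX32.fullmatch(digits):
        return None                          # not exactly 32 hex digits
    # Re-insert the canonical hyphens.
    return "-".join((digits[:8], digits[8:12], digits[12:16], digits[16:20], digits[20:]))

def normalize_column(lines: Iterable[str], dedupe: bool = True, sort: bool = True) -> List[str]:
    """Normalize every line, dropping invalid ones, then optionally dedupe and sort."""
    out = [v for v in map(normalize_one, lines) if v is not None]
    if dedupe:
        out = list(dict.fromkeys(out))       # first occurrence wins
    if sort:
        out.sort()
    return out

column = [
    "550e8400-e29b-41d4-a716-446655440000",
    "{550E8400-E29B-41D4-A716-446655440000}",
    "550e8400e29b41d4a716446655440000",
    "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "6BA7B810-9DAD-11D1-80B4-00C04FD430C8",
    "not-a-uuid-12345",
]
print("\n".join(normalize_column(column)))
```

Run on the column from the example, it prints the same two lines as the clean output shown above.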

The not-a-uuid-12345 row contains non-hex letters and nowhere near 32 digits, so it cannot be coerced into the 8-4-4-4-12 layout. If you turn on include invalid rows, it stays visible in the output, marked with its reason, which surfaces exactly the IDs that need a human to look at them. That keeps malformed values from silently vanishing before you have decided what to do with them.
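Keeping invalid rows visible means attaching a reason to each one instead of discarding it. A minimal sketch of that idea; check_row and its reason strings are hypothetical, not the wording the tool uses:

```python
import re
from typing import Optional, Tuple

NON_HEX = re.compile(r"[^0-9a-f]")

def check_row(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (canonical, None) for a valid row, or (None, reason) for an invalid one."""
    s = raw.strip().lower()
    if s.startswith("{") and s.endswith("}"):
        s = s[1:-1]
    digits = s.replace("-", "")
    bad = sorted(set(NON_HEX.findall(digits)))
    if bad:
        return None, "non-hex characters: " + "".join(bad)
    if len(digits) != 32:
        return None, f"expected 32 hex digits, found {len(digits)}"
    parts = (digits[:8], digits[8:12], digits[12:16], digits[16:20], digits[20:])
    return "-".join(parts), None

rows = [
    "{550E8400-E29B-41D4-A716-446655440000}",
    "not-a-uuid-12345",
    "550e8400e29b41d4a71644665544",  # four digits short
]
for row in rows:
    value, reason = check_row(row)
    print(row, "->", value if value else f"INVALID ({reason})")
```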

From there you switch the output format. Plain lines for a quick paste, SQL IN to drop straight into a WHERE id IN (...) clause, JSON for a fixture, TypeScript union for a typed literal, CSV or Markdown when you want line numbers as an audit trail. One canonical column, several artifacts, no hand-adding quotes and commas.
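Each output format is a thin wrapper over the same canonical list. A sketch of what those wrappers might look like; the function names, the accounts table, and the KnownId type name are hypothetical, and the tool's exact layout may differ:

```python
import csv
import io
import json
from typing import List

ids = ["550e8400-e29b-41d4-a716-446655440000", "6ba7b810-9dad-11d1-80b4-00c04fd430c8"]

def as_sql_in(values: List[str]) -> str:
    # Values are assumed already canonical, so they contain no quotes to escape.
    return "(" + ", ".join(f"'{v}'" for v in values) + ")"

def as_json(values: List[str]) -> str:
    return json.dumps(values, indent=2)

def as_ts_union(type_name: str, values: List[str]) -> str:
    # A TypeScript string-literal union, e.g. type KnownId = "a" | "b";
    return f"type {type_name} = " + " | ".join(f'"{v}"' for v in values) + ";"

def as_csv(values: List[str]) -> str:
    # One row per ID, with a 1-based line number column as an audit trail.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["line", "uuid"])
    for n, v in enumerate(values, start=1):
        writer.writerow([n, v])
    return buf.getvalue()

print("SELECT * FROM accounts WHERE id IN " + as_sql_in(ids) + ";")
print(as_ts_union("KnownId", ids))
```

The quoting shortcut in as_sql_in is only safe because normalization has already guaranteed each value is 32 hex digits and four hyphens; for unvalidated input, use bound parameters instead.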

Choosing your canonical form (and verifying it)

Lowercase hyphenated is the safe default. It is the RFC 4122 canonical form and what databases with a native uuid type, such as Postgres, print back, so normalizing toward it usually means everything converges on what the database already shows. If you are feeding a system that demands brace-wrapped GUIDs or bare 32-char strings, decide that once, write it down next to the schema, and apply it everywhere that data crosses a boundary.
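One way to make "decide once" concrete is to keep the target in a single setting that every boundary reads. A sketch under assumptions: Python's uuid module parses the input, while the form names and the TARGET_FORM setting are hypothetical:

```python
import uuid

TARGET_FORM = "canonical"  # decided once, documented next to the schema

def render(value: str, form: str = TARGET_FORM) -> str:
    """Render any accepted spelling of a UUID in one chosen target form."""
    u = uuid.UUID(value)  # raises ValueError if the value is malformed
    forms = {
        "canonical": str(u),           # 550e8400-e29b-41d4-a716-446655440000
        "upper": str(u).upper(),       # 550E8400-E29B-41D4-A716-446655440000
        "bare": u.hex,                 # 550e8400e29b41d4a716446655440000
        "braces": "{" + str(u) + "}",  # {550e8400-e29b-41d4-a716-446655440000}
    }
    return forms[form]

for form in ("canonical", "upper", "bare", "braces"):
    print(form, render("{550E8400-E29B-41D4-A716-446655440000}", form))
```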

Whatever target you pick, verify the tool actually produces it before you trust a batch. The UUID Normalizer's documented behavior is to lowercase the hex, drop braces, and re-insert the 8-4-4-4-12 hyphens — a canonical lowercase hyphenated result. If your downstream needs uppercase or unhyphenated output specifically, confirm that against the tool's actual options rather than assuming, and run a few known mixed-form values through first to check the result matches your expectation.
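Checking a batch before trusting it can be as simple as matching every output line against the target shape. A minimal sketch, assuming the lowercase hyphenated target; batch_problems and the sample batch are hypothetical, and the regex checks form only, not the version or variant bits:

```python
import re
from typing import List, Tuple

# Lowercase 8-4-4-4-12; swap this pattern if your target form is different.
CANONICAL = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

def batch_problems(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every output line not in the target form."""
    return [(n, line) for n, line in enumerate(lines, start=1) if not CANONICAL.fullmatch(line)]

batch = [
    "550e8400-e29b-41d4-a716-446655440000",
    "6BA7B810-9DAD-11D1-80B4-00C04FD430C8",    # still uppercase
    "{6ba7b811-9dad-11d1-80b4-00c04fd430c8}",  # braces left on
]
for n, line in batch_problems(batch):
    print(f"line {n}: not canonical: {line}")
```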

One caution from the tool's documentation worth repeating: a UUID that passes validation is only well-formed. That is not proof that the account, row, or resource it points to actually exists. Normalization fixes spelling, not truth.

Where this fits in a cleanup workflow

In my own runs, normalization is step one — before anything else touches the list. Copied web text and CSV cells carry hidden whitespace and zero-width characters, so deduplicating before normalizing leaves near-duplicates that differ only by an invisible byte or a stray brace. Normalize first, then dedupe, then export.
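A minimal scrub step, assuming the invisible passengers are Unicode format characters (category Cf, which covers the zero-width space, the zero-width joiners, and the byte-order mark) plus ordinary or non-breaking whitespace at the ends. A UUID never legitimately contains any of them, so dropping them first is safe. scrub is a hypothetical helper, not part of the tool:

```python
import unicodedata

def scrub(raw: str) -> str:
    """Drop Unicode format characters (Cf) and trim surrounding whitespace, NBSP included."""
    cleaned = "".join(ch for ch in raw if unicodedata.category(ch) != "Cf")
    return cleaned.strip()  # str.strip() treats U+00A0 as whitespace

pasted = [
    "550e8400-e29b-41d4-a716-446655440000",
    "550e8400-e29b-41d4-a716-446655440000\u200b",   # trailing zero-width space
    "\ufeff550e8400-e29b-41d4-a716-446655440000 ",  # BOM plus trailing space
]
print(len(set(pasted)))                 # 3: the invisible bytes keep them apart
print(len({scrub(s) for s in pasted}))  # 1
```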

The whole thing runs locally: parsing, validation, deduplication, copy, and download all happen in the browser tab, and uploaded text files are read with the File API rather than sent to a server — which matters when the column is full of customer identifiers or access tokens.

If your job is narrower than full cleanup — just confirming a column is all well-formed before an import — reach for UUID List Validator instead, then come back to UUID Normalizer when you actually need every row rewritten into one shape. Pick one canonical form, apply it at every boundary, and the database and the API finally agree.


Made by Toolora · Updated 2026-06-13