How to Normalize Hex Color Codes into One Canonical Form
Normalize hex colors from messy text into one lowercase six-digit form so a design system can diff, dedupe, and search hex color codes that never string-match.
A color value in a design system can show up in a dozen written forms and still point at the exact same pixel. #abc, #AABBCC, aabbcc, #aabbcc — all four are one color, yet a plain text search treats them as four strangers. That mismatch is where palettes quietly drift: a button defined as #0a84ff in one stylesheet and #0A84FF in another reads as two brand blues to any diff tool, even though they paint identically.
The Hex Color Normalizer exists to collapse that ambiguity. It reads hex colors out of pasted text, logs, CSV exports, copied HTML, or an uploaded local file, and rewrites every valid one into a single canonical shape: lowercase, six digits, with the hash written one consistent way so it never gets in the way of comparison. Everything runs in the browser, so the source text never leaves your tab.
Why one canonical form matters
A design system is only coherent if you can answer two questions cheaply: is this color already in the palette? and did this color change between versions? Both questions are string comparisons under the hood. The moment the same color exists in three written forms, both questions break.
I learned this the slow way. I was auditing a component library where the "danger" red was supposedly one token, and a grep for #ff3b30 found 31 matches. It felt clean until I searched #FF3B30 and found 14 more, then ff3b30 (hashless, pasted from a spreadsheet) and found 9 more. The same red was hiding in three spellings across 54 lines, and my diff between the old and new theme had been silently ignoring two of those spellings, 23 of the 54 lines. Normalizing the whole file first (one form, lowercase, six digits) turned 54 scattered rows into one deduplicated entry I could actually reason about.
That is the entire point of canonicalization. You are not changing what the colors mean; you are removing every difference that is not a real difference, so the only variation left is the variation you care about.
Shorthand, case, and the hash: the three differences that aren't differences
There are exactly three ways the same hex color gets written differently, and the normalizer flattens all three.
Three-digit shorthand versus six-digit. This is the one people forget. #abc and #aabbcc are the same color because each shorthand digit is doubled when expanded: a becomes aa, b becomes bb, c becomes cc. So #f00 is #ff0000, #0c9 is #00cc99, and #fff is #ffffff. Two stylesheets can hold the identical color in forms that never string-match until you expand the shorthand. The normalizer expands shorthand to the full six-digit form so the doubled version and the long version land on the same row.
Uppercase versus lowercase. #FF8800 and #ff8800 are identical to a browser and different to grep. The canonical output here is lowercase, so case never splits a color again.
With or without the hash. A value copied out of a CSV cell or a code snippet often arrives hashless — ffffff instead of #ffffff. The parser accepts both and produces one consistent shape, so a hashless ffffff and a #FFFFFF from a copied theme come out matching.
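The three rules above are small enough to sketch in a few lines of Python. This is an illustration only, not the tool's own code: the name normalize_hex is hypothetical, and the choice to keep a leading hash in the output is an assumption taken from the clean list shown later in this article.

```python
import re
from typing import Optional

# Assumption: a value is an optional '#' followed by exactly 3 or 6 hex digits.
HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def normalize_hex(value: str) -> Optional[str]:
    """Return the color as lowercase '#rrggbb', or None if it does not fit."""
    match = HEX_COLOR.fullmatch(value.strip())
    if match is None:
        return None
    digits = match.group(1).lower()  # rule 2: case
    if len(digits) == 3:
        # rule 1: shorthand, each digit is doubled (a -> aa, b -> bb, c -> cc)
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits  # rule 3: one consistent hash treatment


print(normalize_hex("#abc"))     # #aabbcc
print(normalize_hex("#FF8800"))  # #ff8800
print(normalize_hex("ffffff"))   # #ffffff
```

One caveat with accepting hashless input this way: a three-letter word made only of the letters a to f, such as "bad" or "fed", also passes the pattern. The sketch makes no attempt to tell those apart from real colors.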
A note on alpha: the canonical form this tool produces is the lowercase six-digit RGB shape. Four-digit and eight-digit alpha hex are a different shape, and the normalizer treats values it cannot fold into the six-digit form as invalid rows rather than guessing. A #1234 (four digits), for example, is surfaced with a reason instead of being silently reshaped. Always verify the target form and how alpha is handled against the tool itself before you trust the output downstream, because "normalize" means different things in different pipelines.
A worked example: mixed forms in, clean list out
Say you paste this jumble, scraped from three different files:
#abc
#AABBCC
aabbcc
#FF8800
#ff8800
#1234
red
#00ff00 trailing-junk
The normalizer reads each line, rewrites the valid colors into the canonical lowercase six-digit form, and keeps the unique rows. The clean list becomes:
#aabbcc
#ff8800
Two rows, not eight. #abc, #AABBCC, and aabbcc all collapsed into #aabbcc because shorthand expansion and lowercasing made them string-identical; #FF8800 and #ff8800 merged the same way. The rest does not vanish — it surfaces. With invalid rows preserved for review you still see the #1234 (four digits, not a six-digit color), the named color red that slipped into the list, and the #00ff00 trailing-junk line, each kept with the reason it could not be cleaned. Nothing breaks downstream because nothing was quietly dropped.
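To make the behavior concrete, here is a rough Python sketch that reproduces the example above: valid rows are normalized and deduplicated, and invalid rows are kept with their line number and a reason. The function names and the wording of the reasons are invented for illustration; the tool's actual messages may differ.

```python
import re
from typing import List, Tuple

# Assumption: a candidate is an optional '#' followed only by hex digits.
HEX_BODY = re.compile(r"#?([0-9a-fA-F]+)")


def classify(line: str) -> Tuple[str, str]:
    """Return ("ok", "#rrggbb") or ("invalid", reason). Reasons are hypothetical."""
    match = HEX_BODY.fullmatch(line.strip())
    if match is None:
        return ("invalid", "not a bare hex color")
    digits = match.group(1).lower()
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        return ("invalid", f"{len(match.group(1))} hex digits, expected 3 or 6")
    return ("ok", "#" + digits)


def clean(lines: List[str]):
    """Split input into unique normalized colors and preserved invalid rows."""
    unique, invalid, seen = [], [], set()
    for number, line in enumerate(lines, start=1):
        status, payload = classify(line)
        if status == "ok":
            if payload not in seen:
                seen.add(payload)
                unique.append(payload)
        else:
            invalid.append((number, line, payload))  # kept, never dropped
    return unique, invalid


sample = ["#abc", "#AABBCC", "aabbcc", "#FF8800", "#ff8800",
          "#1234", "red", "#00ff00 trailing-junk"]
unique, invalid = clean(sample)
print(unique)  # ['#aabbcc', '#ff8800']
for number, line, reason in invalid:
    print(f"line {number}: {line!r} ({reason})")
```

Keeping the original line number alongside each invalid row is a design choice in this sketch; it is what lets a reviewer trace a bad value back to where it came from.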
That preserve-invalid behavior matters more than it looks. Silent deletion is how a malformed token disappears from a palette audit and resurfaces as a production bug three weeks later. Seeing the reason next to the bad row is the difference between cleaning data and losing it.
Deduping and diffing once everything is canonical
After normalization the useful operations become trivial. Keep unique rows only, and your palette dedupes itself — the 54 scattered reds from my audit become one entry. Sort the normalized output, and two theme files diff cleanly because both sides are in the same order and the same form. A diff that used to flag dozens of phantom changes now shows only the colors that genuinely moved.
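As a minimal sketch of that idea, the Python below normalizes two theme lists into sets and compares them. The theme contents are made-up sample data and canonical_set is a hypothetical helper; for brevity it skips invalid values, which a real audit would keep for review instead.

```python
import re
from typing import Iterable, Set

# Assumption: optional '#', then exactly 3 or 6 hex digits.
HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def canonical_set(values: Iterable[str]) -> Set[str]:
    """Normalize every valid value to lowercase '#rrggbb' and dedupe via a set."""
    result = set()
    for value in values:
        match = HEX_COLOR.fullmatch(value.strip())
        if match is None:
            continue  # sketch only: invalid values are skipped here
        digits = match.group(1).lower()
        if len(digits) == 3:
            digits = "".join(ch * 2 for ch in digits)
        result.add("#" + digits)
    return result


# Made-up sample themes: same colors in different spellings, plus one new color.
old_theme = ["#0A84FF", "#f00", "ffffff"]
new_theme = ["#0a84ff", "#FF0000", "#fff", "#34c759"]

old_colors = canonical_set(old_theme)
new_colors = canonical_set(new_theme)

added = sorted(new_colors - old_colors)    # colors only in the new theme
removed = sorted(old_colors - new_colors)  # colors only in the old theme
print("added:", added)      # added: ['#34c759']
print("removed:", removed)  # removed: []
```

A raw line-by-line comparison of these two lists would flag every row as changed; after normalization only the one color that was actually added shows up.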
If your job is purely removing repeats from an already-clean list, the focused hex color deduplicator handles that case directly. If you need to confirm which entries are well-formed before you trust them, the hex color list validator reports the bad rows with reasons. And when the colors are buried inside stylesheets, the CSS variable extractor pulls the custom-property values out first so you have a list to normalize.
Exporting a list your pipeline can actually use
A clean list of colors is only half the job; the other half is handing it to the next system in the right shape. The normalizer switches the output between plain lines, CSV, JSON, Markdown, a SQL IN clause, and a TypeScript union, then downloads the exact artifact. That means a JSON fixture for a snapshot test, a SQL IN (...) filter for a query against a colors table, or a type BrandColor = '#aabbcc' | '#ff8800' union without hand-adding quotes and commas. CSV and Markdown can carry line numbers, which is what you want when the deliverable needs an audit trail rather than just a final answer.
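For a sense of what three of those shapes look like, here is a small Python sketch that formats an already-normalized list as JSON, a SQL IN clause, and a TypeScript union. The function names are hypothetical and the exact spacing and quoting the tool emits may differ. The sketch assumes every input is already a canonical #rrggbb string, so no quote escaping is needed.

```python
import json
from typing import List


def to_json(colors: List[str]) -> str:
    """JSON array of strings, e.g. for a snapshot-test fixture."""
    return json.dumps(colors)


def to_sql_in(colors: List[str]) -> str:
    """SQL IN clause. Safe only because inputs are canonical '#rrggbb' values."""
    quoted = ", ".join(f"'{color}'" for color in colors)
    return f"IN ({quoted})"


def to_ts_union(type_name: str, colors: List[str]) -> str:
    """TypeScript string-literal union type declaration."""
    members = " | ".join(f"'{color}'" for color in colors)
    return f"type {type_name} = {members}"


colors = ["#aabbcc", "#ff8800"]
print(to_json(colors))                    # ["#aabbcc", "#ff8800"]
print(to_sql_in(colors))                  # IN ('#aabbcc', '#ff8800')
print(to_ts_union("BrandColor", colors))  # type BrandColor = '#aabbcc' | '#ff8800'
```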
The practical workflow is one pass: paste or upload, normalize to the canonical form, keep unique rows, sort, then export the format your destination expects. The source text stays in your browser the whole time, which is the right default when the colors are tangled up with customer data or internal identifiers in the same paste.
Normalizing hex colors is unglamorous work, but it is the load-bearing step under every clean palette, every honest theme diff, and every dedupe that actually deduplicates. Get every color into one canonical form first, and the rest of the design-system hygiene falls out almost for free.
Made by Toolora · Updated 2026-06-13