How to Normalize Semantic Versions into Clean, Comparable Semver

Strip the leading v, pad partial versions like 1.2 to 1.2.0, and standardize tags so your semver list sorts and compares correctly every time.

Published By Li Lei
#semver #versioning #developer-tools #text-processing

Version strings rot in surprisingly quiet ways. One CI export writes v1.2.3, a teammate types 1.2 in a release note, a Git tag list mixes 1.10.0 with 1.9.0, and a support ticket pastes Latest. Each of those describes a real release, but none of them sort or compare cleanly against the others, because to a computer they are just strings. v1.2, 1.2.0, and 1.2 all point at the same release, yet they are not string-equal, so a naive sort or a === check treats them as three different things.

That gap is exactly what the Semantic Version Normalizer closes. It rewrites every semantic version it finds into one uniform canonical form, entirely in your browser, so the whole column lines up before you hand it to a script, a spreadsheet, or another system.

Why a canonical semver actually matters

Semantic versioning has a precise shape: three numeric parts (MAJOR.MINOR.PATCH), an optional prerelease tag, and optional build metadata. The trouble is that the strings people paste rarely arrive in that shape. They carry a decorative v, they drop the patch number, they vary their casing, or they smuggle in trailing whitespace from a copied web page.

Until those strings are reduced to a single canonical form, three operations all misbehave:

  • Sorting. A plain string sort puts 1.10.0 before 1.9.0 because it compares character by character, and at the first difference 1 sorts before 9. Numeric, part-aware semver ordering is the only sort that reads 1.10.0 as the later release.
  • Comparing. v1.2 and 1.2.0 are the same release, but "v1.2" === "1.2.0" is false. Deduplication built on raw string equality will keep both and quietly inflate your list.
  • Importing. A downstream filter, a WHERE version IN (...) clause, or a TypeScript union expects clean, predictable tokens. One stray v or a missing patch part is enough to make a row miss its match.

Normalizing first turns all three from guesswork into arithmetic.
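
To make the sorting and comparing failures concrete, here is a minimal TypeScript sketch. The function names (coreParts, compareCore) are my own illustration, not the tool's code, and the comparator assumes its inputs are already canonical MAJOR.MINOR.PATCH strings.

```typescript
// Split a canonical MAJOR.MINOR.PATCH string into three numbers.
// Assumes the input is already normalized (no leading v, no prerelease).
function coreParts(version: string): [number, number, number] {
  const p = version.split(".").map(Number);
  return [p[0] || 0, p[1] || 0, p[2] || 0];
}

// Part-aware comparison: major first, then minor, then patch.
function compareCore(a: string, b: string): number {
  const pa = coreParts(a);
  const pb = coreParts(b);
  return pa[0] - pb[0] || pa[1] - pb[1] || pa[2] - pb[2];
}

const tags: string[] = ["1.9.0", "1.10.0", "1.2.0"];

// The default sort compares strings character by character.
const stringSorted = tags.slice().sort();
// The part-aware sort compares each component as a number.
const numericSorted = tags.slice().sort(compareCore);

console.log(stringSorted.join(", "));  // 1.10.0, 1.2.0, 1.9.0
console.log(numericSorted.join(", ")); // 1.2.0, 1.9.0, 1.10.0

// Raw string equality sees two different values for the same release.
const typed: string = "v1.2";
const canonical: string = "1.2.0";
console.log(typed === canonical); // false
```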

The two transformations that do most of the work

Two rules carry most of the cleanup, and both are worth verifying against the tool's own behavior before you trust a column.

Strip the leading v. v1.2.3 becomes 1.2.3. The v is a human convention from tag lists and changelogs; it carries no meaning the parser needs, and leaving it in place breaks numeric comparison.

Pad partial versions to three parts. 1.2 becomes 1.2.0, and a bare 1 becomes 1.0.0. A semver value needs all three numeric components for MAJOR.MINOR.PATCH ordering to work, so a short tag is padded out with zeros until it reaches the canonical shape. This is the step that makes 1.2 and 1.2.0 finally string-equal after normalization, which is what lets the deduplicator collapse them into one row.

On top of those, the tool standardizes casing, punctuation, and spacing across the column, so the output is uniform rather than a patchwork of whatever each source happened to write. Prerelease tags go through the same canonical pass. Verify the exact prerelease and build-metadata rules against the tool itself for the inputs you care about, because how a prerelease suffix is treated decides whether two near-identical tags collapse or stay distinct.
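
The strip and pad rules fit in a few lines of TypeScript. This is a sketch under stated assumptions, not the tool's implementation: normalizeVersion is a hypothetical helper, it deliberately rejects anything with a prerelease or build suffix because those rules need checking against the tool, and it treats a leading zero such as 01 as 1.

```typescript
// Hypothetical helper: returns the canonical MAJOR.MINOR.PATCH string,
// or null when the input is not a plain one-, two-, or three-part version.
// Prerelease and build metadata are out of scope here and return null.
function normalizeVersion(raw: string): string | null {
  // trim() removes stray whitespace; the i flag accepts both v and V.
  const m = /^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?$/i.exec(raw.trim());
  if (m === null) return null;
  // Missing parts are padded with zero. Number() also drops leading zeros,
  // which is an assumption of this sketch rather than a documented rule.
  const major = Number(m[1]);
  const minor = Number(m[2] || "0");
  const patch = Number(m[3] || "0");
  return major + "." + minor + "." + patch;
}

console.log(normalizeVersion("v1.2.3")); // 1.2.3
console.log(normalizeVersion("1.2"));    // 1.2.0
console.log(normalizeVersion("1"));      // 1.0.0
console.log(normalizeVersion(" v1.2 ")); // 1.2.0
console.log(normalizeVersion("latest")); // null
```

After this pass, 1.2, v1.2, and 1.2.0 all come back as the same string, so an ordinary equality check is enough to deduplicate them.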

A worked example

Here is the kind of mixed list that lands in a real cleanup task — a pasted blend of tag exports and hand-typed notes:

v1.2.3
1.2
1.10.0
1.9.0
v1.2
1.2.0
latest
1.x

Run it through the normalizer with deduplication and sorting on, and the canonical output settles into something a script can trust:

1.2.0
1.2.3
1.9.0
1.10.0

Walk through what happened. v1.2.3 lost its v and became 1.2.3. 1.2, v1.2, and 1.2.0 all normalized to 1.2.0 and collapsed into a single row: three string-distinct inputs, one canonical release. 1.10.0 now sorts after 1.9.0, as a numeric semver comparison demands, instead of ahead of it, where a string sort would put it. And latest and 1.x? Neither can be coerced into a valid semver, so both are flagged as invalid rather than silently mangled into something wrong. You can keep those invalid rows visible for review, which tells you exactly which tags need a human decision instead of a mechanical fix.

That distinction matters more than it looks. A normalizer that quietly dropped 1.x would hide a real ambiguity; one that guessed a value for it would invent data. Surfacing it as invalid is the honest behavior.
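
The whole pass over the example list (normalize, deduplicate, sort, and set aside what cannot be parsed) can be sketched as follows. The names toCanonical, compareCanonical, and cleanVersionList are hypothetical, and the parsing rule is an assumption that ignores prerelease and build suffixes.

```typescript
interface CleanResult {
  valid: string[];   // canonical, deduplicated, sorted ascending
  invalid: string[]; // original strings that could not be parsed
}

// Assumed rule: optional v, one to three numeric parts, nothing else.
function toCanonical(raw: string): string | null {
  const m = /^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?$/i.exec(raw.trim());
  if (m === null) return null;
  return Number(m[1]) + "." + Number(m[2] || "0") + "." + Number(m[3] || "0");
}

// Numeric, part-aware ordering of two canonical strings.
function compareCanonical(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  return (pa[0] || 0) - (pb[0] || 0)
    || (pa[1] || 0) - (pb[1] || 0)
    || (pa[2] || 0) - (pb[2] || 0);
}

function cleanVersionList(lines: string[]): CleanResult {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const line of lines) {
    const canonical = toCanonical(line);
    if (canonical === null) {
      invalid.push(line); // surfaced for review, never guessed or dropped
    } else if (valid.indexOf(canonical) === -1) {
      valid.push(canonical); // keep the first copy of each canonical value
    }
  }
  valid.sort(compareCanonical);
  return { valid: valid, invalid: invalid };
}

const input = ["v1.2.3", "1.2", "1.10.0", "1.9.0", "v1.2", "1.2.0", "latest", "1.x"];
const result = cleanVersionList(input);
console.log(result.valid.join(", "));   // 1.2.0, 1.2.3, 1.9.0, 1.10.0
console.log(result.invalid.join(", ")); // latest, 1.x
```

Returning the invalid rows as their own list, instead of filtering them out, mirrors the behavior described above: nothing is invented and nothing disappears.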

How I use it on a real cleanup

I reach for this most often when I am reconciling tag lists from two sources that should agree but do not. The last time, I had a Git tag export on one side and a hand-maintained release table on the other, and the diff between them was full of false positives: v2.0 against 2.0.0, 1.10 and 1.9.0 in the wrong order. I pasted both columns into the Semantic Version Normalizer, let it strip the vs and pad the short tags, kept invalid rows on so I could see the handful of nightly and latest entries that needed a call, and exported the cleaned list. The real diff shrank from dozens of phantom mismatches to three genuine gaps. Everything ran in the tab; no version data left my machine, which is the right default when the source is an internal release table.
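
If you want to script the same kind of reconciliation, the core idea is to compare canonical forms instead of raw strings. Below is a rough sketch with made-up sample data; canon and missingFrom are hypothetical names, and the parsing rule is the same simplified assumption (no prerelease handling).

```typescript
// Assumed rule: optional v, one to three numeric parts, padded with zeros.
function canon(raw: string): string | null {
  const m = /^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?$/i.exec(raw.trim());
  if (m === null) return null;
  return Number(m[1]) + "." + Number(m[2] || "0") + "." + Number(m[3] || "0");
}

// Canonical versions that appear in `left` but not in `right`.
// Unparseable entries are skipped here; review them separately.
function missingFrom(left: string[], right: string[]): string[] {
  const seenRight: { [version: string]: boolean } = {};
  for (const r of right) {
    const c = canon(r);
    if (c !== null) seenRight[c] = true;
  }
  const missing: string[] = [];
  for (const l of left) {
    const c = canon(l);
    if (c !== null && seenRight[c] !== true && missing.indexOf(c) === -1) {
      missing.push(c);
    }
  }
  return missing;
}

// Made-up sample data standing in for the two sources.
const gitTags = ["v2.0", "v1.10", "v1.9.0", "nightly"];
const releaseTable = ["2.0.0", "1.9.0", "1.8.0"];

const onlyInGit = missingFrom(gitTags, releaseTable);
const onlyInTable = missingFrom(releaseTable, gitTags);
console.log(onlyInGit.join(", "));   // 1.10.0
console.log(onlyInTable.join(", ")); // 1.8.0
```

A raw string diff of these two lists would report v2.0 and 2.0.0 as a mismatch; comparing canonical forms leaves only the real gaps.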

From clean list to a usable artifact

Normalizing is only half the job. Once the column is canonical, you usually need it in a specific shape for the next system, and the tool exports the cleaned list directly:

  • CSV with line numbers, for an import or an audit trail.
  • JSON, for a fixture or a config file.
  • SQL IN, so a WHERE version IN (...) filter is ready to paste.
  • TypeScript union, so a literal type drops straight into your code without hand-adding quotes and commas.
  • Plain lines or Markdown, for a note or a ticket.
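
For a sense of what three of those shapes look like, here is a small TypeScript sketch. The exact quoting and layout the tool emits are not specified here, so treat the function names and output formats as assumptions.

```typescript
// A cleaned, canonical list such as the normalizer would produce.
const versions: string[] = ["1.2.0", "1.2.3", "1.9.0", "1.10.0"];

// JSON array, for a fixture or a config file.
function toJson(list: string[]): string {
  return JSON.stringify(list);
}

// SQL IN list with single-quoted literals; embedded quotes are doubled.
function toSqlIn(list: string[]): string {
  const quoted = list.map((v) => "'" + v.replace(/'/g, "''") + "'");
  return "(" + quoted.join(", ") + ")";
}

// TypeScript union of string literal types.
function toTsUnion(list: string[]): string {
  return list.map((v) => JSON.stringify(v)).join(" | ");
}

console.log(toJson(versions));    // ["1.2.0","1.2.3","1.9.0","1.10.0"]
console.log(toSqlIn(versions));   // ('1.2.0', '1.2.3', '1.9.0', '1.10.0')
console.log(toTsUnion(versions)); // "1.2.0" | "1.2.3" | "1.9.0" | "1.10.0"
```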

If your task is purely about turning a clean list into one of those formats, the Semantic Version List Converter covers that conversion step on its own. And because copied web text often hides stray whitespace, normalize before you deduplicate or import — the cleanup pass is what makes the dedupe honest.

The takeaway

A version string is only as useful as its canonical form. Strip the leading v, pad partial versions to three numeric parts, standardize the casing and spacing, and flag what genuinely cannot be parsed — do those four things and v1.2, 1.2, and 1.2.0 finally agree, 1.10.0 finally sorts after 1.9.0, and your comparisons stop lying to you. The work runs locally, the output is an artifact you can copy or download, and the messy column you started with becomes a list another system can trust.


Made by Toolora · Updated 2026-06-13