How to Extract Version Numbers From Any Changelog or Log With a Semver Extractor

A changelog is written for humans, not for scripts. It mixes version numbers with dates, author names, ticket IDs, and paragraphs of release notes. So when someone asks "which versions are mentioned in this file," you end up scrolling and copying by hand, missing a few, and double-counting the ones that appear twice. A version extractor does that reading for you: it scans the text, finds every version string, and hands back a clean, deduplicated list.

This post walks through what counts as a semantic version, how extraction and dedup work, and three real jobs where pulling versions out of messy text saves a few tedious minutes. The Semantic Version Extractor runs the whole thing in your browser, so the changelog or log you paste never leaves the page.

What a semantic version actually looks like

Semantic versioning gives a release a number with a defined shape, and knowing that shape is the whole trick to extracting one reliably. A semantic version is major.minor.patch with an optional -prerelease and +build metadata appended. So 1.4.2, 2.0.0-rc.1, and 1.4.2-rc.1+build.27 are all valid, while 2024.10.3 (a date) and 1.2 (missing the patch field) are not, even though they look version-like at a glance.

Each field carries meaning:

major bumps when a release breaks backward compatibility.
minor bumps when features are added without breaking anything.
patch bumps for bug fixes only.
-prerelease marks an unstable build like -alpha.2, -rc.1, or -beta.
+build is metadata the version comparison ignores, like a commit hash or build number.

The extractor recognizes that full pattern. It finds every version string in a changelog or log and dedupes them, dropping the prose, dates, and markup around each match so you get only the versions themselves.

How extraction and dedup work

The flow is three steps. First the parser scans the text and grabs each token that satisfies the SemVer pattern. Second, it normalizes the matches: trimming whitespace, stripping the hidden characters that copied web text loves to smuggle in. Third, it dedupes, so a version that shows up in five places collapses to one row.

What makes the output trustworthy is that it keeps a record. Each row carries its source line number, the normalized value, a validity flag, and a reason. When the regex grabs something it should not, like a date written 2024.10.3, that row is flagged invalid with a reason instead of being silently dropped. You decide whether to keep it for review or filter it out. That audit trail is the difference between "here is a list" and "here is a list I can defend."

You also pick the output shape. The clean list can come out as plain lines, CSV with line numbers, JSON, Markdown, a SQL IN clause, or a TypeScript union type, so the result drops straight into whatever you are building next without hand-adding quotes and commas.

A worked example: a changelog reduced to its versions

Here is a snippet that looks like a hundred real changelogs:

## 2.4.1 - 2024-10-03
- Fixed a crash in the export worker (#812)
- Bumped the parser from 1.4.2 to 1.4.2

## 2.4.0 - 2024-09-20
- New dashboard, shipped first as 2.4.0-rc.1
- Reverted a regression introduced in 2.3.9

## 2.3.9 - 2024-08-30
- Security patch backported from 2.4.0-rc.1

Reading by hand, you would probably write down 2.4.1, 2.4.0, 2.3.9 and call it done, quietly missing the prereleases and the parser bump. Run the snippet through the extractor and the unique, valid versions are:

1.4.2
2.3.9
2.4.0
2.4.0-rc.1
2.4.1

Notice what happened. 1.4.2 appeared twice on one line and collapsed to a single row. 2.4.0-rc.1 showed up in two different sections and also collapsed to one. The date 2024-10-03 was not pulled in as a version, because it does not match the major.minor.patch shape with dot separators between exactly three numeric fields. Five unique versions, sorted, ready to paste into a dependency review or a release tracker.

Three jobs this actually saves time on

Building a version list. When you are documenting which releases a customer has touched, or assembling the set of versions a migration script must handle, you need the distinct values, not the duplicates. Paste the source, dedupe, sort, and export. The list is done before you would have finished scrolling.

Finding every version mentioned in a changelog. Audits and compliance reviews often ask "which versions does this document reference?" The honest answer requires catching the prereleases and the in-line bumps a human skim misses. Keeping line numbers means you can jump back to the exact context for any version that looks surprising.

Dependency review. Logs and lockfile diffs are full of version strings buried in noise. Extracting them gives you a quick inventory of what is actually in play, which is the starting point for spotting an outdated or duplicated dependency. When you only need to confirm that a pile of versions are well-formed rather than collect them, the Semantic Version List Validator does the focused check.

Why local processing matters here

I reach for this most when a teammate drops a CI log into chat and asks which builds it covers. The log is forty kilobytes of timestamps and stack traces with a dozen versions scattered through it. I paste it in, take the deduped list, and answer in under a minute, and because the parsing runs entirely in my browser, I never have to think about whether that log contains an internal hostname or a token I should not be uploading somewhere. For changelogs, lockfiles, and log snippets, a few megabytes of text is plenty; for a gigabyte CI archive, grep the relevant lines locally first, then paste those.

That local-only guarantee is the quiet reason to prefer a browser tool over a random web service for this. Version strings travel with the context around them, and that context is frequently something you would rather not hand to a server.

Where this fits

Extraction is usually the first step. Once you have the clean list, you might normalize edge cases, collapse near-duplicates, or reformat for a specific target. The extractor covers the common path end to end, and when one stage needs more attention, the dedicated tools take over from there without making you re-paste your source.

Start with the Semantic Version Extractor the next time a changelog, a log, or a wall of release notes lands on your desk. Paste, dedupe, export, and move on.

Made by Toolora · Updated 2026-06-13