How to Deduplicate JWT Tokens From a Captured Log Set

When you pull a capture from a request log, an auth proxy, or a stack of support tickets, the same JWT shows up over and over. The same session token rides every request a client makes, so a one-hour window of traffic might carry a thousand lines and only forty distinct tokens. Before you can count distinct sessions, audit who held a token, or feed a clean list into a script, you have to collapse those repeats down to one copy each. That is what deduplication does here, and the way it works for JWTs is stricter than the deduplication you are used to.

Why JWTs Compare as Exact Strings

A JWT is a base64url string: three segments — header, payload, signature — joined by dots. Base64url is case-significant. Uppercase A and lowercase a are different bytes in the alphabet, and the dedup engine treats them that way. Two tokens are duplicates only if every single character matches, end to end. There is no case-folding the way there is with email addresses, where Alice@example.com and alice@example.com route to the same inbox. With JWTs, changing one character changes the underlying bytes, and if it lands in the signature segment it changes the cryptographic meaning entirely. So the rule is simple and unforgiving: exact-match only. Same string, same token. One character off, two different tokens.

That strictness is a feature, not a quirk. You never want a dedup pass to silently merge two tokens that differ by a character, because that difference might be the whole point — a refreshed token, a tampered copy, or a token from a different signing key.

Why You Normalize Before You Dedupe

Exact-match is honest, but it is also literal. It will happily report two copies of the same token as distinct if the surrounding text differs by even one stray character. In real captures, that happens constantly:

A Bearer prefix on one line but not another: Bearer eyJhbG... versus the raw eyJhbG....
Trailing whitespace, a tab, or a carriage return pasted from a Windows log.
Quotes or commas dragged in from a CSV cell or a JSON value.

To an exact-match comparison, Bearer eyJ...abc and eyJ...abc are two different strings, so a naive pass keeps both and your "distinct token" count is wrong. This is why you normalize first. Normalization strips the wrapper noise — the Bearer keyword, the surrounding whitespace, the stray punctuation — so the comparison runs on the actual token bytes and nothing else. If you want to inspect or standardize that step on its own, the JWT token normalizer does exactly that cleanup as a dedicated pass.

How This Tool Deduplicates

Here is precisely what happens when you paste a list into the JWT token deduplicator. It parses the tokens in your browser tab, normalizes each one, then folds duplicates together: exact repeats and tokens that only differ after normalizing collapse into a single canonical row, and the first occurrence of each is the one it keeps. Every kept row carries a duplicate count and the first source line number, so you can still trace where each token came from. A malformed token — a two-segment string, or a chunk that is not valid base64url — cannot be safely matched against the others, so instead of being silently merged under a lookalike it is set aside in the report with its reason, and you can choose to keep those invalid rows for review.

Two things this tool does not do, and you should not expect from it. It does not decode the payload — it will not show you the claims, the exp, the sub, or any field inside the token. And it does not verify signatures — a token passing the format check is not proof that it is authentic, unexpired, or issued by anyone you trust. This is a string-cleanup tool: extract, normalize, dedupe, export. Validation here means "this looks like a well-formed JWT string," nothing more.

Because JWTs are sensitive, the output masks the token values rather than printing them in full, while still giving you the useful signals — the duplicate count, the first line, the valid/invalid verdict and reason. You get an audit-ready report without splashing live bearer tokens across a screen, a copied cell, or a downloaded file. Everything runs locally; nothing is sent to a server to be compared.

A Worked Example

Say your capture, after grepping the Authorization headers, looks like this — five lines, with a Bearer prefix on some and not others:

Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMSJ9.AAAAA
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMSJ9.AAAAA
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIyMiJ9.BBBBB
  eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMSJ9.AAAAA
eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIyMiJ9.bbbbb

Lines 1, 2, and 4 are the same token wearing different clothes: one has a Bearer prefix, one has leading whitespace, one is bare. After normalization they are byte-for-byte identical, so they fold into a single row with a count of 3. Line 3 is a distinct token. Line 5 looks almost like line 3 — but its signature segment ends in lowercase bbbbb instead of uppercase BBBBB, and base64url is case-significant, so it is a genuinely different token, not a duplicate. The deduplicated result (values masked):

token            count   first_line   valid
eyJh…AAAAA          3         1        true
eyJh…BBBBB          1         3        true
eyJh…bbbbb          1         5        true

Three distinct tokens out of five lines. Notice the engine did not merge the BBBBB and bbbbb rows — that case difference is exactly the kind of thing you want surfaced, not swallowed.

My Workflow With It

I keep this tool open whenever I am triaging an auth incident. The first time I ran it on a real capture, I had merged two proxy exports by hand and was sure I was looking at roughly two hundred active sessions. After a dedupe pass the count dropped to thirty-one, and the duplicate counts told the actual story: a handful of clients were retrying aggressively and inflating the log. What I appreciate most is the masking — I can paste a report into a ticket for a teammate without each line being a live token someone could lift. And because I can keep the invalid rows, the truncated and copy-mangled tokens stay visible with a reason instead of vanishing from the tally. When I need the surviving list standardized further, I hand it to the JWT token normalizer and move on.

What to Remember

Deduplicating JWTs is exact-match because a JWT is a case-significant base64url string — two tokens are the same only when every character matches, with no case-folding. Normalize first so a Bearer prefix or stray whitespace does not split one token into two. Let the masking protect the values, keep the invalid rows when you need the full count, and remember the tool cleans strings — it never decodes claims or verifies signatures. Treat a clean, deduplicated list as a starting point for investigation, not as proof that any token is real.

Made by Toolora · Updated 2026-06-13