How to Deduplicate HTTP Headers When Case Variants Hide Among Them
Content-Type and content-type are the same field, but a plain text dedup keeps both. Here is how to fold case variants and clean a captured header set.
The first time I pasted a curl -v trace into a generic line dedup tool and counted the result, the math was wrong. I had seven header lines, I expected five unique ones, and I got six. The extra line was a Content-Type echoed twice by a caching proxy, once as Content-Type and once as content-type. To me those are the same field. To a plain text dedup, they are two different strings.
That gap is the whole reason a header-specific deduplicator exists, and it is worth understanding before you trust any cleaned list.
Why a plain dedup keeps lines that are not unique
Most dedup utilities compare lines as raw text. Two strings are "the same" only if every byte matches. That rule is correct for arbitrary text, but HTTP headers do not play by it.
RFC 9110 says header field names are case-insensitive. A server can send Content-Type, a proxy can rewrite it to content-type, and a logging layer can flatten it to CONTENT-TYPE. All three name the same field. A byte-for-byte comparison sees three distinct strings and keeps all three. You end up with a "deduplicated" list that still contains duplicates, which is exactly the failure that hides during a debugging session: you are scanning for a header that should appear once, and it shows up three times.
The fix is conceptually small: case-fold the field name before comparing. Lowercased, Content-Type and content-type both become content-type, the comparison collapses them, and you keep one copy. The value after the colon is left alone, because values are not always case-insensitive.
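To make the difference concrete, here is a minimal Python sketch that compares the two approaches on three casings of one field. The helper name fold_name is hypothetical and exists only for this illustration; it is not taken from any tool.

```python
lines = [
    "Content-Type: application/json",
    "content-type: application/json",
    "CONTENT-TYPE: application/json",
]

# Byte-for-byte comparison: each casing counts as a different string.
plain_unique = set(lines)


def fold_name(line: str) -> str:
    """Lowercase only the field name; the value after the colon is untouched."""
    name, sep, value = line.partition(":")  # split on the first colon only
    return name.strip().lower() + sep + value


folded_unique = {fold_name(line) for line in lines}

print(len(plain_unique))   # 3
print(len(folded_unique))  # 1
```

Splitting on the first colon matters because values such as URLs or timestamps contain colons of their own.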
What this tool actually does to the field name
I checked the implementation before writing this, because a blog post that claims case-folding without verifying it is worse than useless. The HTTP Header Deduplicator runs a normalizeHeader step on every parsed line. It splits on the first colon, lowercases the field name, then re-applies canonical Title-Case (content-type and CONTENT-TYPE both become Content-Type), and rebuilds the line as Name: value.
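The tool runs in the browser, so its real code is not Python. As a rough sketch of the same sequence (split on the first colon, lowercase the name, re-apply Title-Case, rebuild the line), something like the following would do. The function name normalize_header, the choice to trim whitespace around the value, and returning None for a line without a colon are assumptions made for this illustration.

```python
from typing import Optional


def normalize_header(line: str) -> Optional[str]:
    """Return 'Name: value' with a Title-Case name, or None if there is no colon."""
    name, sep, value = line.partition(":")  # split on the first colon only
    name = name.strip()
    if not sep or not name:
        return None  # malformed line: nothing to normalize
    # Lowercase first, then capitalize each hyphen-separated part.
    canonical = "-".join(part.capitalize() for part in name.lower().split("-"))
    return f"{canonical}: {value.strip()}"


print(normalize_header("CONTENT-TYPE: application/json"))  # Content-Type: application/json
print(normalize_header("x-request-id:7c19"))               # X-Request-Id: 7c19
print(normalize_header("X-Foo bar"))                       # None
```

A purely mechanical Title-Case turns etag into Etag rather than the conventional ETag spelling. That is harmless for deduplication, because only the consistency of the result matters, not its spelling.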
The dedup key is derived from that normalized form, and the HTTP header profile is not marked case-sensitive. So two lines whose names differ only by case produce the same key and fold into a single canonical row. The first occurrence is the one that stays, and the duplicate count tells you how many lines collapsed into it. Everything runs in the browser tab; nothing is uploaded.
That is the behavior the tool's own FAQ describes, and it matches the code.
A worked example
Here is a header set pasted straight from a verbose response capture, with the casing the proxy actually produced:
Content-Type: application/json
Cache-Control: no-store
content-type: application/json
X-Request-Id: 7c19
Cache-Control: no-store
CONTENT-TYPE: application/json
X-Foo bar
A plain text dedup of those seven lines keeps six: it merges the two byte-identical Cache-Control lines but treats the three Content-Type casings as three different strings. After case-folding the field name, the deduplicated output is:
Content-Type: application/json (×3, first seen line 1)
Cache-Control: no-store (×2, first seen line 2)
X-Request-Id: 7c19 (first seen line 4)
Three canonical rows. The Content-Type count of three is the signal you wanted: it tells you the field arrived three times under three casings, which is itself a hint that something between you and the origin is rewriting headers. The malformed X-Foo bar line has no colon, so it cannot be merged into a field; keeping invalid rows for review (an option in the tool) lets you see exactly what failed to parse instead of silently dropping it.
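The same result can be reproduced with a short Python sketch. It is an illustration of the technique under stated assumptions, not the tool's code: the function name dedupe_headers, the row fields, and the decision to collect colon-less lines in a separate list are all choices made for this example.

```python
def dedupe_headers(lines):
    """Return (rows, invalid): merged header rows and unparseable lines."""
    rows = {}     # dedup key -> row; dicts keep insertion order
    invalid = []  # (line number, raw text) for lines with no colon
    for line_no, raw in enumerate(lines, start=1):
        name, sep, value = raw.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name:
            invalid.append((line_no, raw))
            continue
        key = (name.lower(), value)  # fold the name only, never the value
        if key in rows:
            rows[key]["count"] += 1  # the first occurrence stays
        else:
            title = "-".join(p.capitalize() for p in name.lower().split("-"))
            rows[key] = {"header": f"{title}: {value}", "count": 1, "first_line": line_no}
    return list(rows.values()), invalid


capture = [
    "Content-Type: application/json",
    "Cache-Control: no-store",
    "content-type: application/json",
    "X-Request-Id: 7c19",
    "Cache-Control: no-store",
    "CONTENT-TYPE: application/json",
    "X-Foo bar",
]

rows, invalid = dedupe_headers(capture)
for row in rows:
    print(f'{row["header"]} (x{row["count"]}, first seen line {row["first_line"]})')
print("invalid:", invalid)
```

Because the key includes the value, two Content-Type lines with different values would stay as separate rows, which is what you want when a proxy has changed the value and not just the casing.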
Cleaning a captured header set, step by step
A real capture is messier than three lines. When I clean one, the order that works is this:
- Strip the noise first. Verbose traces carry > / < direction markers, timestamps, and indentation. Trim those down to bare Name: value lines before deduplicating, otherwise the prefix becomes part of the string and defeats the merge.
- Watch for hidden whitespace. Text copied from a browser inspector or a rendered web page often carries non-breaking spaces and trailing tabs. Two Set-Cookie lines that look identical can carry different invisible bytes. Normalize the spacing so genuine duplicates actually match.
- Then deduplicate. With the names case-folded and the values trimmed, the duplicate counts become trustworthy.
- Export with line numbers. If you need to explain to a teammate where a duplicate came from, copy or download CSV or Markdown that keeps the first source line, not just the final flat list.
That ordering matters because deduplication is only as honest as the normalization that runs before it. Fold case but leave a stray tab in the value, and you will still split a pair that should have merged.
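The first two steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes curl-style "> " and "< " direction markers; the helper name clean_line is hypothetical, and timestamps are deliberately not handled because their format varies by tool.

```python
import re

# A leading direction marker, optionally indented, as curl -v prints it.
_MARKER = re.compile(r"^\s*[<>]\s?")


def clean_line(raw: str) -> str:
    """Reduce one captured line to a bare 'Name: value' candidate."""
    line = raw.replace("\u00a0", " ")  # non-breaking space -> ordinary space
    line = _MARKER.sub("", line)       # drop the "> " or "< " prefix if present
    return line.strip()                # trim indentation and trailing tabs


a = clean_line("< Set-Cookie: id=1\t")
b = clean_line("  Set-Cookie:\u00a0id=1 ")
print(a == b)  # True
```

Before cleaning, those two inputs differ in several invisible bytes, so even a case-folding dedup would treat them as two rows.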
When you do not want to fold case
Case-folding the name is safe. Case-folding the value is not, and the tool does not do it. A Set-Cookie value, an opaque ETag, or a bearer token in Authorization can be case-sensitive, and collapsing two values that differ only by case would destroy real information. The rule that holds across HTTP is: field names are case-insensitive, while values have to be treated as case-sensitive unless a field's own definition says otherwise. The deduplicator applies case-folding to exactly the half where it is correct.
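A small Python sketch shows the asymmetry. The helper dedup_key is hypothetical and the ETag values are made up for the example.

```python
from typing import Tuple


def dedup_key(line: str) -> Tuple[str, str]:
    """Build a comparison key: folded name, value kept exactly as sent."""
    name, _, value = line.partition(":")
    return (name.strip().lower(), value.strip())


a = dedup_key('ETag: "AbC123"')
b = dedup_key('etag: "AbC123"')  # same value, different name casing
c = dedup_key('ETag: "abc123"')  # same name, value differs only by case

print(a == b)  # True: the name casing is ignored
print(a == c)  # False: the value casing is preserved
```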
If your capture is an export from a tool that already normalized everything to lowercase, you can dedupe as-is and the case question never arises. The folding is there for the common case where a proxy, a log shipper, and an origin server each picked a different casing for the same field.
Try it on your own capture
Paste a curl -v trace, a copied response inspector panel, or a saved .http file and let the parser fold the casing for you: HTTP Header Deduplicator. If your input still has direction markers or indentation that you want gone before deduplicating, run it through the text file cleaner first, then bring the trimmed lines back here.
The point is not that deduplication is hard. It is that "the same header" means something specific in HTTP, and a tool that does not know the spec will quietly leave duplicates behind.
Made by Toolora · Updated 2026-06-13