How to Extract IPv6 Addresses From Logs and Messy Text
A practical guide to pulling IPv6 addresses out of logs and pasted text: the colon-hex format, :: compression, why they hide better than IPv4, and clean dedup.
The first time I had to pull every unique IPv6 address out of a firewall log, I reached for the same grep pattern I used for IPv4 and got nothing useful back. IPv4 is friendly to a quick regex: four numbers, three dots, done. IPv6 is a different animal. It mixes hexadecimal, colons, an optional shorthand for zeros, and sometimes a tail of embedded IPv4. By the time you account for every legal way to write the same address, your one-line pattern has turned into a small parser. This guide walks through what makes IPv6 hard to spot, how the format actually works, and how to turn a wall of log text into a clean, deduplicated list you can act on.
The colon-hex format, briefly
A full IPv6 address is eight groups of four hexadecimal digits separated by colons, for example 2001:0db8:0000:0000:0000:0000:0000:0001. Each group is 16 bits, so the whole address is 128 bits, four times the width of IPv4's 32 bits. That extra width is the whole point of IPv6, but it also means addresses are long and tedious to write out in full.
Two rules let people shorten them, and both rules are why a naive search misses so much:
- Leading zeros in a group can be dropped, so 0db8 becomes db8 and 0001 becomes 1.
- A single run of all-zero groups can be replaced with ::, the double colon. So the address above collapses to 2001:db8::1.
The catch is that :: may appear only once in an address, because it stands for "as many zero groups as needed to fill 128 bits." Use it twice and the address is ambiguous, and therefore invalid. A good extractor recognizes both the full and the compressed forms and treats 2001:0db8:0000:0000:0000:0000:0000:0001 and 2001:db8::1 as the same address when it dedupes. That single behavior saves a lot of manual reconciliation later.
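To make that equivalence concrete, here is a minimal sketch in Python using the standard-library ipaddress module. Python is my choice for illustration only; it is not necessarily how any particular extractor works internally. The variable names are made up for the example.

```python
import ipaddress

full = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
short = ipaddress.IPv6Address("2001:db8::1")

# Both spellings parse to the same 128-bit value, so they compare equal.
same = full == short

# .compressed applies both shortening rules; .exploded undoes them.
canonical = full.compressed
expanded = short.exploded

# A second "::" is ambiguous, so parsing raises a ValueError subclass.
try:
    ipaddress.IPv6Address("2001::db8::1")
    double_ok = True
except ValueError:
    double_ok = False

print(same, canonical, expanded, double_ok)
```

Comparing parsed values rather than raw strings is what lets a dedup step fold the two spellings together.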
There is one more wrinkle worth knowing: IPv6 can carry an IPv4 tail, as in ::ffff:192.168.0.1. These show up constantly in dual-stack systems, and any parser that only knows hex-and-colons will skip right over them.
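As a rough illustration, Python's standard-library ipaddress module accepts the dotted tail and exposes the embedded IPv4 address through the ipv4_mapped attribute. This is a sketch of how a parser could handle the case, not a description of any specific tool.

```python
import ipaddress

addr = ipaddress.IPv6Address("::ffff:192.168.0.1")

# ipv4_mapped is an IPv4Address for ::ffff:a.b.c.d addresses, otherwise None.
embedded = addr.ipv4_mapped
plain = ipaddress.IPv6Address("2001:db8::1").ipv4_mapped

# The dotted tail and the pure-hex spelling describe the same address.
same = addr == ipaddress.IPv6Address("::ffff:c0a8:1")

print(embedded, plain, same)
```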
Why IPv6 hides better than IPv4
IPv4 stands out in text. The dotted format rarely collides with anything else you see in a log line, and the digits are decimal, so your eye catches them. IPv6 blends in. Colons already appear everywhere in logs: timestamps like 14:32:07, key:value pairs, port suffixes, MAC addresses, and code snippets. A string like 2001:db8::1 is surrounded by other colon-heavy noise, and the hex digits look like ordinary identifiers.
Addresses also arrive wrapped. When a port is attached, IPv6 gets bracketed: [2001:db8::1]:443. Interface-scoped link-local addresses carry a zone index: fe80::1%eth0. Copied web pages bring along hidden whitespace and HTML fragments. Each of these decorations breaks a simple pattern and forces you to clean up by hand. Doing that across a few thousand log lines is exactly the kind of work that should be automated.
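One way to peel those decorations off is sketched below in Python. The unwrap helper, the Unwrapped tuple, and the bracket pattern are hypothetical names invented for this example, and the sketch assumes the token has already been isolated from the surrounding text.

```python
import re
from typing import NamedTuple, Optional


class Unwrapped(NamedTuple):
    address: str
    zone: Optional[str]
    port: Optional[int]


# Matches [address] with an optional :port suffix (illustrative pattern).
_BRACKETED = re.compile(r"^\[([^\]]+)\](?::(\d{1,5}))?$")


def unwrap(token: str) -> Unwrapped:
    """Split a decorated token into bare address, zone index, and port."""
    token = token.strip()
    port = None
    match = _BRACKETED.match(token)
    if match:
        token = match.group(1)
        if match.group(2) is not None:
            port = int(match.group(2))
    # Everything after the first "%" is the zone index, e.g. an interface name.
    address, _, zone = token.partition("%")
    return Unwrapped(address, zone or None, port)


print(unwrap("[2001:db8::1]:443"))
print(unwrap("fe80::1%eth0"))
```

The bare address that comes out still needs validating; this step only removes the wrapping.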
A worked example
Here is a small slice of the kind of log text I deal with, with addresses repeated across lines and written in different forms:
2026-06-12 09:14:02 conn from [2001:db8::1]:51820 accepted
2026-06-12 09:14:03 conn from 2001:0db8:0000:0000:0000:0000:0000:0001 reused
2026-06-12 09:15:41 deny fe80::1%eth0 -> ff02::1
2026-06-12 09:16:10 conn from [2001:db8::1]:51999 accepted
2026-06-12 09:16:55 mapped client ::ffff:203.0.113.7
2026-06-12 09:17:30 malformed 12345::1 dropped
Run that through the IPv6 Address Extractor with dedup turned on, and the noise falls away. The brackets, ports, zone index, and surrounding words are dropped, the two spellings of the first address are recognized as one, and you are left with the unique set:
2001:db8::1
fe80::1
ff02::1
::ffff:203.0.113.7
The line with 12345::1 is flagged as invalid rather than silently dropped, because 12345 is five hex digits and a group can hold only four. That distinction matters: in a security review you want broken or spoofed-looking entries surfaced, not quietly removed. Keeping invalid rows with a reason next to them lets you separate genuine addresses from typos and tampering.
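If you want to reproduce this behavior yourself, the following Python sketch gets the same result on the sample log. It is an approximation under stated assumptions, not the extractor's actual implementation: candidates are found with a deliberately loose regex, a token is only considered if it contains :: or has eight colon-separated groups (a heuristic I chose so timestamps are not reported as broken addresses), and validation is delegated to the standard-library ipaddress module. All function and variable names are illustrative.

```python
import ipaddress
import re

LOG = """\
2026-06-12 09:14:02 conn from [2001:db8::1]:51820 accepted
2026-06-12 09:14:03 conn from 2001:0db8:0000:0000:0000:0000:0000:0001 reused
2026-06-12 09:15:41 deny fe80::1%eth0 -> ff02::1
2026-06-12 09:16:10 conn from [2001:db8::1]:51999 accepted
2026-06-12 09:16:55 mapped client ::ffff:203.0.113.7
2026-06-12 09:17:30 malformed 12345::1 dropped
"""

# Runs of characters that can appear in an IPv6 literal.
CANDIDATE = re.compile(r"[0-9A-Fa-f:.]+")


def looks_like_ipv6(token: str) -> bool:
    # Heuristic (an assumption of this sketch): require "::" or eight groups,
    # so timestamps such as 09:14:02 are skipped instead of flagged.
    return "::" in token or token.count(":") == 7


def canonical(addr: ipaddress.IPv6Address) -> str:
    # Keep the dotted tail for IPv4-mapped addresses on every Python version.
    if addr.ipv4_mapped is not None:
        return f"::ffff:{addr.ipv4_mapped}"
    return addr.compressed


def extract(text: str):
    unique = {}   # canonical spelling -> first line number where it was seen
    invalid = []  # (line number, raw token, reason)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in CANDIDATE.findall(line):
            token = token.rstrip(".")  # drop sentence-ending dots
            if not looks_like_ipv6(token):
                continue
            try:
                addr = ipaddress.IPv6Address(token)
            except ValueError as exc:
                invalid.append((lineno, token, str(exc)))
                continue
            unique.setdefault(canonical(addr), lineno)
    return unique, invalid


unique, invalid = extract(LOG)
for address, lineno in unique.items():
    print(f"{address}  (first seen on line {lineno})")
for lineno, token, reason in invalid:
    print(f"invalid on line {lineno}: {token} ({reason})")
```

Because the regex stops at the closing bracket and at the % sign, ports and zone indexes fall away without special handling. Keeping the line number and the error text alongside each invalid token is what makes the result usable as an audit trail.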
Where this actually pays off
Two jobs come up again and again. The first is modern log analysis. Server, firewall, and application logs are full of IPv6 now, and answering a simple question like "which distinct clients hit this endpoint" means collapsing dozens of spellings into one canonical list. The second is security review. When you triage an incident, you want the unique addresses touching a system, you want the odd ones called out, and you want it done without pasting customer traffic into a random web service.
Dedup is the quiet hero in both cases. The same host appears across many lines, in compressed and uncompressed form, bracketed and bare. Normalizing every match to one canonical spelling and then removing duplicates turns "847 log lines" into "12 unique addresses," which is a list a human can reason about. From there you can sort it, export it to CSV for a ticket, or drop it straight into a SQL IN clause or a TypeScript union without hand-adding quotes and commas.
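For a sense of what those export formats involve, here is a small Python sketch that turns a deduplicated list into a SQL IN list, a TypeScript string-literal union, and a one-column CSV. The helper names and the "address" column header are assumptions made for the example.

```python
import csv
import io
import json

addresses = ["2001:db8::1", "fe80::1", "ff02::1"]


def to_sql_in(values):
    # Double any single quote, the standard SQL string-literal escape.
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"({quoted})"


def to_ts_union(values):
    # JSON string syntax is also valid TypeScript string-literal syntax.
    return " | ".join(json.dumps(v) for v in values)


def to_csv(values):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["address"])
    writer.writerows([v] for v in values)
    return buf.getvalue()


print(to_sql_in(addresses))
print(to_ts_union(addresses))
print(to_csv(addresses), end="")
```

Validated IPv6 text contains only hex digits, colons, and dots, so the escaping here should never trigger, but it costs nothing to keep.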
If you also need to confirm each entry is well-formed before importing, pair the extractor with the IPv6 Address List Validator. And when your text is messy in other ways first, with stray blank lines or inconsistent endings, a quick pass through the Text File Cleaner before extraction keeps the results tidy.
Keep it local
One reason I keep coming back to this approach is that the parsing happens in the browser. Logs and traffic captures often contain customer data, internal hostnames, and access patterns you do not want leaving your machine. With local processing, the text you paste and any file you drop in are read with the File API in the current tab, scanned, and turned into a list without a round trip to a server. That keeps the workflow honest for security work: you get the deduped audit table, the line numbers, and the validity reasons, and nothing sensitive is shipped off to be logged somewhere else.
A few habits make the output more reliable. Normalize before you dedupe, since copied web text carries invisible whitespace that can make two identical addresses look different. Treat a valid format as exactly that, a format check, not proof that a host is reachable or that the address is actually in use. And when you need an audit trail, download the CSV or Markdown with line numbers rather than copying only the final list, so you can always trace a result back to the line it came from.
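The normalize-before-dedupe habit can be sketched in a few lines of Python. The set of invisible characters below is a small, illustrative selection rather than an exhaustive list, and the clean helper is a hypothetical name.

```python
import ipaddress

# Zero-width and BOM characters that str.strip() does not treat as whitespace.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def clean(token: str) -> str:
    # Delete invisible characters, then trim ordinary and non-breaking spaces.
    return token.translate(INVISIBLE).strip()


raw = ["2001:db8::1", "2001:db8::1\u200b", "\u00a02001:DB8::1 "]

# As raw strings these are three different values...
distinct_raw = len(set(raw))

# ...but after cleaning and canonicalizing they collapse to one address.
unique = {ipaddress.IPv6Address(clean(t)).compressed for t in raw}

print(distinct_raw, unique)
```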
IPv6 will keep showing up in more of your logs as the internet finishes its long migration. A reliable way to pull addresses out, fold the compressed and full forms together, flag the broken ones, and export a clean artifact turns a fiddly chore into a few seconds of work.
Made by Toolora · Updated 2026-06-13