A Practical Guide to Regular Expressions: Classes, Groups, and Flags

Regular expressions look like line noise until they click, and then they stay clicked forever. A regex is just a tiny pattern language for describing the shape of text — "a digit, then three letters, then an @ sign." Once you can read the shapes, you can search log files, validate input, and rewrite messy data in one line instead of fifty.

I rewrite my regex notes every couple of years because I keep forgetting the corners, and the fastest way I know to relearn them is to type a pattern, paste real text under it, and watch what lights up. That feedback loop is the whole point of the regex tester — you see matches highlighted as you type, so the rules below stop being abstract.

Character classes: the building blocks

A character class matches one character from a set. The literal way is square brackets: [aeiou] matches any single vowel, and [a-z] matches any lowercase letter using a range. Negate it with a caret inside the brackets — [^0-9] matches anything that is not a digit.

You will reach for the shorthand classes far more often:

\d — any digit, the same as [0-9]
\w — a "word" character: letters, digits, or underscore
\s — whitespace: spaces, tabs, newlines
. — any character at all (except newline, unless you set the right flag)

Uppercase versions invert them. \D is "not a digit," \W is "not a word character," \S is "not whitespace." So \d\d:\d\d reads as "two digits, a colon, two digits" — a clock time. Almost every pattern you write is a sequence of these little classes glued together.
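Here is a minimal sketch of those classes in JavaScript, which is the flavor I am assuming throughout. The sample strings are made up for illustration.

```javascript
// Each pattern is one or more character classes glued together.
const vowel = /[aeiou]/;        // one character from an explicit set
const notDigit = /[^0-9]/;      // a caret inside the brackets negates the set
const clockTime = /\d\d:\d\d/;  // two digits, a colon, two digits

// Hypothetical sample text, not from any real log.
const sample = "Build finished at 09:41 on node-7";

const firstVowel = sample.match(vowel)[0];          // "u" (from "Build")
const firstNonDigit = "2026x".match(notDigit)[0];   // "x"
const time = sample.match(clockTime)[0];            // "09:41"

// \D is the shorthand inverse of \d, so it finds nothing in a run of digits.
const digitsOnlyHasNonDigit = /\D/.test("2026");    // false

console.log(firstVowel, firstNonDigit, time, digitsOnlyHasNonDigit);
```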

Quantifiers: how many times

A class on its own matches exactly one character. Quantifiers say how many:

* — zero or more
+ — one or more
? — zero or one (optional)
{3} — exactly three
{2,4} — between two and four
{2,} — two or more

So \d{4} is exactly four digits, and colou?r matches both "color" and "colour" because the u is optional. By default quantifiers are greedy: they grab as much as they can. Add a ? after a quantifier to make it lazy: <.+?> stops at the first >, whereas the greedy <.+> runs on to the last > on the line.
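A quick way to see optional, counted, and greedy versus lazy quantifiers side by side, sketched in JavaScript. The sample strings are my own, picked only to make the difference visible.

```javascript
// Optional character: the u may or may not be there.
const spellings = "color colour colr".match(/colou?r/g);
// "colr" is skipped because the o before the optional u is still required.

// Counted repetition: exactly four digits in a row.
const hasYear = /\d{4}/.test("released in 2026");   // true
const tooShort = /\d{4}/.test("released in 26");    // false

// Greedy versus lazy on the same (made-up) HTML fragment.
const html = "<b>bold</b> and <i>italic</i>";
const greedy = html.match(/<.+>/)[0];   // runs to the LAST > in the string
const lazy = html.match(/<.+?>/)[0];    // stops at the FIRST >

console.log(spellings);
console.log(hasYear, tooShort);
console.log(greedy);
console.log(lazy);
```

The greedy version returns the entire fragment here, while the lazy one returns only the opening <b> tag.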

Anchors and groups: position and structure

Anchors don't match characters, they match positions. ^ is the start of the string, $ is the end, and \b is a word boundary — the edge between a word character and a non-word character. \bcat\b matches "cat" as a whole word but not the "cat" inside "category."

Parentheses group a sub-pattern so a quantifier can apply to the whole thing: (ab)+ matches "ababab." Inside a group, the pipe gives you alternation: gr(a|e)y matches "gray" or "grey." Groups are also where capturing happens, which is the part that turns regex from a search tool into a rewriting tool.
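The anchor and group examples above can be checked in a few lines of JavaScript. This is only a sketch, and the test strings are my own.

```javascript
// \b matches a position, not a character: the edge of a word.
const wholeWord = /\bcat\b/;
const inSentence = wholeWord.test("the cat sat");  // true
const inCategory = wholeWord.test("category");     // false: "cat" runs straight into "e"

// A quantifier after a group repeats the whole group.
const repeated = "ababab!".match(/(ab)+/)[0];      // "ababab"

// Alternation inside a group: either letter is accepted in that slot.
const grey = /gr(a|e)y/;
const spellingChecks = ["gray", "grey", "groy"].map((word) => grey.test(word));

console.log(inSentence, inCategory, repeated, spellingChecks);
```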

Capture groups: pulling pieces out

Every pair of plain parentheses creates a numbered capture group. When the pattern matches, each group remembers the slice of text it caught. The full match is group 0; after that, groups are numbered by their opening parenthesis, left to right: the first is group 1, the next is group 2, and so on.

This is what makes structured rewriting possible. Take a date pattern:

(\d{4})-(\d{2})-(\d{2})

Run it against the text 2026-06-13, and you capture 2026 in group 1, 06 in group 2, 13 in group 3. Now a replacement string of $3/$2/$1 produces 13/06/2026 — the same date in day-first order. You wrote no loops, no string slicing; the groups did the bookkeeping. I lean on this constantly for reformatting CSV columns and renaming things in bulk, and previewing the replace before pasting it back is exactly what the find and replace text tool is built around.
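The same rewrite, sketched in JavaScript with String.prototype.match and String.prototype.replace. The pattern, input, and replacement string are the ones from the text; only the variable names are mine.

```javascript
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;
const input = "2026-06-13";

// match() returns the full match at index 0, then each capture group in order.
const parts = input.match(isoDate);
const fullMatch = parts[0]; // "2026-06-13"
const year = parts[1];      // "2026"
const month = parts[2];     // "06"
const day = parts[3];       // "13"

// In the replacement string, $1, $2, $3 refer back to the captured groups.
const dayFirst = input.replace(isoDate, "$3/$2/$1");

console.log(fullMatch, year, month, day);
console.log(dayFirst); // 13/06/2026
```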

If you only need to group without capturing — say, for alternation — use a non-capturing group (?:...) so it doesn't consume a group number.
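To see what "doesn't consume a group number" means in practice, here is a small JavaScript comparison. The "grey-42" input and the trailing number group are hypothetical, added only so there is a second group to watch.

```javascript
const input = "grey-42"; // made-up sample

// Capturing alternation: the letter takes group 1, the number is pushed to group 2.
const capturing = input.match(/gr(a|e)y-(\d+)/);

// Non-capturing alternation: the number is now group 1.
const nonCapturing = input.match(/gr(?:a|e)y-(\d+)/);

const capturingGroups = Array.from(capturing);       // ["grey-42", "e", "42"]
const nonCapturingGroups = Array.from(nonCapturing); // ["grey-42", "42"]

console.log(capturingGroups);
console.log(nonCapturingGroups);
```

That shift matters as soon as a replacement string refers to $1 or $2, because adding a plain group earlier in the pattern renumbers everything after it.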

A worked example: emails and phone numbers

Here is a real input/output run you can reproduce. Paste this text:

Contact: ada@toolora.info or call 555-0142. Backup: grace@example.com

Type the pattern [\w.]+@[\w.]+ and turn on the global flag. The tester highlights two matches and reports a match count of 2:

ada@toolora.info
grace@example.com

That pattern is deliberately loose: "word characters or dots, an @ sign, more word characters or dots." It is great for pulling addresses out of a log dump, but it is not a real validator; it would happily accept "..@.." as an address. For genuine address checking with the rules that actually matter, the email validator handles the edge cases a quick pattern skips.
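If you would rather reproduce that run in code than in the tester, a minimal JavaScript sketch looks like this. The sample text and pattern are the ones above; the anchored variant at the end is my own addition to show how loose the pattern is when used as a validator.

```javascript
const text = "Contact: ada@toolora.info or call 555-0142. Backup: grace@example.com";

// Global flag: match() returns every match instead of only the first.
const looseEmail = /[\w.]+@[\w.]+/g;
const addresses = text.match(looseEmail);

// Anchoring the same pattern turns it into a (bad) validator:
// dots alone satisfy both sides of the @ sign.
const acceptsDots = /^[\w.]+@[\w.]+$/.test("..@..");

console.log(addresses.length); // 2
console.log(addresses);
console.log(acceptsDots);      // true
```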

A simple US phone pattern in the same text would be \d{3}-\d{4}, which matches 555-0142. Broaden it to \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4} and it absorbs ten-digit formats like (555) 012-3456 and 555.012.3456, though it now requires an area code, so the bare 555-0142 no longer matches. Phone numbers are a famous regex trap (a strict pattern misses real numbers, a loose one matches junk), so I always test against a spread of formats rather than trusting it on the first try.
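Testing against a spread of formats is easy to script. The sketch below assumes the broader pattern is written with escaped parentheses, \(? and \)?, which is what the (555) example needs. The extra sample strings beyond the two in the text are hypothetical.

```javascript
const text = "Contact: ada@toolora.info or call 555-0142. Backup: grace@example.com";

// The short seven-digit pattern finds the number in the sample text.
const shortPhone = /\d{3}-\d{4}/;
const shortMatch = text.match(shortPhone)[0]; // "555-0142"

// Broader ten-digit pattern: optional parentheses, optional separators.
const usPhone = /\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/;

const formats = ["(555) 012-3456", "555.012.3456", "555-012-3456", "5550123456"];
const formatResults = formats.map((candidate) => usPhone.test(candidate));

// The trap in both directions:
const missesSevenDigit = !usPhone.test("555-0142");          // too strict for this one
const matchesJunk = usPhone.test("order 12345678901234");    // too loose for this one

console.log(shortMatch);
console.log(formatResults);
console.log(missesSevenDigit, matchesJunk);
```

Because the pattern is unanchored, any run of ten or more digits satisfies it, which is why the order number slips through.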

Flags: changing the rules of the match

In a JavaScript regex literal, flags sit after the closing slash, as in /cat/gi, and switch the engine's behavior:

g (global) — find every match, not just the first. Forget this and only your first match highlights.
i (ignore case) — cat then matches "Cat" and "CAT."
m (multiline) — ^ and $ match at the start and end of each line, not just the whole string.
s (dotall) — . now matches newlines too, so a pattern can span lines.
u (unicode) — handle astral characters and \u{...} escapes correctly.
y (sticky) — match only at the current position, useful when writing a tokenizer.

The two that bite people most are g and i. Most "why does only one match show up" questions are a missing g. According to MDN's RegExp documentation, flags are immutable on a constructed regex — you set them when the pattern is created and can't toggle them mid-search — which is why a tester that lets you flip them and re-run instantly saves so much guesswork.
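Here is a rough JavaScript sketch of what each flag changes, using one small made-up input. Since flags cannot be toggled on an existing regex, the last lines build a new one from the old source and flags instead.

```javascript
const text = "Cat\ncat\nCAT"; // three lines, hypothetical sample

// No g: match() stops at the first hit and reports where it was.
const firstOnly = text.match(/cat/);
// g + i: every match, in any letter case.
const allCases = text.match(/cat/gi);

// m: ^ and $ work per line. Without it, nothing here spans the whole string.
const perLine = text.match(/^cat$/gim);
const wholeStringOnly = text.match(/^cat$/gi); // null

// s: the dot may cross the newline between the first two lines.
const dotWithoutS = /Cat.cat/.test(text);  // false
const dotWithS = /Cat.cat/s.test(text);    // true

// y: match only at lastIndex, never by scanning ahead.
const sticky = /\d+/y;
sticky.lastIndex = 3;
const stickyMatch = sticky.exec("ab 42")[0]; // "42" starts exactly at index 3

// Flags are fixed at construction, so "adding" one means building a new regex.
const base = /cat/gi;
const withMultiline = new RegExp(base.source, base.flags + "m");

console.log(firstOnly[0], firstOnly.index);
console.log(allCases, perLine, wholeStringOnly);
console.log(dotWithoutS, dotWithS, stickyMatch, withMultiline.flags);
```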

Where to go from here

Regex rewards practice more than reading. Build patterns up one class at a time, watch the match count, and add quantifiers only when the simple version works. When you paste a pattern copied from a Java or Python string literal, undo the string escaping first: write \d once here, not \\d. In a regex, the double backslash matches a literal backslash followed by the letter d, which silently breaks the pattern.
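The backslash trap is easier to believe once you see it. This JavaScript sketch compares the two spellings as regex literals, then shows why the doubled form exists at all: inside a string literal, the extra backslash is consumed by the string before the regex engine ever sees it. The sample inputs are my own.

```javascript
const single = /\d/;    // a digit
const doubled = /\\d/;  // a literal backslash followed by the letter d

const singleOnDigit = single.test("7");            // true
const doubledOnDigit = doubled.test("7");          // false: no backslash in the input
const doubledOnPath = doubled.test("C:\\docs");    // true: the input contains \d literally

// In a string literal, "\\d" is the two characters \ and d,
// so the constructed regex is the same as /\d/.
const fromString = new RegExp("\\d");
const fromStringOnDigit = fromString.test("7");    // true

console.log(singleOnDigit, doubledOnDigit, doubledOnPath, fromStringOnDigit);
```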

Keep a real test corpus handy and run every pattern against it before it goes near your code. The whole flow — pattern, sample text, highlighted matches, replace preview — lives in the regex tester, and it never sends your input anywhere, so you can throw production log lines straight at it.

Made by Toolora · Updated 2026-06-13