Secret Scanner Basics: How to Find Leaked Secrets in .env Files and Diffs

The most expensive line of code I ever wrote was one I deleted within a minute. It was a quick debugging change: I pasted a live Stripe key straight into a config file to test a webhook, committed it with the rest of my work, and pushed. The key had been public for about three minutes when a bot found it on GitHub and started probing my account. Rotating it, auditing the logs, and writing the incident note took the rest of my afternoon. That key followed a pattern any scanner would have caught instantly. I just never ran one.

A secret scanner is a small safety net for exactly that mistake. You paste in a chunk of text (a .env file, a diff, a CI log) and it flags the strings that look like real credentials before they reach somewhere public. This post walks through what those scanners actually look for, how to read their output, and how to fold a quick scan into your normal commit habits.

What a secret actually looks like

Credentials are not random noise to a scanner. Most providers stamp their keys with a recognizable shape, and that shape is the first thing a scanner matches against.

A handful of well-known prefixes do most of the work:

AWS access key IDs start with AKIA followed by sixteen uppercase letters and digits.
Stripe secret keys begin with sk_live_ (and test keys with sk_test_).
GitHub personal access tokens start with ghp_, while fine-grained tokens use github_pat_ and OAuth tokens use gho_, with similar markers for other token types.
Slack tokens open with xoxb-, xoxp-, or xoxa-.
Google API keys start with AIza.
OpenAI-style keys use an sk- prefix.
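
The prefixes above translate fairly directly into regular expressions. Here is a minimal sketch in Python, not the code of any particular scanner. The names PREFIX_PATTERNS and match_prefixes are made up for illustration, and the length bounds after each prefix are my own loose assumptions, since providers change their formats over time.

```python
import re

# Illustrative patterns only. The prefixes come from the list above;
# the lengths and character classes after them are assumptions.
PREFIX_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe secret key": re.compile(r"\bsk_(?:live|test)_[0-9A-Za-z]{16,}"),
    "GitHub token": re.compile(r"\b(?:ghp_|gho_|github_pat_)[0-9A-Za-z_]{20,}"),
    "Slack token": re.compile(r"\bxox[bpa]-[0-9A-Za-z-]{10,}"),
    "Google API key": re.compile(r"\bAIza[0-9A-Za-z_-]{20,}"),
    "OpenAI-style key": re.compile(r"\bsk-[0-9A-Za-z_-]{20,}"),
}


def match_prefixes(line):
    """Return the label of every prefix pattern found in one line of text."""
    return [label for label, pattern in PREFIX_PATTERNS.items() if pattern.search(line)]
```

Calling match_prefixes on a line such as DEBUG=true returns an empty list, while a line containing an AKIA-style value returns the AWS label.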

Then there are the credentials with no fixed prefix at all: a database connection string like postgres://user:pass@host:5432/db, a JWT with its three dot-separated Base64url segments, or a private key wrapped in a -----BEGIN PRIVATE KEY----- block. Scanners recognize these by structure rather than by a leading tag.
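
Structural matches can be sketched the same way. The patterns below are rough heuristics of my own, not a complete grammar for URLs or JWTs, and STRUCTURE_PATTERNS and find_structures are hypothetical names. The JWT pattern leans on the fact that a Base64url-encoded JSON header normally begins with eyJ.

```python
import re

# Heuristic, assumption-laden patterns for credentials without a fixed prefix.
STRUCTURE_PATTERNS = {
    # scheme://user:password@host, with no slash allowed inside the password
    "Database URL with password": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s/@]+@[^\s/]+"
    ),
    # three dot-separated Base64url segments, the first starting with eyJ
    "JWT": re.compile(r"\beyJ[0-9A-Za-z_-]+\.[0-9A-Za-z_-]+\.[0-9A-Za-z_-]+"),
    # PEM header, optionally qualified (RSA, EC, OPENSSH, ...)
    "Private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
}


def find_structures(line):
    """Return the label of every structural pattern found in one line."""
    return [label for label, pattern in STRUCTURE_PATTERNS.items() if pattern.search(line)]
```

A plain URL with no embedded password, such as https://example.com/docs, is left alone because the pattern requires the user:password@ part.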

The second signal is entropy. A real secret is dense, near-random data, so it carries far more information per character than ordinary prose or a placeholder. A scanner measures that density and flags any unbroken string that scores high enough, even when it does not match a known provider. That is how a custom token from an internal service still gets caught: your_api_key_here reads as low-entropy filler, while f3Kq9XmZ2pLvW8nR4tB7yH1cJ6dA0sE reads as something you should not be committing.
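
One common way to measure that density is Shannon entropy over the characters of a string. The sketch below assumes that approach; the 20-character minimum and the 4.0 bits-per-character threshold are illustrative values I picked, not settings from any specific scanner.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(token, min_length=20, threshold=4.0):
    """Flag long strings whose characters are spread out almost evenly.

    Both defaults are assumptions chosen for illustration.
    """
    return len(token) >= min_length and shannon_entropy(token) >= threshold
```

With these defaults, your_api_key_here is not flagged (it is short and repeats characters like e and _), while f3Kq9XmZ2pLvW8nR4tB7yH1cJ6dA0sE is flagged, because all 31 of its characters are distinct.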

A worked example: reading a leaked key in a diff

Say you are about to commit this change to .env.example and you run the scan first. Here is the input:

# .env.example
DATABASE_URL=postgres://demo:demo@localhost:5432/app
STRIPE_KEY=sk_live_4eC39HqLyjWDarjtT1zdp7dc
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
DEBUG=true
APP_NAME=invoicer

A scanner walks line by line and reports something like:

Line 2 DATABASE_URL — database URL (masked: postgres://demo:***@localhost:5432/app)
Line 3 STRIPE_KEY — Stripe secret key, prefix sk_live_ (masked: sk_live_4eC3***)
Line 4 AWS_ACCESS_KEY_ID — AWS access key ID, prefix AKIA (masked: AKIA****************)

The DEBUG and APP_NAME lines are left alone. Notice the output never reprints the full value. Masking matters because the report itself is something you might paste into a ticket or a chat with a teammate, and a scanner that echoes your live key in plaintext just moves the leak somewhere else.
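
Masking itself is simple to sketch. The two helpers below are hypothetical and do not reproduce the exact format of the sample report: one keeps a few leading characters of a token, the other hides only the password part of a connection string.

```python
import re


def mask_secret(value, visible=4):
    """Keep the first `visible` characters and star out the rest.

    Values no longer than `visible` are masked completely, so a short
    secret is never printed in full.
    """
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


def mask_url_password(url):
    """Replace the password in scheme://user:password@host with ***."""
    return re.sub(r"(://[^\s:/@]+:)[^\s/@]+@", r"\1***@", url)
```

Applied to the AWS value from the example, mask_secret leaves AKIA visible and replaces the remaining sixteen characters with asterisks.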

The takeaway from this example: the local database URL with the demo password is probably fine to commit in an example file, but the sk_live_ Stripe key is a genuine production credential that should never be in version control. The AKIA...EXAMPLE value is AWS's documented placeholder, so it is harmless here, though the scanner flags it because it matches the pattern exactly. Reading the output means deciding, per line, which flags are real and which are intentional samples.

You can try this yourself with the ENV Secret Scanner: paste the block above, watch it mark the three credential lines, and confirm the masking before you share anything.

Why local-only scanning is the whole point

There is an obvious tension in a tool whose job is to inspect secrets: to check your credentials, it has to see your credentials. If that scan happens on someone's server, you have handed your keys to a third party to find out whether you leaked them, which defeats the purpose.

A good scanner runs entirely in your browser. The text you paste never crosses the network. No upload, no logging, no copy sitting in a server's request history. That design constraint is what lets you safely paste a real .env file or a fresh production log instead of carefully sanitizing it first. The scanner reads the input in memory, matches the patterns, masks the findings, and forgets everything the moment you close the tab.

A good scanner is also honest about its limits. It catches the common patterns and high-entropy assignments that show up in everyday developer work, but it cannot guarantee a file is clean. A bespoke internal secret with an unusual shape and moderate entropy can slip past. Treat a clean result as "no obvious leaks found," not as a certificate of safety.

Folding scans into pre-commit hygiene

The leak that cost me an afternoon happened because the scan was not part of my routine. The fix is to make checking secrets as automatic as checking that your tests pass.

A few habits that keep credentials out of your history:

Scan before you stage. Run your git diff through a scanner before git add, not after the commit lands. The cheapest leak to fix is the one that never gets committed.
Scan logs before you share them. A failed deploy log often contains expanded environment variables and full connection strings. Paste it into a scanner before dropping it into a support ticket or a public issue.
Keep real values out of example files. A .env.example should hold placeholders only. If a scanner flags a high-entropy string in your example file, that is a value that escaped from your real .env.
Rotate first, clean up second. If a real key was committed, logged, or sent to anyone, assume it is compromised. Rotate it immediately, then remove the source exposure. Deleting the line does not un-leak the key.
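
The first habit, scanning before you stage, can be scripted. Here is a minimal sketch, assuming Python and git are both available. The three patterns are a deliberately tiny illustrative set, and added_lines, scan_diff, and scan_working_tree are names I made up. It only looks at lines a unified diff adds, and it reports a position and a label rather than echoing the matched value.

```python
import re
import subprocess

# A tiny illustrative pattern set; a real scanner would carry many more.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live key": re.compile(r"\bsk_live_[0-9A-Za-z]{16,}"),
    "GitHub token": re.compile(r"\bghp_[0-9A-Za-z]{20,}"),
}


def added_lines(diff_text):
    """Yield the lines a unified diff adds, without the leading '+'.

    Simplification: any line starting with '+++ ' is treated as a file header.
    """
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++ "):
            yield line[1:]


def scan_diff(diff_text):
    """Return (position among added lines, label) for each suspicious line."""
    hits = []
    for position, line in enumerate(added_lines(diff_text), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((position, label))
    return hits


def scan_working_tree():
    """Scan unstaged changes in the current repository via `git diff`."""
    result = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True
    )
    return scan_diff(result.stdout)
```

Running scan_working_tree() from inside a repository before git add returns an empty list when none of these patterns appear in your unstaged changes. Plain git diff does not show brand-new untracked files, so those still need a separate look.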

When a scan turns up a JWT and you need to confirm what is actually inside it, the JWT Decoder breaks the token into its header and claims so you can tell a harmless signed payload from a session token that should never have been logged. Pairing the two means you not only spot the leak but understand exactly what was exposed.
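
Under the hood, decoding a JWT for inspection is just Base64url decoding of the first two segments. The sketch below is my own illustration, not the JWT Decoder's code, and decode_jwt_unverified is a hypothetical name. It does not verify the signature, so treat it as a way to look inside a token, never as a way to trust one.

```python
import base64
import json


def decode_jwt_unverified(token):
    """Return (header, claims) from a JWT without checking its signature."""
    header_b64, payload_b64, _signature = token.split(".")

    def decode_segment(segment):
        # JWT segments drop Base64 padding, so restore it before decoding.
        padded = segment + "=" * (-len(segment) % 4)
        return json.loads(base64.urlsafe_b64decode(padded))

    return decode_segment(header_b64), decode_segment(payload_b64)
```

A claim such as exp in the decoded payload tells you whether the logged token is still usable, which matters when deciding how urgently to revoke it.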

The mindset, not just the tool

A scanner is a backstop, not a strategy. The deeper habit is treating every secret as something that wants to escape: into a commit, a log line, a screenshot, a paste into the wrong channel. Prefixes like sk_live_, AKIA, and ghp_ are gifts because they make the dangerous strings visible. Entropy scoring covers the rest. Run the check locally, read each flag on its merits, and rotate without hesitation when something real slips through. That two-minute scan is far cheaper than the afternoon I spent learning the lesson the hard way.

Made by Toolora · Updated 2026-06-13