Mock Data the Smart Way: Generate Realistic Test Data in JSON, CSV, and SQL
Why tests and demos need fake data, how to generate names, emails, and addresses, export to JSON, CSV, or SQL, and keep it all running locally with reproducible seeds.
Every developer hits the same wall the moment a feature is half-built: the UI is ready, the table component renders, the API endpoint responds, but there is nothing to show. An empty list proves nothing. A single hand-typed row hides every bug. You need a hundred plausible users, or a thousand, and you need them now. That is exactly the gap mock data fills, and a mock data generator closes it in seconds.
Why Tests and Demos Need Fake Data
Real production data is the wrong tool for development, and not just for privacy reasons. Pulling a snapshot of live users into a staging database drags in names, emails, and addresses that belong to actual people — a compliance headache under GDPR and similar regimes, where personal data in non-production systems is a recurring audit finding. The safer and faster path is data that looks real but is fictional.
There is also a coverage argument. When you hand-type a fixture, you type the happy path: clean ASCII names, valid emails, round numbers. Your code passes. Then a user named O'Brien uploads a CSV and your importer chokes on the apostrophe. Generated data, drawn from realistic name and place lists, sprays apostrophes, accents, long strings, and edge-case lengths across your test inputs automatically — exercising the quoting and escaping paths a tidy five-row fixture never touches.
And for demos, mock data buys consistency. A sales demo that shows different fake people every time it loads looks unfinished. Reproducible data keeps your screenshots, your video walkthroughs, and your visual regression snapshots stable.
Generating Names, Emails, Addresses, and More
The core idea is a schema: a list of fields, each with a name and a type. A good generator covers the field types that show up in almost every app — id, uuid, firstName, lastName, fullName, email, username, phone, address, city, country, date, boolean, int, float, an enum of your own custom values, paragraph, and url.
That set is enough to model most tables you'll build. A user record is id + fullName + email + country + a created_at date. A product table is id + paragraph description + float price + boolean in_stock. A team roster is fullName + email + an enum role with values like admin, editor, viewer. You assemble the shape, not the rows.
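One way to picture "assembling the shape" is as a typed list of field definitions. The sketch below is an assumption about what such a schema could look like in code, not the tool's actual format: the `FieldType` union mirrors the field types listed above, while `FieldDef`, `Schema`, the `values` property, and `validateSchema` are hypothetical names chosen for illustration.

```typescript
// Hypothetical schema shape: names and structure are illustrative assumptions,
// not the generator's real internal or export format.
type FieldType =
  | "id" | "uuid" | "firstName" | "lastName" | "fullName" | "email"
  | "username" | "phone" | "address" | "city" | "country" | "date"
  | "boolean" | "int" | "float" | "enum" | "paragraph" | "url";

interface FieldDef {
  name: string;       // column or property name in the output
  type: FieldType;    // which kind of value to generate
  values?: string[];  // only meaningful for "enum" fields
}

type Schema = FieldDef[];

// The three example tables from the text, expressed as schemas.
const userSchema: Schema = [
  { name: "id", type: "id" },
  { name: "fullName", type: "fullName" },
  { name: "email", type: "email" },
  { name: "country", type: "country" },
  { name: "created_at", type: "date" },
];

const productSchema: Schema = [
  { name: "id", type: "id" },
  { name: "description", type: "paragraph" },
  { name: "price", type: "float" },
  { name: "in_stock", type: "boolean" },
];

const rosterSchema: Schema = [
  { name: "fullName", type: "fullName" },
  { name: "email", type: "email" },
  { name: "role", type: "enum", values: ["admin", "editor", "viewer"] },
];

// Sanity check: every enum field must list at least one value.
function validateSchema(schema: Schema): string[] {
  const problems: string[] = [];
  for (const field of schema) {
    if (field.type === "enum" && (!field.values || field.values.length === 0)) {
      problems.push(`enum field "${field.name}" needs at least one value`);
    }
  }
  return problems;
}
```

Keeping the schema as plain data means it can live next to the tests that depend on it, and a check like `validateSchema` catches an empty enum before any rows are generated.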
Realistic output comes from embedded word lists — a couple hundred first names, a couple hundred last names, a hundred-plus cities, fifty-plus countries — that ship with the page. No external API, no locale megabytes downloaded at runtime: the lists are roughly 25 KB total. If you need standalone identifiers without a full schema, a dedicated UUID generator is the focused companion for that one job.
Exporting to JSON, CSV, and SQL
Generated rows are only useful if they drop cleanly into where you work, so the three output formats cover the three common destinations.
JSON is the format for front-end fixtures and API mocks. Paste the array straight into a .fixtures.ts file, a Storybook story, or an MSW handler.
CSV is the format for spreadsheets, BI imports, and — crucially — testing your own upload parsers. Feed a thousand-row CSV with messy names into your importer and watch how it handles the rows a small fixture never would.
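The rows that trip up an importer are usually the ones that need quoting. As a rough sketch of the rules a CSV writer or parser has to agree on (following RFC 4180, not the tool's own code), a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded double quotes are doubled. The names `csvField` and `toCsv` and the `note` column are hypothetical.

```typescript
// Illustrative CSV writer following RFC 4180 quoting rules.
// Not the generator's implementation; function names are assumptions.
type Row = Record<string, string | number | boolean>;

function csvField(value: string | number | boolean): string {
  const text = String(value);
  // Quote only when the field contains a delimiter, a quote, or a line break.
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

function toCsv(rows: Row[], columns: string[]): string {
  const header = columns.map(csvField).join(",");
  const lines = rows.map((row) =>
    columns.map((column) => csvField(row[column] ?? "")).join(",")
  );
  // RFC 4180 uses CRLF as the record separator.
  return [header, ...lines].join("\r\n");
}

// A deliberately messy row: an apostrophe (no quoting needed in CSV)
// and a hypothetical "note" column with a comma and embedded quotes.
const messyRows: Row[] = [
  { id: 4, fullName: "Samuel O'Brien", note: 'said "hi", then left' },
];
const messyCsv = toCsv(messyRows, ["id", "fullName", "note"]);
// messyCsv is:
// id,fullName,note
// 4,Samuel O'Brien,"said ""hi"", then left"
```

Note that an apostrophe needs no special treatment in CSV itself. If an importer fails on O'Brien, the bug is typically in a later step, such as building SQL from the parsed value.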
SQL is the format for seeding a database directly. The tool emits a single batched statement:
INSERT INTO table_name (col1, col2, …) VALUES (…), (…), …;
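The multi-row VALUES form is accepted by PostgreSQL, MySQL, SQLite, and SQL Server, with two SQL Server caveats worth checking before a large seed: it caps a single INSERT ... VALUES statement at 1,000 rows, and it has no TRUE/FALSE literals, so boolean columns need 1/0 there. Strings are single-quoted with embedded quotes escaped by doubling (O'Brien becomes 'O''Brien'), booleans render as TRUE/FALSE, and dates are ISO 8601. Rename the target table before downloading and you can paste the result straight into psql.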
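The escaping rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's source: `sqlLiteral` and `buildInsert` are hypothetical names, table and column names are inserted verbatim without identifier quoting, and the approach is meant for seeding fixtures you generated yourself. For untrusted input, parameterized queries remain the right tool.

```typescript
// Illustrative literal escaping for seed scripts (assumed names, not the tool's code).
type SqlValue = string | number | boolean;

function sqlLiteral(value: SqlValue): string {
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  if (typeof value === "number") return String(value);
  // Strings: wrap in single quotes and double any embedded single quote.
  return `'${value.replace(/'/g, "''")}'`;
}

function buildInsert(table: string, columns: string[], rows: SqlValue[][]): string {
  // One parenthesized tuple per row, joined into a single batched statement.
  const tuples = rows.map((row) => `(${row.map(sqlLiteral).join(", ")})`);
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES\n${tuples.join(",\n")};`;
}

const seedSql = buildInsert(
  "users",
  ["id", "fullName", "in_stock"],
  [
    [1, "Marcus Holloway", true],
    [4, "Samuel O'Brien", false],
  ]
);
// seedSql is:
// INSERT INTO users (id, fullName, in_stock) VALUES
// (1, 'Marcus Holloway', TRUE),
// (4, 'Samuel O''Brien', FALSE);
```

Quote doubling is the standard SQL escape. One thing this sketch does not handle: MySQL in its default mode also treats a backslash inside a string literal as an escape character, so values containing backslashes would need extra care there.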
A Real Example: 5 Fake Users
Here is what the workflow actually looks like. Define four fields (id as an int, fullName, email, and an enum role with values admin, editor, viewer), set the row count to 5, fix the seed at 42, and generate. The JSON output looks like this:
[
{ "id": 1, "fullName": "Marcus Holloway", "email": "marcus.holloway@example.net", "role": "editor" },
{ "id": 2, "fullName": "Priya Nadkarni", "email": "priya.nadkarni@example.com", "role": "admin" },
{ "id": 3, "fullName": "Élodie Tran", "email": "elodie.tran@example.org", "role": "viewer" },
{ "id": 4, "fullName": "Samuel O'Brien", "email": "samuel.obrien@example.io", "role": "editor" },
{ "id": 5, "fullName": "Wei Zhang", "email": "wei.zhang@example.net", "role": "viewer" }
]
Switch the format to SQL and the same five rows become a ready-to-run batch:
INSERT INTO users (id, fullName, email, role) VALUES
(1, 'Marcus Holloway', 'marcus.holloway@example.net', 'editor'),
(2, 'Priya Nadkarni', 'priya.nadkarni@example.com', 'admin'),
(3, 'Élodie Tran', 'elodie.tran@example.org', 'viewer'),
(4, 'Samuel O''Brien', 'samuel.obrien@example.io', 'editor'),
(5, 'Wei Zhang', 'wei.zhang@example.net', 'viewer');
Notice row 4: O'Brien arrives pre-escaped as O''Brien. That single character is the difference between a clean import and a syntax error at 2 a.m.
Reproducible Seeds and 100% Local Generation
The seed is the quiet feature that turns a toy into a tool. With a numeric seed set, generation is deterministic: the same seed plus the same schema plus the same row count always produces byte-identical output. Internally that's a small mulberry32 PRNG seeded from your value. Leave the seed blank and it falls back to fresh random data on every run.
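For readers curious what a seeded generator looks like, here is the widely circulated mulberry32 routine in TypeScript, followed by a small helper that draws from a list. Only the PRNG name comes from the text above. How the tool maps random draws to fields is not documented here, so `pick` and `sampleRoles` are assumptions for illustration, and the output of this sketch should not be expected to match the tool's output for the same seed.

```typescript
// mulberry32: a small seeded PRNG with 32 bits of state.
// Returns a function that yields floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: choose one item from a non-empty list using the seeded PRNG.
function pick<T>(rand: () => number, items: readonly T[]): T {
  return items[Math.floor(rand() * items.length)] as T;
}

// Hypothetical generator: the same seed and count always give the same roles.
function sampleRoles(seed: number, count: number): string[] {
  const rand = mulberry32(seed);
  const roles = ["admin", "editor", "viewer"] as const;
  const result: string[] = [];
  for (let i = 0; i < count; i++) {
    result.push(pick(rand, roles));
  }
  return result;
}

// Two independent runs with seed 42 produce identical arrays.
const firstRun = sampleRoles(42, 5);
const secondRun = sampleRoles(42, 5);
```

The determinism comes entirely from the PRNG state: as long as every random decision goes through the same `rand` function in the same order, the output depends only on the seed, the schema, and the row count. A single stray call to `Math.random()` would break that guarantee.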
Why it matters: when CI fails on a generated fixture, every engineer reproduces the identical input by regenerating with the same seed — no need to commit a 5,000-line dump or guess which random run triggered the edge case. The reproducible failing test case becomes a one-liner.
I leaned on this the first time I had to debug a flaky pagination test. The component only broke when a long email truncated mid-cell, and I could not reliably recreate the row. Generating 50 rows with seed 7, saving the JSON, and loading it in the test pinned the input forever. The bug stopped hiding.
Everything runs in your browser. The schema, the seed, the row count, and every generated row stay in the page; the download never touches a server, and nothing is written to the URL, so a shared link carries only the tool address — not your field definitions. Once loaded, generation works fully offline. Pair it with a JSON formatter to pretty-print the output before it lands in your repo, and the whole pipeline — define, generate, format, commit — never leaves your machine.
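One caution worth repeating: do not send real mail to generated emails. Only example.com, example.net, and example.org are reserved for documentation (RFC 2606); any other generated domain, including the example.io in the sample above, may resolve to a live server. Route any test blast through a sink like MailHog. Treat the data like Lorem Ipsum: structurally realistic, semantically fictional.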
Made by Toolora · Updated 2026-06-13