JSON to CSV: Flattening Nested Objects, Arrays, and the Excel Encoding Trap
A practical guide to converting JSON to CSV — how nested objects flatten to dotted columns, how arrays and missing fields are handled, RFC 4180 escaping, and the UTF-8 BOM that stops Excel mangling Chinese text.
JSON to CSV: Flattening Nested Objects, Arrays, and the Excel Encoding Trap
Converting JSON to CSV sounds like a one-liner until you hit real data. A JSON array of flat objects with identical keys maps to a spreadsheet cleanly. But the moment you have nested objects, arrays inside fields, records with different keys, or non-Latin characters headed for Excel, a careless conversion silently corrupts your output. This guide walks through each of those edge cases and how to handle them correctly — the same problems I built the JSON to CSV Converter around.
Why "just map keys to columns" breaks
The naive approach is: take the keys of the first object, make them the header, then write one row per object. That works for a clean array like this:
[
{ "id": 1, "name": "Ada", "active": true },
{ "id": 2, "name": "Linus", "active": false }
]
Which becomes:
id,name,active
1,Ada,true
2,Linus,false
The trouble starts when the second object has a key the first one didn't, or when a value is itself an object or an array. Real API responses are messy: optional fields, embedded sub-objects, multi-select arrays. If your converter only looks at the first record's keys, every later record with extra fields gets silently dropped from the output.
The fix is to compute the union of keys across every object, kept in first-seen order. A row missing a key gets an empty cell in that column, so rows still line up. Export ten orders where only three carry a discount_code field, and you still get one discount_code column with blanks in the other seven rows — nothing falls off the edge.
Flattening nested objects into dotted columns
A spreadsheet cell holds a scalar, not a tree. So a value like this has no single home:
{ "user": { "address": { "city": "London" } } }
The standard answer is dotted-path flattening: the value lands in a column named user.address.city. This is the convention pandas uses for json_normalize, the one jq users expect, and the shape most BI imports assume. Take two environment config files as nested JSON, flatten both, and {"db":{"pool":{"max":20}}} becomes a db.pool.max column holding 20. Now you can drop both into a sheet and diff settings line by line instead of squinting at two deeply indented blobs.
If you'd rather keep a nested object intact, turn flattening off and the whole sub-object goes into one cell as a JSON string. That's lossless and re-parseable, at the cost of readability. Pick based on whether a human or a script reads the result next.
Arrays and missing fields: lossless vs. readable
Arrays force a choice between fidelity and legibility. Say a form library hands you multi-select answers:
{ "name": "Sam", "interests": ["design", "code", "ops"] }
You can export interests two ways. Keep it as a JSON string — ["design","code","ops"] in one cell — and it round-trips perfectly back to an array later. Or join the scalars with a separator, giving design | code | ops, which reads naturally when an analyst pivots the table but loses the array boundary if any value contains your separator.
One rule that saves you from quiet data loss: an array that contains objects can't be losslessly joined into a single cell, so a good converter keeps those as JSON no matter what the join setting says. If you genuinely need those nested fields as their own columns, restructure the JSON so each object's fields sit at the row level before converting — don't expect a join to invent columns it can't.
RFC 4180: the escaping rules that keep rows from shifting
CSV has no single official spec, but RFC 4180 is the de facto reference, and getting its quoting rules right is the difference between a clean import and a table where everything after one bad cell shifts a column to the right.
The rules that matter:
- A field containing a comma, a double quote, or a line break must be wrapped in double quotes.
- A literal double quote inside a quoted field is escaped by doubling it (
"becomes""). - Records are separated by CRLF.
So a value like he said "hi", ok must serialize as:
"he said ""hi"", ok"
Skip the quoting and the embedded comma splits one value into two cells, and every column after it is misaligned. This is exactly the bug that makes a "working" converter produce a spreadsheet nobody can use. When you need a CSV fixture for a test, the RFC 4180 escaping is what lets you exercise real edge cases — quotes, commas, newlines — instead of toy data.
Worth knowing too: not every spreadsheet defaults to a comma. German-locale Excel expects a semicolon as the delimiter, so a comma-separated file lands entirely in column A there. A good converter lets you switch between comma, semicolon, tab, and pipe for that reason.
The Excel encoding trap: UTF-8 and the BOM
Here's the one that burned me personally. I exported a JSON list of products with Chinese names, opened the CSV in Excel on Windows, and got 鍟嗗搧 — pure mojibake. My first instinct was that the converter had corrupted the bytes. It hadn't. The bytes were perfectly valid UTF-8.
The problem is Excel on Windows guesses the encoding when it opens a .csv, and it often guesses the system locale (like GBK or Windows-1252) instead of UTF-8. The bytes are right; Excel reads them wrong.
The fix is a 3-byte UTF-8 byte-order mark (EF BB BF) at the start of the file. That mark tells Excel "this is UTF-8," and 商品名称 renders correctly. One checkbox — "Add UTF-8 BOM" — is the difference between a usable export and ten minutes lost in the Data → From Text encoding wizard. Note the asymmetry: Google Sheets, Numbers, and text editors read UTF-8 fine without the BOM and may even show it as a stray character, so only add it when the target is Excel on Windows.
After the conversion
Converting one way is half the job. If you ever need to go back the other direction, csv-to-json takes the same delimiter and header options and reverses the mapping, so a round trip stays consistent. If your JSON won't paste cleanly in the first place — unbalanced brackets, a trailing comma — run it through the json-formatter to validate and pretty-print before you convert.
A few habits keep conversions clean: paste the array itself ([ ... ]), not a wrapper object like {"data": [ ... ]}, or the tool treats the wrapper as a single row; decide up front whether a human or a script consumes the output so you choose flattening and array handling accordingly; and remember the BOM only when Excel on Windows is the destination. Get those right and JSON to CSV really is the one-liner it looked like at the start.
Made by Toolora · Updated 2026-06-13