CSV to JSON: The Edge Cases That Break Naive Parsers (And How to Get It Right)
A practical guide to converting CSV to JSON correctly — quoted commas, embedded newlines, escaped quotes, type inference, and flat vs nested shapes, with real input and output.
Converting CSV to JSON looks like a one-liner. Split the text on \n, split each line on commas, zip the header against the values, done. I wrote exactly that parser years ago, shipped it, and watched it corrupt a customer export the same afternoon. One product description contained a comma inside quotes, so that cell silently split into two columns and every field after it in the row shifted right by one.
That is the trap with CSV: it is deceptively simple until real data hits it. This guide walks through the cases that actually break, why they break, and what correct output looks like — using the CSV ⇄ JSON Converter as the reference for "done right."
Why "split on comma" is wrong
The comma is not a reliable field boundary. CSV allows a field to contain commas, newlines, and quote characters, as long as the field is wrapped in double quotes. This is not a vendor quirk — it is written down. The closest thing CSV has to a spec is RFC 4180, published by the IETF in 2005. It defines three rules that a naive splitter ignores:
- Fields with commas, newlines, or double quotes must be enclosed in double quotes.
- A double quote inside a quoted field is escaped by doubling it: "".
- Each record is a line, except that a quoted field may itself contain line breaks.
Rule 3 is the one that hurts most. If a field can hold a newline, then "one line equals one record" is false, and any parser that splits the raw text on \n before understanding quotes will tear records apart.
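To see why quote-awareness has to come before line splitting, here is a minimal sketch of a quote-aware parser written as a small state machine in Python. It illustrates the three rules above and is not the converter's implementation; the names parse_csv, SAMPLE, and naive are hypothetical, and the sketch does not reject malformed input such as a stray quote in the middle of an unquoted field.

```python
# Hypothetical sample: the three-row CSV used in the example below.
SAMPLE = '''id,name,note
1,"Smith, John","He said ""hi"""
2,"Acme Inc","Line one
Line two"
3,Plain,No quotes needed
'''


def parse_csv(text, delimiter=","):
    """Minimal RFC 4180-style parser returning a list of records (lists of str).

    Simplified: a stray quote inside an unquoted field is not rejected.
    """
    records, record, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')  # rule 2: "" inside quotes is one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote ends the quoted section
            else:
                field.append(ch)  # rule 3: commas and newlines are plain data here
        elif ch == '"':
            in_quotes = True  # rule 1: a quote opens a quoted field
        elif ch == delimiter:
            record.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single record break
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or record:  # last record without a trailing newline
        record.append("".join(field))
        records.append(record)
    return records


# Naive approach for comparison: split on newlines first, then on commas.
naive = [line.split(",") for line in SAMPLE.split("\n")]

if __name__ == "__main__":
    for rec in parse_csv(SAMPLE):
        print(rec)
    print(naive[1])  # 4 pieces for a 3-column header
```

The naive version turns the first data row into four pieces and tears the second record across two lines; the state machine only treats a comma or newline as a boundary when it is outside quotes.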
A real example, start to finish
Here is a small CSV with every nasty case in three rows: a quoted comma, an escaped quote, and an embedded newline.
id,name,note
1,"Smith, John","He said ""hi"""
2,"Acme Inc","Line one
Line two"
3,Plain,No quotes needed
A correct parser produces this JSON array of objects:
[
{ "id": "1", "name": "Smith, John", "note": "He said \"hi\"" },
{ "id": "2", "name": "Acme Inc", "note": "Line one\nLine two" },
{ "id": "3", "name": "Plain", "note": "No quotes needed" }
]
Notice what survived. Smith, John stayed one field instead of splitting into two columns. The doubled "" collapsed to a single literal quote inside the value. And the line break in row 2 stayed inside the note string instead of being read as the start of a fourth record. A split-on-comma approach gets all three of these wrong, and the failures are silent — you get JSON, it just describes the wrong data.
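If you work in Python, the standard library's csv module already handles quoted commas, doubled quotes, and embedded newlines. A minimal sketch that reproduces the output above, assuming a hypothetical csv_to_json helper and the same three-row sample; every value is left as a string.

```python
import csv
import io
import json

# Hypothetical sample: the three-row CSV from the example above.
SAMPLE = '''id,name,note
1,"Smith, John","He said ""hi"""
2,"Acme Inc","Line one
Line two"
3,Plain,No quotes needed
'''


def csv_to_json(text, delimiter=","):
    """Convert CSV text to a JSON array of objects, keeping every value a string."""
    # newline="" follows the csv docs' advice so line endings inside quoted
    # fields reach the parser untranslated.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return json.dumps(list(reader), indent=2)


if __name__ == "__main__":
    print(csv_to_json(SAMPLE))
```

json.dumps takes care of escaping on the way out: the literal quote becomes \" and the embedded line break becomes \n inside the JSON string.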
Type inference: convenient, and a quiet liability
Every value in a CSV is text. The file has no idea whether 007 is a number, a string, or a zip code prefix. Many converters try to be helpful and coerce values: 42 becomes a number, true becomes a boolean, empty becomes null.
That helpfulness has a cost. Coerce 007 to a number and you get 7: the leading zeros vanish, which is a disaster for product codes, postal codes, phone numbers, and ISBNs. Coerce a column of long numeric IDs and JavaScript's number type, an IEEE 754 double, cannot represent every integer above 2^53 (9,007,199,254,740,992), so longer IDs get silently rounded. Coerce a German price field and 19,90 runs into the decimal-comma problem below: a lenient parser such as JavaScript's parseFloat reads it as 19.
The defensible default is to keep every value a string and let the consumer decide. If you genuinely want typed output, do it as an explicit, opt-in step on columns you trust — never blanket-apply it to a file you have not inspected. When in doubt, strings preserve information; numbers throw it away.
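A minimal sketch of the opt-in approach in Python, assuming a hypothetical parse_rows helper: every value stays a string unless its column is explicitly named in int_columns. The last lines use a Python float, which is the same IEEE 754 double as a JavaScript number on common platforms, to show the rounding past 2^53.

```python
import csv
import io


def parse_rows(text, int_columns=()):
    """Parse CSV into dicts of strings; convert only the columns the caller names."""
    rows = []
    for record in csv.DictReader(io.StringIO(text, newline="")):
        for col in int_columns:
            value = record[col]
            # Opt-in coercion; empty cells become None instead of crashing int().
            record[col] = int(value) if value != "" else None
        rows.append(record)
    return rows


# Python floats are IEEE 754 doubles on common platforms, like JavaScript
# numbers, so float() reproduces the precision loss above 2**53.
LONG_ID = "9007199254740993"  # 2**53 + 1
rounded = float(LONG_ID)

if __name__ == "__main__":
    print(parse_rows("sku,qty\n007,3\n", int_columns=("qty",)))
    print(int(rounded))
```

Because only qty is converted, the sku 007 keeps its leading zeros, while the float conversion of 2^53 + 1 lands on 2^53.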
The delimiter problem nobody warns you about
CSV is named for commas, but the comma is not universal. In Germany, France, and other locales where the comma is the decimal mark, Excel exports use a semicolon as the field separator instead — otherwise 19,90 would split into two fields. Open that file with a comma-only parser and every numeric column fractures.
The fix is to set the delimiter explicitly. Switch the separator to ; and 19,90 stays one field. The same applies to tab-separated logs (\t) and pipe-delimited dumps (|). The CSV ⇄ JSON Converter lets you pick comma, semicolon, tab, or pipe, and the choice only affects the active direction: on CSV→JSON it controls how fields are split, and on JSON→CSV it controls what separator gets written out.
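The same idea in Python's csv module, as a sketch with hypothetical helper names (read_records, write_records) and a made-up semicolon export: the delimiter controls splitting when reading and the separator written when exporting.

```python
import csv
import io

EXPORT = "name;price\nTee;19,90\n"  # hypothetical semicolon-delimited export


def read_records(text, delimiter):
    """Split CSV text into records using an explicit field separator."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def write_records(rows, delimiter):
    """Serialize rows with the chosen separator; fields are quoted only if needed."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


wrong = read_records(EXPORT, ",")  # comma-only parsing fractures the price
right = read_records(EXPORT, ";")  # explicit semicolon keeps 19,90 intact
tabs = read_records("a\tb\n1\t2\n", "\t")

if __name__ == "__main__":
    print(wrong)
    print(right)
    print(write_records([["Tee", "19,90"]], ";"))
```

With a comma separator the writer has to quote 19,90; with a semicolon it does not, which is why semicolon exports are common in decimal-comma locales.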
Flat vs nested: where round-trips leak
CSV is a flat, two-dimensional grid. JSON is a tree. That mismatch is fine going from CSV to JSON — each row becomes one flat object, header cells become keys — but it bites hard on the way back.
If your JSON holds a nested value like { "address": { "city": "NYC" } }, there is no honest way to put a whole object into a single CSV cell. A careless serializer renders it as the literal string [object Object], and your data is gone. Before converting JSON to CSV, flatten nested fields yourself — for example promote address.city to a top-level city key. Decide on a flattening convention before you export, not after a stakeholder opens a spreadsheet full of [object Object].
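One possible flattening convention, sketched in Python with hypothetical helper names: nested objects become dot-separated column names such as address.city, and arrays are stored as a JSON string in a single cell. This is an assumption about what your consumers want, not a standard; promoting address.city to a plain city key works just as well.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-separated keys, e.g. address.city."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)  # arrays kept as JSON text in one cell
        else:
            flat[name] = value
    return flat


def records_to_csv(records, delimiter=","):
    """Flatten each record, then write CSV with the union of all keys as header."""
    flat_rows = [flatten(r) for r in records]
    fieldnames = []  # first-seen order, so sparse records still line up
    for row in flat_rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(flat_rows)  # missing keys are written as empty cells
    return out.getvalue()


if __name__ == "__main__":
    print(records_to_csv([{"id": 1, "address": {"city": "NYC"}}]))
```

Note that this convention can collide if a source key already contains a dot; pick a separator your data never uses.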
If your end goal is a queryable shape rather than a spreadsheet, you might skip CSV entirely and send the data to CSV to SQL for INSERT statements, or pretty-print and inspect the structure with the JSON Formatter first. And when you do need the spreadsheet round-trip, the reverse path lives at JSON to CSV.
A short checklist before you trust the output
- Did the parser respect quotes? Check that any field with a comma inside it is still one column, not two.
- Is the first JSON object actual data, or did a header row get read as data (or vice versa)? Toggle "first row is header" if it is wrong.
- Are leading zeros and long IDs intact? If a converter typed them into numbers, prefer string output.
- Did nested JSON survive the trip back, or did it flatten to [object Object]? Flatten before, not after.
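The quote and leading-zero checks can be partly automated. A rough sketch in Python, using a hypothetical sanity_report helper: it flags records whose field count differs from the header (the usual symptom of an unquoted comma) and columns containing leading-zero values that should stay strings. It cannot tell you whether the header row itself was misread, or whether a nested object was lost.

```python
import csv
import io


def sanity_report(text, delimiter=","):
    """Return warnings for ragged records and leading-zero columns."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ["empty input"]
    header, data = rows[0], rows[1:]
    warnings = []
    # Records are numbered from 2 because the header is record 1.
    for rec_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            warnings.append(f"record {rec_no}: {len(row)} fields, header has {len(header)}")
    for col, name in enumerate(header):
        values = [row[col] for row in data if col < len(row)]
        if any(len(v) > 1 and v.startswith("0") and v.isdigit() for v in values):
            warnings.append(f"column {name!r}: leading zeros, keep it as a string")
    return warnings


if __name__ == "__main__":
    for warning in sanity_report("id,name\n1,a,b\n007,x\n"):
        print(warning)
```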
Get those four right and CSV stops being a source of silent corruption. The format is forgiving to write and unforgiving to parse — which is exactly why it is worth using a parser that follows the rules instead of a split you wrote in a hurry.
Made by Toolora · Updated 2026-06-13