JSON Lines (JSONL) Explained: Format, Validate, and Find the Bad Line

A practical guide to JSON Lines (JSONL/NDJSON): why it beats a JSON array for logs and streaming, and how per-line validation finds the one bad record fast.

Published By Li Lei
#json #jsonl #ndjson #data #developer-tools

If you work with logs, event streams, or machine-learning datasets, you have almost certainly opened a file where every line is its own JSON object. No surrounding brackets, no commas between records, just one object after another. That format is JSON Lines, often written as JSONL or NDJSON (newline-delimited JSON). It looks simple, and that simplicity is exactly the point.

The core idea is one sentence long: each line is an independent JSON value, parseable on its own. A standard JSON array forces the whole file to be one document, so a conventional parser has to read it all before it can hand you anything. JSONL flips that. A program can read line one, do something with it, then read line two, never holding the entire file in memory. That single property is why JSONL shows up everywhere data is large, streamed, or appended over time.

Why JSONL beats a JSON array for real data

A JSON array ([ {...}, {...}, {...} ]) is fine when the data is small and complete. It falls apart at scale for three concrete reasons.

First, appending. To add a record to a JSON array you have to find the closing ], back up, insert a comma, then add your object. With JSONL you append one line and you are done. That is why log writers and event collectors emit JSONL: writing is a single print per event.
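
As a minimal sketch of that single write per event, assuming Python and a hypothetical helper name (append_record is illustrative, not part of any tool mentioned here):

```python
import json

def append_record(path, record):
    """Append one record to a JSONL file as a single compact line."""
    # json.dumps without indent never emits raw newlines (newlines inside
    # strings are escaped), so each record stays on exactly one line.
    line = json.dumps(record, separators=(",", ":"))
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

Opening the file in append mode means the writer never has to read or rewrite what is already there, which is the whole contrast with patching a closing ] in an array.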

Second, streaming. A consumer reading a JSONL stream processes records as they arrive. With a conventional parser, a 4 GB array can't be used until the final byte lands, but a 4 GB JSONL file is just roughly four billion bytes of independent lines you can iterate one at a time.
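
One way to picture the streaming side is a generator that yields a record per line. This is a sketch under assumptions: the function names are hypothetical, the file is UTF-8, and every non-blank line is valid JSON.

```python
import json

def iter_jsonl(path):
    """Yield one parsed record at a time; only the current line is held in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            yield json.loads(line)

def count_events(path, name):
    """Count records whose "event" field equals name without loading the whole file."""
    return sum(1 for rec in iter_jsonl(path) if rec.get("event") == name)
```

Because iter_jsonl is lazy, memory use depends on the longest line rather than on the size of the file.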

Third, failure isolation. In an array, a single misplaced quote anywhere makes the entire document invalid, so the parser throws and you get nothing. In JSONL, line 472 being broken has zero effect on lines 1 through 471. You keep the good data and isolate the bad record.

The one bad line problem

The third point above is the one that bites people in practice. You export a million events, hand the file to an import script, and it dies with "Unexpected token at position 38211." Position 38211 is useless. You need a line number, and you need the other 999,999 records to survive.

This is where per-line validation matters. Instead of treating the file as one document, a good JSONL tool parses each line separately, collects the valid records, and reports each failure with its line number and the parser's message. You fix the one line, or drop it, and move on. The JSON Lines Formatter does exactly this in the browser: paste the data, and it splits the result into valid rows you can format or export and a list of invalid lines pinned to their line numbers.

A worked example

Here is a small JSONL file from an analytics export. Five events, one of them broken:

{"event":"signup","user":"u_001","plan":"free","ts":1718200000}
{"event":"login","user":"u_002","ts":1718200133}
{"event":"purchase","user":"u_003","amount":29.99,"currency":"USD"}
{"event":"login","user":"u_004,"ts":1718200201}
{"event":"logout","user":"u_005","ts":1718200260}

Look at line 4. The value "u_004, is missing its closing quote before the comma, so the parser reads "u_004," as the string, comma included, and then trips over the bare ts that follows. If you wrapped these five lines in [ ... ] and ran them through a normal JSON parser, the whole thing would be rejected and you would learn nothing about the other four events.

Run it as JSON Lines instead and you get a clean split:

  • Line 1: valid
  • Line 2: valid
  • Line 3: valid
  • Line 4: invalid (unexpected token at position 32; the exact wording varies by parser)
  • Line 5: valid

Four good records are ready to convert or export. The one broken record is named, with the line number and the reason, so the fix takes seconds: close the quote after u_004. That is the entire workflow, and it scales the same way whether the file has 5 lines or 5 million.
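
The same split can be reproduced in a few lines of Python. This is a sketch of the per-line idea, not the formatter's actual implementation; validate_jsonl and SAMPLE are names made up for the illustration, and Python's parser words its error differently from a browser's.

```python
import json

# The five-line sample from above, with the broken record on line 4.
SAMPLE = """\
{"event":"signup","user":"u_001","plan":"free","ts":1718200000}
{"event":"login","user":"u_002","ts":1718200133}
{"event":"purchase","user":"u_003","amount":29.99,"currency":"USD"}
{"event":"login","user":"u_004,"ts":1718200201}
{"event":"logout","user":"u_005","ts":1718200260}
"""

def validate_jsonl(text):
    """Parse each line on its own; return (valid, invalid) with 1-based line numbers."""
    valid, invalid = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines rather than reporting them
        try:
            valid.append((lineno, json.loads(line)))
        except json.JSONDecodeError as exc:
            # exc.msg is the parser's message, exc.pos the offset within this line
            invalid.append((lineno, exc.msg, exc.pos))
    return valid, invalid

valid, invalid = validate_jsonl(SAMPLE)
print(len(valid), "valid;", "invalid lines:", [n for n, _, _ in invalid])
# prints: 4 valid; invalid lines: [4]
```

The try/except sits inside the loop on purpose: one failing line is recorded and the loop moves on, which is what keeps the other four records alive.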

Three formatting jobs once the data is clean

Validation is only half the work. Once you know which lines are good, you usually want to reshape them.

Pretty JSONL. Compact one-liners are great for machines and miserable for humans. Reformatting each valid line as indented JSON makes a record readable when you are debugging a single event by eye.
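
A rough sketch of that reformatting step, assuming the lines have already passed validation (pretty_jsonl is a hypothetical name):

```python
import json

def pretty_jsonl(lines, indent=2):
    """Yield each non-blank line re-serialized as indented JSON."""
    for line in lines:
        if line.strip():
            yield json.dumps(json.loads(line), indent=indent, ensure_ascii=False)

print(next(pretty_jsonl(['{"event":"login","user":"u_002","ts":1718200133}'])))
# prints:
# {
#   "event": "login",
#   "user": "u_002",
#   "ts": 1718200133
# }
```

Note that the indented output spans several lines per record, so it is for reading, not for feeding back into a JSONL consumer.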

Convert to a JSON array. Plenty of downstream tools — test fixtures, a quick script, an API request body — want a real JSON array, not newline-delimited records. Wrapping the valid lines into [ {...}, {...} ] bridges the two worlds. If you then need to clean up or re-indent that array, the JSON Formatter handles the pretty-printing and validation of the combined document.
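
The conversion itself is small. Here is a sketch, assuming the input contains only lines that already validated (jsonl_to_array is an illustrative name):

```python
import json

def jsonl_to_array(text):
    """Parse each non-blank line and return a single JSON array document as a string."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(records, indent=2)
```

Unlike the line-by-line readers, this builds the full list in memory, which is fine for fixtures and request bodies but is the trade-off you accept when leaving JSONL.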

Flatten to a table. When every line is an object with similar keys, a CSV-like table is the fastest way to scan for missing fields, type mismatches, and dirty records. Table mode builds columns from the object keys and drops anything that does not fit into a value column, so structural problems jump out visually rather than hiding inside a wall of braces.
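
One possible way to build such a table in code, as a sketch: it assumes every line is a flat JSON object and writes CSV with an empty cell wherever a key is missing. The function name is hypothetical, and this is not a description of how the tool's table mode works internally.

```python
import csv
import io
import json

def jsonl_to_csv(text):
    """Flatten JSONL objects into CSV text; missing keys become empty cells."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Column order: keys in first-seen order across all records.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Empty cells are the point: a record with no ts or no plan shows up as a visible gap in its column.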

How I actually use it

I spend a lot of my week staring at log exports and event dumps, and JSONL is the format I reach for first when I have to share a sample. The last time it earned its keep, a teammate sent me a 60,000-line event file that "wouldn't import." I pasted it in, and three lines came back invalid: two had a stray BOM character at the start, and one had a trailing comma from a hand-edit. The other 59,997 records were fine. I exported the valid rows to a JSON array, handed that back, and the import went through on the next try. What made it quick was the per-line split: I never had to bisect the file or guess which byte offset mapped to which event. The tool told me three line numbers, and that was the whole investigation.

Where JSONL fits in a data pipeline

JSONL is not a competitor to JSON; it is JSON wearing work clothes. Use a JSON array when the data is a single bounded document you load all at once — a config file, an API response, a small fixture. Use JSON Lines when records arrive over time, when the file is too big to hold in memory, or when you need each record to fail independently of the others. Logs, Kafka-style event streams, queue messages, batch-job output, and ML training samples all land naturally in JSONL for exactly those reasons.

A common move is converting between shapes as data crosses tool boundaries: a database export comes out as JSONL, you validate and flatten it to a table for review, then convert the clean rows to a JSON array for a script that expects one. None of that requires a server. Parsing, validating, converting, and exporting all run locally in the browser, which also means you can safely inspect event data that contains user identifiers without it leaving your machine.

The next time an import dies on "position 38211," skip the byte-offset archaeology. Run the file through per-line validation, read the line number, fix one record, and keep the rest.


Made by Toolora · Updated 2026-06-13