How to Convert an HTML Table to JSON Without Writing a Scraper

Turn any HTML <table> into clean JSON: th cells become keys, each tr becomes an object, tags get stripped, and everything stays in your browser.

Published By Li Lei
#html #json #scraping #data #developer

Tables are often the most data-rich thing on a web page and the most annoying to extract. You can see the rows, you can read the columns, but the moment you want them as structured data you are staring at a wall of <tr>, <td>, <b>, and &amp;. This guide walks through how to turn that markup into clean JSON, what rules govern the conversion, and where a copy-paste tool beats writing a one-off scraper.

The mental model: th becomes keys, each tr becomes an object

Every HTML table already carries the shape JSON wants. The header cells name the fields; the body rows hold the records. So the conversion is mechanical once you say it out loud:

  • The header row (the row of <th> cells, or simply the first <tr> when the table does not mark its headers) supplies the keys.
  • Each following <tr> becomes one object, with its <td> cells matched to those keys in order.

That gives you an array of objects, which is the format almost every API, seed script, and front-end component expects. The only twist is a table with no header at all. A stock-quote table, for example, can be nothing but symbols and prices with no labels, so keying on the first row would turn a real record into nonsense keys. For those you switch to a 2D array (an array of arrays) and skip the key step entirely.
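
If you want to see that rule as code, here is a minimal Python sketch. It is not the converter's own implementation (which runs as JavaScript in the browser); it uses the standard library's html.parser as a stand-in, and the names TableRows, table_to_json, and has_header are made up for illustration. It assumes every cell has a closing </td> or </th> and that tables are not nested.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # each row is a list of cell strings
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_json(html, has_header=True):
    """Return a list of dicts (header row as keys) or a 2D list (no header)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    rows = [row for row in parser.rows if row]
    if not has_header or not rows:
        return rows  # headerless table: plain array of arrays
    keys, *records = rows
    return [dict(zip(keys, record)) for record in records]
```

Calling table_to_json(html) gives the array-of-objects shape; passing has_header=False gives the array-of-arrays shape for tables like the stock-quote case.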

A worked example

Here is a tiny table with a header row and two records:

<table>
  <tr><th>name</th><th>age</th><th>active</th></tr>
  <tr><td>Alice</td><td>30</td><td>true</td></tr>
  <tr><td>Bob</td><td>25</td><td>false</td></tr>
</table>

Paste it into the HTML Table to JSON converter and the default output is:

[
  { "name": "Alice", "age": "30", "active": "true" },
  { "name": "Bob",   "age": "25", "active": "false" }
]

Notice that 30 and true come out as strings. That is deliberate. Turn on type inference and the same input produces real types:

[
  { "name": "Alice", "age": 30, "active": true },
  { "name": "Bob",   "age": 25, "active": false }
]

Now age is a number, active is a boolean, and an empty cell would become null. The result drops straight into your code without a cleanup pass.

Cleaning the cells: tags, entities, and line breaks

Real tables copied off the web are filthy. A single cell might be <td><b>4K monitor</b></td>, an anchor like <a href="/x">London</a>, or two lines split by a <br>. Good conversion strips all of that down to the visible text:

  • <td><b>4K monitor</b></td> becomes "4K monitor".
  • An <a href=...>link</a> keeps only its visible text.
  • A <br> turns into a single space, so 221B Baker St<br>London reads as "221B Baker St London".

Entities decode too: &amp; becomes &, &nbsp; becomes a space, and &#233; becomes é. Anything inside <script> or <style> is dropped whole, so stray code never lands in your data. A merged cell with colspan repeats its value across the columns it spans, so the row stays aligned instead of shifting everything left.
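
The cleaning rules above can be sketched in a few lines of Python. Again this is an illustration under assumptions, not the tool's actual code: clean_cell and expand_colspan are hypothetical helper names, the input to clean_cell is assumed to be the inner HTML of one cell, and whitespace runs are collapsed to a single space, which is my own choice.

```python
import re
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Keep only visible text; html.parser decodes entities by default."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append(" ")  # a line break becomes a single space

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_cell(inner_html):
    """Strip tags, decode entities, drop script/style, normalize spaces."""
    parser = CellText()
    parser.feed(inner_html)
    parser.close()
    text = "".join(parser.parts).replace("\xa0", " ")  # &nbsp; -> plain space
    return re.sub(r"\s+", " ", text).strip()


def expand_colspan(cells):
    """cells is a list of (text, colspan) pairs; repeat each value across its span."""
    row = []
    for text, span in cells:
        row.extend([text] * max(1, span))
    return row
```

For example, clean_cell("221B Baker St<br>London") yields "221B Baker St London", and expand_colspan([("Q1", 2), ("Total", 1)]) yields three columns so the row stays aligned.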

Why I stopped writing throwaway scrapers for this

I used to reach for cheerio every time I needed a table out of a page. The pattern was always the same: spin up a Node script, install a parser, write a loop, fix the off-by-one when the header row sneaks into the data, then delete the whole thing an hour later because it was a one-off. The last time it happened I had scraped a product listing where the only structured part was one big table stuffed with <b> tags and &amp; entities. I pasted it into the converter, flipped on type inference, and had an array of objects with real numbers in about ten seconds. No install, no loop, no regex hunting for </td>. For a single table that you are never going to scrape again, a paste box wins on every axis that matters.

Type inference: when to leave it off

The safe default is everything-as-a-string, and there is a real reason for it. Plenty of table values look like numbers but must not be treated as numbers:

  • IDs and order numbers that you compare as text.
  • ZIP codes and phone numbers where formatting matters.
  • Anything with a leading zero, like "007", that would lose the zero as a number.

So type inference stays off until you ask for it. When you do turn it on, a clean 399 becomes the number 399, "true" and "false" become booleans, and blank cells become null. The converter still leaves "1,234" and "$99" as strings on purpose, because the thousands separator and the currency sign mean they are not plain numbers. If you want those as plain numbers, you strip the symbols first, on your terms, not silently.
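
Here is a rough Python sketch of inference rules along those lines. The function name infer_type is hypothetical, and the exact matching rules (lowercase "true"/"false" only, digits with an optional decimal part) are assumptions on my part rather than the converter's documented behavior.

```python
import re

# Digits with an optional sign and decimal part; no commas, no currency signs.
_PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def infer_type(text):
    """Map a cell string to None, bool, int, or float; otherwise keep the string."""
    value = text.strip()
    if value == "":
        return None                      # blank cell -> null
    if value in ("true", "false"):
        return value == "true"           # -> boolean
    if _PLAIN_NUMBER.fullmatch(value):
        return float(value) if "." in value else int(value)
    return text                          # "1,234", "$99", names, etc.
```

Note that infer_type("007") returns 7 in this sketch. That lost leading zero is exactly the hazard described above, and the reason to leave inference off for columns of IDs, ZIP codes, or phone numbers.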

Multiple tables and the rest of your data pipeline

By default you get the first <table> in source order, which is exactly what you want when you copied one specific table out of a larger page. But if you pasted a whole document, navigation bars and footers sometimes ship their own tables, and the first one in the markup may not be the one you meant. Two ways to handle it: paste only the table you want, or turn on "parse every table" and the output becomes an array where each element is one table's JSON, in document order. Two tables come out as [[...rows...],[...rows...]] so you grab them in a single pass.
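
To make the two output shapes concrete, here is a small Python sketch of "first table" versus "parse every table". The names AllTables, rows_to_objects, convert, and every_table are invented for this example, and it assumes tables are not nested and cells are properly closed.

```python
from html.parser import HTMLParser


class AllTables(HTMLParser):
    """Collect rows per <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per table
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


def rows_to_objects(rows):
    """First row supplies the keys; every later row becomes one dict."""
    if not rows:
        return []
    keys, *records = rows
    return [dict(zip(keys, record)) for record in records]


def convert(html, every_table=False):
    parser = AllTables()
    parser.feed(html)
    parser.close()
    converted = [rows_to_objects(rows) for rows in parser.tables]
    if every_table:
        return converted                        # [[...rows...], [...rows...]]
    return converted[0] if converted else []    # first table in source order
```

With two tables in the pasted source, convert(page) returns only the first table's rows, while convert(page, every_table=True) returns a list with one entry per table.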

JSON is rarely the final stop. Once your table is structured you often need it in another shape. If a teammate wants a spreadsheet, you run the same JSON through the CSV to JSON converter flow in reverse to get CSV, or pivot between formats depending on what the downstream system consumes. The point of starting from clean JSON is that every other format is a short hop away once the messy HTML is behind you.
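
As one example of that short hop, here is a hedged Python sketch that turns the array of objects into CSV text with the standard csv module. The function name objects_to_csv is hypothetical, and it assumes every object has the same keys as the first one.

```python
import csv
import io


def objects_to_csv(records):
    """Serialize a list of same-shaped dicts to CSV text, header row first."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),  # column order follows the first object
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The csv module takes care of quoting, so a value that contains a comma is wrapped in quotes rather than splitting into two columns.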

Everything stays in your browser

Scraped page source is sensitive in ways you do not always notice. It can carry email addresses, session tokens, or cookie-banner markup you would never want to leak. This conversion runs entirely as plain JavaScript inside your browser tab, finding the table, reading the rows, cleaning each cell, and serializing the JSON, with nothing uploaded and nothing logged. The parser matches strings rather than feeding the HTML to a live DOM, so any inline script or tracking pixel hidden in that source never runs. The pasted input is also kept out of the shareable URL on purpose, so a link you copy carries only your option choices, never the table data.

That combination, structured output plus local-only processing, is what makes pasting a raw table feel safe enough to do with production data. You read the JSON, you spot the misaligned column or the secretly empty cell, and you catch the bad row here instead of after it breaks your import.


Made by Toolora · Updated 2026-06-13