Markdown to HTML: What Actually Happens to Your Tables, Code Blocks, and Angle Brackets

A practical guide to converting Markdown to HTML — CommonMark vs GFM differences, how tables and fenced code blocks render, why escaping matters, and when to sanitize before shipping.

Published By 李雷
#markdown #html #commonmark #converter #web-development

Most people treat Markdown-to-HTML conversion as a black box: paste text, get tags, move on. That works right up until a table renders without borders, a code fence swallows the next paragraph, or a stray <script> you copied from a stranger's README ends up live on your page. The gap between "it looks fine in the preview" and "it's safe to ship" is where the interesting details live.

This guide walks through what a converter actually does to your input — and where the popular Markdown flavors quietly disagree. If you just want to convert something right now, the Markdown to HTML tool runs entirely in your browser and shows live HTML next to your source. The rest of this post explains why the output looks the way it does.

CommonMark vs GFM: the difference that bites you

There is no single Markdown. John Gruber's original 2004 syntax description was deliberately loose, which meant every parser interpreted edge cases differently. CommonMark exists to fix that. Its specification (see commonmark.org/spec) defines exactly how nested lists, emphasis runs, and ambiguous link references resolve, and it backs each rule with worked examples that double as a conformance test suite. When a tool says "CommonMark-compatible," it means a **bold _italic_** run nested three levels deep produces the same tree everywhere.

GitHub Flavored Markdown (GFM) is a strict superset of CommonMark: it adds tables, task lists (- [x]), strikethrough with ~~, and autolinked URLs. The catch is that not every converter implements the GFM extensions, and the ones that do may handle the corners differently. A pipe table with a missing trailing pipe is valid GFM but might fall back to a plain paragraph elsewhere.

The practical rule: if your Markdown came from a GitHub README, assume it uses GFM features. If it came from a docs platform or a static-site generator, check which flavor that tool emits before you trust the conversion. Mismatched expectations here are the single most common reason output "looks wrong."
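One way to check a file before converting it is to scan for constructs that only exist in GFM. The sketch below is a heuristic, not a parser: `detectGfmFeatures` and its patterns are hypothetical names and rules made up for illustration, and look-alike text inside a code block will fool them.

```typescript
// Hypothetical helper: flags GFM-only constructs in a Markdown string.
// Regex heuristics only; text inside code blocks can cause false positives.
type GfmFeature = "table" | "taskList" | "strikethrough";

function detectGfmFeatures(markdown: string): GfmFeature[] {
  // A table delimiter row with two or more columns,
  // e.g. "| ---- | :---: |" or "--- | ---". Outer pipes are optional.
  const delimiterRow =
    /^[ \t]*\|?[ \t]*:?-+:?[ \t]*(?:\|[ \t]*:?-+:?[ \t]*)+\|?[ \t]*$/m;
  // A bullet marker followed by [ ], [x], or [X].
  const taskItem = /^[ \t]*[-*+][ \t]+\[[ xX]\][ \t]/m;
  // Text wrapped in double tildes, with no space just inside them.
  const strike = /~~[^~\s](?:[^~]*[^~\s])?~~/;

  const found: GfmFeature[] = [];
  if (delimiterRow.test(markdown)) found.push("table");
  if (taskItem.test(markdown)) found.push("taskList");
  if (strike.test(markdown)) found.push("strikethrough");
  return found;
}
```

If the result is non-empty and your converter only claims CommonMark support, expect those parts to come out as plain paragraphs or literal characters.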

A real conversion, start to finish

Here is a small but representative chunk of Markdown:

## Setup

Install with `npm i toolora` then run:

```ts
const x = parse(input)
```

| Step | Command   |
| ---- | --------- |
| 1    | build     |
| 2    | deploy    |

See the [docs](https://example.com) for more.

A CommonMark-plus-tables converter turns that into:

<h2>Setup</h2>
<p>Install with <code>npm i toolora</code> then run:</p>
<pre><code class="language-ts">const x = parse(input)
</code></pre>
<table>
  <thead><tr><th>Step</th><th>Command</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>build</td></tr>
    <tr><td>2</td><td>deploy</td></tr>
  </tbody>
</table>
<p>See the <a href="https://example.com">docs</a> for more.</p>

Three things to notice. The inline code became a <code> element, not a styled span. The fenced block kept its language hint as class="language-ts" — that's a marker for a downstream highlighter like Prism or Shiki, not actual coloring. And the table produced bare <table> markup with no borders or padding. The semantics are correct; the styling is your job.

Tables, code blocks, and task lists: the three that surprise people

Tables are GFM, not CommonMark core. They render to clean <table>/<thead>/<tbody> structure with no inline styles. If you paste that into a CMS and see an unstyled grid, that's expected — wrap it in a .prose class or run it through an inline-styles step. I learned this the slow way: I once shipped a changelog page where every table looked like a wall of text because I assumed the converter would add borders. It won't, and it shouldn't.
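To see why the output carries no styling, it helps to look at how little a table renderer does. This is a minimal sketch under stated assumptions: `splitRow` and `renderTable` are hypothetical names, the input is assumed to be a header row, a delimiter row, then body rows, and escaped pipes, alignment, and inline Markdown inside cells are ignored.

```typescript
// Escape the characters that are significant in HTML text content.
function escapeCell(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Split one row into trimmed cells. Leading and trailing pipes are
// optional in GFM, so each is stripped only if present.
// Simplification: escaped pipes (\|) are not handled.
function splitRow(line: string): string[] {
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((cell) => cell.trim());
}

// Hypothetical renderer: lines[0] is the header, lines[1] the delimiter
// row, and everything after that is a body row.
function renderTable(lines: string[]): string {
  const headCells = splitRow(lines[0] ?? "")
    .map((cell) => `<th>${escapeCell(cell)}</th>`)
    .join("");
  const bodyRows = lines
    .slice(2) // skip the delimiter row (| --- | --- |)
    .map((row) => {
      const cells = splitRow(row)
        .map((cell) => `<td>${escapeCell(cell)}</td>`)
        .join("");
      return `<tr>${cells}</tr>`;
    })
    .join("");
  return `<table><thead><tr>${headCells}</tr></thead><tbody>${bodyRows}</tbody></table>`;
}
```

Nothing in there knows about borders or padding, which is the point: the renderer emits structure, and a stylesheet you supply decides how it looks.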

Fenced code blocks preserve the language tag but never colorize. The output <pre><code class="language-ts"> is a hook for a syntax library you load separately. Without that library and its CSS theme, your code renders in plain monospace. That's a feature, not a bug — it keeps the output bundle tiny and lets you choose the highlighter.
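The mapping from a fence to that markup is small enough to sketch. The helper name `renderFence` is hypothetical, and the sketch assumes the fence has already been split into its info string (the text after the opening backticks) and its body.

```typescript
// Escape characters that would otherwise be read as markup or break
// out of a double-quoted attribute value.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: render one fenced block.
// `info` is the info string, e.g. "ts"; `code` is the raw block body.
function renderFence(info: string, code: string): string {
  // Only the first word of the info string is treated as the language.
  const lang = info.trim().split(/\s+/)[0] ?? "";
  const cls = lang ? ` class="language-${escapeHtml(lang)}"` : "";
  // The body is escaped, never colorized; it always ends with one newline.
  const body = code.endsWith("\n") ? code : code + "\n";
  return `<pre><code${cls}>${escapeHtml(body)}</code></pre>`;
}
```

A highlighter such as Prism or Shiki would take over from here, using the `language-…` class to pick a grammar.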

Task lists (- [x] done) are the one to double-check. Plenty of lightweight converters, including this one, deliberately skip the checkbox rendering because it pulls in extra parsing rules for marginal benefit. If your source relies on task lists for meaning, confirm your target supports them before converting a 200-item backlog.
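If your target skips task lists and you need them, the extra rule is not large. Here is a rough sketch of a single-line version; `renderTaskItem` is a hypothetical helper, it handles only bullet markers, and it does not deal with nesting or inline Markdown in the label.

```typescript
// Escape the characters that are significant in HTML text content.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: turn one "- [x] label" line into a list item
// with a disabled checkbox. Returns null when the line is not a task
// list item, so the caller can fall back to normal list handling.
function renderTaskItem(line: string): string | null {
  const match = /^\s*[-*+]\s+\[([ xX])\]\s+(.*)$/.exec(line);
  if (match === null) return null;
  const checked = match[1] !== " "; // "x" or "X" means done
  const label = escapeText(match[2] ?? "");
  const attrs = checked ? " disabled checked" : " disabled";
  return `<li><input type="checkbox"${attrs}> ${label}</li>`;
}
```

The checkbox is emitted as disabled because the rendered page is a snapshot of the source, not an editable form.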

Escaping and security: where converters earn their keep

This is the part people skip and then regret. Any literal <, >, or & in your prose should be escaped to &lt;, &gt;, and &amp; so the output is safe to drop into an HTML page without breaking the markup or opening an injection hole. The same applies inside code spans and fenced blocks, which is why a comparison like a < b in a fence shows up as a &lt; b in the generated HTML. A good converter does this automatically. If you specifically need to encode or decode a batch of entities by hand, the HTML entities encoder handles that as a standalone step.
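The escaping step itself is tiny. A minimal sketch, assuming you want to escape every occurrence in a piece of text; a real CommonMark converter is more selective in prose, since it leaves well-formed raw HTML tags alone. The names `ENTITIES` and `escapeHtml` are illustrative.

```typescript
// Map of characters that must not appear literally in HTML text
// or in a double-quoted attribute value.
const ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
};

// Replace each special character in a single pass, so the "&" that
// begins a freshly written entity is never escaped a second time.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"]/g, (ch) => ENTITIES[ch] ?? ch);
}
```

Doing it in one pass avoids the classic ordering bug where & is replaced after < and turns &lt; into &amp;lt;.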

There's a deeper trap with raw HTML pass-through. The CommonMark spec mandates that raw HTML blocks — including <script> and <iframe> — pass through untouched, because a converter shouldn't second-guess what you typed. That's correct behavior, but it means converting Markdown from an untrusted source and shipping the output straight to a page that holds a user session can ship someone else's cross-site scripting along with it. The fix is one line: run the HTML through a sanitizer like DOMPurify before it touches a logged-in page. Tools that implement only a CommonMark subset (no raw HTML pass-through) sidestep this, but never assume — verify.
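One way to make the sanitize step hard to forget is to encode it in the types. The sketch below is an assumption about how you might wire it, not part of any converter: `SafeHtml`, `makePipeline`, and `buildPageBody` are hypothetical names, and the sanitizer is passed in as a function. In a browser that function would typically wrap a call such as `DOMPurify.sanitize(html)`.

```typescript
// A branded string type. Only the pipeline below produces it, so a raw
// converter result cannot be passed where SafeHtml is required.
type SafeHtml = string & { readonly __brand: "SafeHtml" };

// Any function that strips dangerous markup from an HTML string.
type Sanitizer = (dirtyHtml: string) => string;
type Converter = (markdown: string) => string;

// Compose convert-then-sanitize once, and hand out only the composed
// function. Sanitizing always runs on the converter's output.
function makePipeline(convert: Converter, sanitize: Sanitizer) {
  return (markdown: string): SafeHtml =>
    sanitize(convert(markdown)) as SafeHtml;
}

// A sink that accepts only sanitized HTML. Passing a plain string
// here is a compile-time error.
function buildPageBody(html: SafeHtml): string {
  return `<main>${html}</main>`;
}
```

The cast inside `makePipeline` is the only place a plain string becomes `SafeHtml`, so reviewing that one function tells you whether untrusted Markdown can reach the page unsanitized.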

Pick the right direction and the right tool

Conversion goes both ways, and the reverse trip has its own quirks. If you're pulling rich content back out of a web page into clean Markdown — say, archiving an article — a dedicated HTML to Markdown converter handles the messy nested-div soup that round-tripping introduces. Going from Markdown into a component tree instead of a string is a different job again, closer to JSX templating.

The decision tree is short. Need semantic HTML for a CMS or email body? Convert Markdown to HTML and add styling downstream. Need to clean up scraped or pasted HTML back into editable source? Reverse it. Need the output safe on an authenticated page? Sanitize regardless of direction. Get those three calls right and Markdown stops being a black box and starts being a predictable, debuggable pipeline.

The conversion itself takes a second. The thinking — which flavor, what styling, whether to sanitize — is what separates output that renders from output that ships.


Made by Toolora · Updated 2026-06-13