Markdown to HTML Rendering: Complete Reference with Spec Differences and Examples

CommonMark, GFM, and original Markdown all render the same input differently. This reference shows exact HTML output for each spec, with real examples and parser comparison.

#markdown #html #rendering #commonmark #github-flavored-markdown

Markdown looks deceptively simple — until you paste the same file into two different parsers and get two different HTML outputs. The original 2004 Markdown spec left dozens of edge cases undefined, and every major parser filled those gaps differently. This reference covers what actually comes out of CommonMark, GitHub Flavored Markdown (GFM), and original Markdown for the inputs developers encounter every day.

Why There Is No Single "Correct" HTML Output

John Gruber's 2004 spec described Markdown in prose; it had no formal grammar and no test suite. The CommonMark project was started to resolve the resulting ambiguities, and its spec doubles as a conformance suite of 652 numbered examples. Before that, parsers like Pandoc, Redcarpet, Kramdown, and Marked each made independent choices to fill the gaps, producing incompatible HTML for edge cases like:

  • Nested lists with inconsistent indentation
  • Setext headings immediately followed by a blank line
  • Inline code with backticks inside link titles
  • HTML blocks mixed with Markdown

CommonMark (first published in 2014; the current version is 0.31.2, released in January 2024, and no 1.0 has been declared) resolved most ambiguities with a formal spec and a reference implementation (cmark). GFM is CommonMark plus GitHub extensions. When you are choosing a parser or debugging unexpected output, the spec you are targeting matters more than the library you picked.

Core Spec Differences: A Side-by-Side Table

These are the six features that trip developers up most often:

| Feature | Original Markdown | CommonMark | GFM |
|---|---|---|---|
| Fenced code blocks | Not supported | ``` or ~~~ | ``` or ~~~ |
| Tables | Not supported | Not supported | Supported (pipe syntax) |
| Task lists | Not supported | Not supported | - [x] syntax |
| Autolinks | <url> only | <url> only | <url> and bare URLs |
| Strikethrough | Not supported | Not supported | ~~text~~ |
| Indented code | 4 spaces or 1 tab | 4 spaces or 1 tab | 4 spaces or 1 tab |

The practical takeaway: if you target GFM, you get tables and task lists. Among npm packages, marked enables GFM by default, while markdown-it ships tables and strikethrough but needs a plugin for task lists. If you target strict CommonMark, you get none of these unless you add extensions.

Real Input → Output Examples

Example 1: Fenced Code Block vs Indented Code

Input (the line below, wrapped in a fence that opens with ```javascript and closes with ```):

const x = 1;

CommonMark / GFM output:

<pre><code class="language-javascript">const x = 1;
</code></pre>

Original Markdown (or a parser without fenced-block support): the fence lines are not recognized as block delimiters, so the backticks and const x = 1; end up inside an ordinary paragraph, often as an inline code span. You get no <pre> and no language class.

I hit this myself when migrating a project from showdown (which handles fenced blocks through its ghCodeBlocks option) to a strict Gruber-era parser. Every code block in the documentation silently became unstyled paragraph text: no error, just a silent regression.
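
The fence handling described in this example can be sketched in a few lines of Python. This is an illustrative approximation, not the cmark algorithm: the name render_fenced_block is hypothetical, the closing-fence check is looser than the spec, and double quotes in code are left unescaped.

```python
import html
import re
from typing import Optional

# Opening fence: up to 3 spaces of indent, 3+ backticks or tildes, optional info string.
FENCE_OPEN = re.compile(r"^ {0,3}(`{3,}|~{3,})[ \t]*(.*)$")


def render_fenced_block(source: str) -> Optional[str]:
    """Render one fenced block to HTML; return None if line 1 is not a fence."""
    lines = source.splitlines()
    match = FENCE_OPEN.match(lines[0]) if lines else None
    if match is None:
        return None  # a fence-unaware parser would treat this as paragraph text
    fence, info = match.group(1), match.group(2).strip()

    body = []
    for line in lines[1:]:
        stripped = line.strip()
        # Closing fence: same character, at least as long as the opener.
        if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
            break
        body.append(line)

    # The first word of the info string becomes the language class.
    language = info.split()[0] if info else ""
    class_attr = f' class="language-{html.escape(language)}"' if language else ""
    code = "".join(html.escape(line, quote=False) + "\n" for line in body)
    return f"<pre><code{class_attr}>{code}</code></pre>"
```

The None branch is the interesting part: a parser that never recognizes the opening fence has no block to build, which is why the content falls back to paragraph handling without any error.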

Example 2: GFM Table Rendering

Input:

| Name    | Score |
|---------|-------|
| Alice   | 92    |
| Bob     | 87    |

GFM output:

<table>
  <thead>
    <tr><th>Name</th><th>Score</th></tr>
  </thead>
  <tbody>
    <tr><td>Alice</td><td>92</td></tr>
    <tr><td>Bob</td><td>87</td></tr>
  </tbody>
</table>

CommonMark (strict) output: The pipe characters are treated as literal text in a paragraph. No <table> is generated.

This distinction matters for documentation sites. Hugo's default Goldmark configuration enables the table extension, but Goldmark used as a library, cmark, and other strict CommonMark setups do not. There you must enable a tables extension explicitly, or every pipe table in your content renders as garbled prose.
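
To make the header, delimiter, and body mapping concrete, here is a minimal Python sketch of pipe-table rendering. It assumes simple tables only: the function names are hypothetical, and escaped pipes, alignment attributes, and inline formatting inside cells are all ignored.

```python
import html
import re
from typing import List, Optional

# A delimiter cell is a run of dashes with optional alignment colons, e.g. --- or :--:
DELIMITER_CELL = re.compile(r"^:?-+:?$")


def split_row(line: str) -> List[str]:
    """Split a pipe-table row into trimmed cells (escaped pipes are not handled)."""
    trimmed = line.strip()
    if trimmed.startswith("|"):
        trimmed = trimmed[1:]
    if trimmed.endswith("|"):
        trimmed = trimmed[:-1]
    return [cell.strip() for cell in trimmed.split("|")]


def render_row(cells: List[str], tag: str) -> str:
    """Wrap each cell in <th> or <td> and the whole row in <tr>."""
    inner = "".join(f"<{tag}>{html.escape(c, quote=False)}</{tag}>" for c in cells)
    return f"<tr>{inner}</tr>"


def render_pipe_table(source: str) -> Optional[str]:
    """Return table HTML, or None when line 2 is not a matching delimiter row."""
    lines = source.splitlines()
    if len(lines) < 2:
        return None
    header, delimiter = split_row(lines[0]), split_row(lines[1])
    # The delimiter row must have the same number of cells as the header.
    if len(delimiter) != len(header):
        return None
    if not all(DELIMITER_CELL.match(cell) for cell in delimiter):
        return None

    out = ["<table>", "<thead>", render_row(header, "th"), "</thead>"]
    body = []
    for line in lines[2:]:
        if not line.strip():
            break  # a blank line ends the table
        cells = split_row(line)
        # Pad short rows and drop excess cells so every row matches the header.
        body.append((cells + [""] * len(header))[: len(header)])
    if body:
        out += ["<tbody>"] + [render_row(row, "td") for row in body] + ["</tbody>"]
    out.append("</table>")
    return "\n".join(out)
```

A parser without the table extension has no equivalent of the delimiter check, so the same lines never leave paragraph handling.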

Example 3: Setext Headings and the "Lazy" Continuation Rule

Input (three parsers, same file):

Foo
---

This is a paragraph.

CommonMark and GFM treat --- as a setext H2 underline, producing:

<h2>Foo</h2>
<p>This is a paragraph.</p>

But if a blank line separates Foo from ---, CommonMark treats --- as a thematic break (<hr>), not a heading, because the blank line has already ended the paragraph. Original Markdown implementations behave inconsistently here: some produce an H2, others produce an <hr>. The setext heading section of the CommonMark spec pins the rule: the underline must sit on the line directly after the heading text.
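
The decision for a line of dashes can be sketched as a small classifier. This is a simplified illustration, assuming that any non-blank previous line is paragraph text (it ignores the case where the previous line is itself a heading, code, or list item); classify_dash_line is a hypothetical name.

```python
import re

# Setext H2 underline: a solid run of dashes, optional trailing whitespace.
SETEXT_H2_UNDERLINE = re.compile(r"^ {0,3}-+[ \t]*$")
# Thematic break: three or more dashes, optionally separated by spaces or tabs.
THEMATIC_BREAK = re.compile(r"^ {0,3}(?:-[ \t]*){3,}$")


def classify_dash_line(previous_line: str, line: str) -> str:
    """Return 'setext_h2', 'thematic_break', or 'text' for a candidate dash line."""
    in_paragraph = previous_line.strip() != ""
    # The setext reading wins only when the underline directly follows paragraph text.
    if in_paragraph and SETEXT_H2_UNDERLINE.match(line):
        return "setext_h2"
    if THEMATIC_BREAK.match(line):
        return "thematic_break"
    return "text"
```

Calling classify_dash_line("Foo", "---") takes the heading branch, while classify_dash_line("", "---") falls through to the thematic break, which mirrors the two outcomes described above.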

Example 4: HTML Entities in Code Spans

Input:

Use `<br>` to break a line.

All three specs agree: content inside backtick code spans is HTML-escaped before output.

Output (all parsers):

<p>Use <code>&lt;br&gt;</code> to break a line.</p>

The angle brackets become &lt; and &gt;. This is safe by design: code spans are never parsed for HTML. Outside of code spans, a raw <br> in Markdown source passes through as literal HTML in all three. CommonMark defines both HTML blocks and inline raw HTML, and GFM additionally filters a short list of tags such as <script> through its tagfilter extension.
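
A minimal Python sketch of the code-span escaping step follows. It assumes single-backtick spans only and passes text outside spans through untouched, which is a simplification of what real parsers do; render_inline is a hypothetical name.

```python
import html
import re

# Single-backtick code span; multi-backtick delimiters are out of scope here.
CODE_SPAN = re.compile(r"`([^`]+)`")


def render_inline(text: str) -> str:
    """Wrap text in <p>, escaping only the content of backtick code spans."""
    parts = []
    last = 0
    for match in CODE_SPAN.finditer(text):
        parts.append(text[last:match.start()])  # outside a span: left as written
        # Inside a span: &, <, and > are escaped so the browser shows them literally.
        parts.append("<code>" + html.escape(match.group(1), quote=False) + "</code>")
        last = match.end()
    parts.append(text[last:])
    return "<p>" + "".join(parts) + "</p>"
```

The same <br> therefore produces a visible tag name inside backticks and an actual line break outside them.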

If you need to encode entities manually for pasting into HTML contexts, Toolora's HTML Entities Encoder handles the conversion in one step.

CommonMark's Paragraph-Interruption Rules

One of the most consequential — and least documented — spec decisions is which elements can interrupt a paragraph. CommonMark 0.31 defines this precisely:

  • A blank line always interrupts a paragraph.
  • ATX headings (##) interrupt paragraphs.
  • Fenced code blocks interrupt paragraphs.
  • A setext underline does not interrupt a paragraph; it turns the preceding paragraph lines into the heading, so it must immediately follow them with no intervening blank line.
  • Ordered lists starting with 1. interrupt paragraphs; lists starting with other numbers do not.

That last rule surprises most developers. This input:

I bought 5 items.
5. Oranges
6. Apples

produces a paragraph followed by a list in some parsers, but a plain paragraph ending in "5. Oranges\n6. Apples" in strict CommonMark — because a list that starts at 5 cannot interrupt a paragraph.
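
The interruption rules listed above can be approximated with a few regular expressions. This sketch covers only blank lines, ATX headings, fence openers, and ordered list items; block quotes, bullet lists, and thematic breaks are deliberately left out, and both function names are hypothetical.

```python
import re
from typing import List

ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(?:[ \t]|$)")
FENCE_OPEN = re.compile(r"^ {0,3}(?:`{3,}|~{3,})")
# Ordered list item with content: 1-9 digits, '.' or ')', whitespace, then text.
ORDERED_ITEM = re.compile(r"^ {0,3}(\d{1,9})[.)][ \t]+\S")


def can_interrupt_paragraph(line: str) -> bool:
    """Return True if this line ends the paragraph that precedes it."""
    if line.strip() == "":
        return True
    if ATX_HEADING.match(line) or FENCE_OPEN.match(line):
        return True
    item = ORDERED_ITEM.match(line)
    if item:
        # Only a list that starts at 1 may interrupt a paragraph.
        return int(item.group(1)) == 1
    return False


def take_paragraph(lines: List[str]) -> List[str]:
    """Collect the lines of the paragraph that starts at lines[0]."""
    paragraph = lines[:1]
    for line in lines[1:]:
        if can_interrupt_paragraph(line):
            break
        paragraph.append(line)
    return paragraph
```

Feeding the three-line input above to take_paragraph keeps all three lines in one paragraph; changing "5." to "1." cuts the paragraph after the first line.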

Choosing the Right Tool for Your Workflow

For most documentation and blog workflows, GFM is the right target: it gives you tables, task lists, fenced code, and strikethrough without needing plugin configuration. Use Toolora's live Markdown editor to preview how your source renders before committing it to a CMS.

For server-side conversion pipelines where you need reproducible, spec-compliant output, cmark (the C reference implementation) or cmark-gfm (GitHub's fork) are the most reliable options. Both are tested against their spec's numbered examples (652 in CommonMark), so output for spec-covered input should stay stable across releases.

For static-site generators:

  • Hugo (Goldmark): CommonMark-compliant; GFM extensions such as tables are toggled in the site configuration (hugo.toml, formerly config.toml)
  • Jekyll (Kramdown): not CommonMark; has its own extensions and quirks with footnotes
  • Docusaurus (Remark): CommonMark base with MDX extensions
  • VitePress (markdown-it): GFM-compatible by default

If you are converting in the other direction — stripping HTML back to clean Markdown — Toolora's HTML to Markdown converter handles the reverse pass and preserves heading hierarchy, links, and code blocks.

Quick Reference: What Each Spec Guarantees

| Output element | Original | CommonMark | GFM |
|---|---|---|---|
| <h1> to <h6> | ATX + setext | ATX + setext | ATX + setext |
| <code> (inline) | Backtick | Backtick | Backtick |
| <pre><code> | 4-space indent | 4-space + fenced | 4-space + fenced |
| <blockquote> | > prefix | > prefix | > prefix |
| <table> | No | No | Yes |
| <del> (strikethrough) | No | No | Yes |
| Bare URL autolink | No | No | Yes |
| HTML passthrough | Yes | Yes (blocks and inline) | Yes, minus tagfilter tags (further sanitized on github.com) |

The safest document for cross-parser compatibility uses only ATX headings (##), fenced code with language tags, inline backtick code, > blockquotes, and * or - unordered lists. Avoid setext headings, bare autolinks, and pipe tables if your output target is unknown.
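
One way to act on this advice is a small lint pass over the source. The sketch below is a heuristic, not a parser: find_portability_risks is a hypothetical helper, it only knows about the three constructs named above, and it can misfire on URLs inside inline code or link reference definitions.

```python
import re
from typing import List, Tuple

FENCE = re.compile(r"^ {0,3}(?:`{3,}|~{3,})")
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(?:=+|-+)[ \t]*$")
# A URL not preceded by '<', '(', '[' or a word character, i.e. not already wrapped.
BARE_URL = re.compile(r"(?<![<(\[\w])https?://\S+")


def is_table_delimiter(line: str) -> bool:
    """True for rows like |---|:--:| made only of pipes, dashes, colons, spaces."""
    stripped = line.strip()
    return "|" in stripped and "-" in stripped and set(stripped) <= set("|-: \t")


def find_portability_risks(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, construct) pairs for parser-dependent syntax."""
    risks = []
    lines = source.splitlines()
    in_fence = False
    for number, line in enumerate(lines, start=1):
        if FENCE.match(line):
            in_fence = not in_fence  # naive toggle; ignores fence length and type
            continue
        if in_fence:
            continue
        previous = lines[number - 2] if number > 1 else ""
        if previous.strip() and SETEXT_UNDERLINE.match(line):
            risks.append((number, "setext heading"))
        elif is_table_delimiter(line) and "|" in previous:
            risks.append((number, "pipe table"))
        if BARE_URL.search(line):
            risks.append((number, "bare autolink"))
    return risks
```

Running it in a pre-commit hook or CI step gives an early warning before content reaches a parser whose spec you do not control.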


Made by Toolora · Updated 2026-07-01