Extract HTML Form Fields: Map Every Action, Method, and Input

A form is a contract. When the user clicks submit, the browser bundles a set of named values and ships them to a URL using a verb. Most of the time you never see that contract written down. It lives scattered across a few hundred lines of markup, half of it generated by a template engine, the rest copied from a component someone wrote two years ago. To answer a simple question, "what exactly does this form send, and where," you end up scrolling through <div> soup counting input tags by hand.

That is the job the HTML Form Extractor does for you. Paste the markup or load a saved .html file, and it parses the HTML into a list of each form's action and method, plus every field's name, type, and required flag. You can map exactly what a form submits without reading the markup: no guessing, no missed hidden inputs, and no scripts executed. The page text goes in, a structured inventory comes out.

What it pulls from the markup

For every <form> on the page, the extractor reads:

The action (where the data goes) and the method (GET or POST).
Each <input>, <select>, <textarea>, and <button>, with its name, type, and required flag.
The autocomplete attribute on each field, so you can see which inputs hint to the browser's autofill.
Basic <label for> text matched back to its field, so a cryptic name="pw2" gets a human label like "Confirm password."
<select> elements with their full list of <option> values, so dropdowns are not a black box.
A count of password fields and any hidden inputs, including likely CSRF tokens.

The output comes in three shapes. Markdown gives you a readable checklist to paste into a ticket or a review doc. CSV turns the whole thing into a field inventory you can sort and filter in a spreadsheet. JSON keeps the identical structure for scripts and pipelines. Pick whichever fits the next step.
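
To make the three shapes concrete, here is a minimal sketch that renders one small inventory as JSON, CSV, and a Markdown checklist. The dict layout, the column names, and the function names are assumptions made for illustration; the tool's real export schema may differ.

```python
import csv
import io
import json

# Hypothetical inventory shape; the extractor's actual schema may differ.
inventory = [{
    "action": "/account/create",
    "method": "POST",
    "fields": [
        {"name": "csrf_token", "type": "hidden", "required": False},
        {"name": "email", "type": "email", "required": True},
    ],
}]


def to_json(forms):
    """Keep the nested structure as-is for scripts and pipelines."""
    return json.dumps(forms, indent=2)


def to_csv(forms):
    """Flatten to one row per field so a spreadsheet can sort and filter."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["form", "action", "method", "name", "type", "required"])
    for i, form in enumerate(forms, start=1):
        for field in form["fields"]:
            writer.writerow([
                i, form["action"], form["method"],
                field["name"], field["type"],
                "yes" if field["required"] else "no",
            ])
    return buf.getvalue()


def to_markdown(forms):
    """Render a checklist that can be pasted into a ticket or review doc."""
    lines = []
    for i, form in enumerate(forms, start=1):
        lines.append(f"Form {i}: {form['method']} {form['action']}")
        for field in form["fields"]:
            flag = "required" if field["required"] else "optional"
            lines.append(f"- [ ] {field['name']} ({field['type']}, {flag})")
    return "\n".join(lines)


print(to_markdown(inventory))
```

The CSV repeats the form's action and method on every row, which is redundant but lets each row stand alone once the sheet is filtered.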

A worked example

Here is a trimmed signup form, the kind you find in a real template:

<form action="/account/create" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3">
  <label for="email">Email</label>
  <input id="email" name="email" type="email" required autocomplete="email">
  <label for="pw">Password</label>
  <input id="pw" name="password" type="password" required>
  <select name="plan">
    <option value="free">Free</option>
    <option value="pro">Pro</option>
  </select>
  <button type="submit">Create account</button>
</form>

Run it through the extractor and the noise collapses into a field list:

Form 1  -  action: /account/create  -  method: POST
  csrf_token   hidden    (required: no)
  email        email     (required: yes)   autocomplete: email
  password     password  (required: yes)   autocomplete: (none)
  plan         select    (required: no)    options: free, pro
  [button]     submit
Notes: 1 password field. Hidden token present. password field missing autocomplete hint.

In seven lines I can see the destination, the verb, the four named values that travel, and one practical flag: the password field has no autocomplete attribute, so password managers may not offer to save it. That last note is the kind of thing nobody catches by eye until a user complains, and it is sitting right there in the report.
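
For a sense of how an inventory like this can be built from static text, here is a rough sketch using Python's standard html.parser. It is not the extractor's actual implementation: the FormInventory class and the dict shape are assumptions for illustration, and label matching and the Notes line are left out.

```python
from html.parser import HTMLParser

FIELD_TAGS = {"input", "select", "textarea", "button"}
DEFAULT_TYPE = {"input": "text", "button": "submit"}  # HTML defaults when type is omitted


class FormInventory(HTMLParser):
    """Collect each form's action, method, and fields from static HTML text."""

    def __init__(self):
        super().__init__()
        self.forms = []      # one dict per <form>, in document order
        self._form = None    # the <form> currently open, if any
        self._select = None  # the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes such as `required` map to None
        if tag == "form":
            self._form = {
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").upper(),  # GET is the default
                "fields": [],
            }
            self.forms.append(self._form)
        elif self._form is None:
            return  # simplification: controls outside a <form> are ignored
        elif tag in FIELD_TAGS:
            field = {
                "name": a.get("name"),
                "type": a.get("type") or DEFAULT_TYPE.get(tag, tag),
                "required": "required" in a,
                "autocomplete": a.get("autocomplete"),
            }
            if tag == "select":
                field["options"] = []
                self._select = field
            self._form["fields"].append(field)
        elif tag == "option" and self._select is not None:
            self._select["options"].append(a.get("value"))

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None
        elif tag == "select":
            self._select = None


sample = """
<form action="/account/create" method="post">
  <input type="hidden" name="csrf_token" value="a1b2c3">
  <input id="email" name="email" type="email" required autocomplete="email">
  <input id="pw" name="password" type="password" required>
  <select name="plan">
    <option value="free">Free</option>
    <option value="pro">Pro</option>
  </select>
  <button type="submit">Create account</button>
</form>
"""

parser = FormInventory()
parser.feed(sample)
form = parser.forms[0]
print(form["method"], form["action"])
for field in form["fields"]:
    print(field)
```

Because the parser only reads text and never runs scripts, it sees exactly what was pasted, which is also why client-rendered forms need their rendered HTML captured first.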

Where I actually reach for this

I inherited a checkout flow last month with three forms spread across a server-rendered template, and I needed to document what each one posted before I dared touch it. My first instinct was to grep for <input and count, which is exactly how you miss the hidden field that turns out to matter. Instead I dumped the rendered HTML into the extractor and had a clean CSV in seconds: every field name, every action, every required flag, one row each. I attached that CSV to the migration ticket as the source of truth. When we rebuilt the forms in the new frontend, the inventory told us precisely which names the backend still expected, and nothing silently dropped.

Three jobs it does well

Documenting a form. When you are writing API or integration docs, the form's field names are the payload keys. Extracting them gives you an accurate parameter list without transcribing anything, and the CSV doubles as a checklist reviewers can sign off on.

Scripting submissions. If you are writing a test that posts to a form, or a small script that automates a repetitive entry, you need the exact field names and the action URL. The JSON output hands you both in a shape you can feed straight into code. Pair it with a regex tester when you need to validate the values you plan to send before firing them at the endpoint.
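
As an illustration of that hand-off, the sketch below turns a JSON inventory into a prepared request using only the standard library. The JSON shape, the build_request helper, and the https://example.test base URL are all hypothetical; the request is built but deliberately not sent.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical JSON export for one form; the tool's real schema may differ.
form_json = """
{
  "action": "/account/create",
  "method": "POST",
  "fields": [
    {"name": "csrf_token", "type": "hidden", "required": false},
    {"name": "email", "type": "email", "required": true},
    {"name": "password", "type": "password", "required": true},
    {"name": "plan", "type": "select", "required": false}
  ]
}
"""


def build_request(form, page_url, values):
    """Check values against the inventory, then prepare (not send) a request."""
    known = {f["name"] for f in form["fields"] if f.get("name")}
    unknown = sorted(set(values) - known)
    if unknown:
        raise ValueError(f"not in the form: {unknown}")
    missing = [f["name"] for f in form["fields"]
               if f.get("required") and not values.get(f["name"])]
    if missing:
        raise ValueError(f"required but empty: {missing}")

    url = urljoin(page_url, form["action"])  # resolve a relative action
    body = urlencode(values)                 # application/x-www-form-urlencoded
    if form["method"] == "GET":
        # Simplification: assumes the action URL carries no query string.
        return Request(f"{url}?{body}", method="GET")
    return Request(
        url,
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


form = json.loads(form_json)
req = build_request(form, "https://example.test/signup", {
    "csrf_token": "a1b2c3",
    "email": "a@example.test",
    "password": "s3cret",
    "plan": "free",
})
print(req.get_method(), req.full_url)
```

Rejecting unknown and missing names before building the request catches typos in a test script early, instead of leaving the server to ignore or reject them.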

Security review of fields. This is where the field-by-field view earns its keep. The extractor flags password fields submitted over GET (which leaks credentials into URLs and logs), form actions pointing at external domains, empty actions, and POST forms with no CSRF-style hidden token. None of these is proof of a vulnerability on its own, but each is a question worth asking, and the tool surfaces them so you remember to ask.
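
The four checks above can be approximated in a few lines. This is a sketch of the idea, not the tool's actual rules: the review_form function, the inventory dict shape, and the TOKEN_HINTS name list are assumptions, and the token check is a naming heuristic only.

```python
from urllib.parse import urlparse

# Substrings that suggest an anti-CSRF field; purely a naming heuristic.
TOKEN_HINTS = ("csrf", "xsrf", "token", "nonce")


def review_form(form, page_host):
    """Return a list of questions worth asking about one form."""
    flags = []
    action = (form.get("action") or "").strip()
    method = (form.get("method") or "GET").upper()
    fields = form.get("fields", [])

    if not action:
        flags.append("empty action")
    else:
        host = urlparse(action).hostname  # None for relative actions
        if host and host != page_host:
            flags.append(f"external action: {host}")

    if method == "GET" and any(f.get("type") == "password" for f in fields):
        flags.append("password field submitted over GET")

    if method == "POST":
        has_token = any(
            f.get("type") == "hidden"
            and any(hint in (f.get("name") or "").lower() for hint in TOKEN_HINTS)
            for f in fields
        )
        if not has_token:
            flags.append("POST form with no CSRF-style hidden token")

    return flags


login = {
    "action": "https://pay.example.net/login",
    "method": "get",
    "fields": [{"name": "pw", "type": "password"}],
}
for flag in review_form(login, "shop.example.com"):
    print(flag)
```

A clean result only means none of these patterns matched the markup; it says nothing about what the server does with the submission.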

What it does and does not do

Two honest limits. First, the parser reads only the HTML text you give it. A form that exists only after JavaScript runs will not appear, so if the form is built client-side, save the rendered DOM and paste that instead. Second, flagging a hidden field as a CSRF token is a heuristic based on how the field looks. Seeing name="csrf_token" tells you a token field exists, not that the server actually validates it. Treat the flag as a prompt to verify, not a verdict.

Everything runs locally in the browser. The HTML you paste never leaves the page, which matters because form markup often carries internal route names, hidden parameters, and product details you would not want uploaded to a third-party service. Still, review the output before you share it, since those internal names end up in the report by design.

Fit it into a wider audit

A form inventory is one slice of a site review. Once you know what each form posts, you usually want to know how those endpoints behave under load and how the pages around them are wired up. The HAR performance analyzer covers the network side of the same pages, so the two pair naturally when you are auditing a flow end to end rather than a single template in isolation.

Forms are where users hand you their data and where a lot of quiet bugs and quiet leaks live. Reading that contract by hand is slow and error-prone. Letting a parser turn the markup into a clean field list takes the guesswork out, whether you are documenting, scripting, or reviewing.

Made by Toolora · Updated 2026-06-13