How to Infer a JSON Schema From Sample JSON

Most JSON Schemas I have written started life the same way: I had a pile of real responses from an API and no document that described them. Writing the schema by hand from scratch is slow and error-prone. You squint at three sample payloads, type out "type": "object", list the properties, guess which ones are required, and inevitably forget the field that only appears in error responses. Inferring the schema from the samples instead flips the work around. You feed the tool the data you already have, and it produces a draft you correct rather than a blank file you fill.

This post walks through how schema inference works, what it can and cannot tell you, and how to use the JSON Schema Inferencer to bootstrap an API contract from real payloads.

What inference actually does

Inference reads one or more JSON samples and reports the structure they share. For an object, that means walking every key, recording the type of each value, and noting whether the key appeared in every sample. For an array, it means inspecting the items and describing the shape they have in common. The output is a JSON Schema document you can drop into a validator.

The two decisions that matter most are the type of each field and whether the field is required. Type is read directly from each value: a string stays "string", an integer becomes "integer", true/false becomes "boolean", and null is tracked separately so a nullable field can be expressed as a type union. Required is a coverage question, not a type question. A property is marked required only when it is present in every sample. Show the inferencer five records, one of which is missing phone, and phone lands in properties but stays out of the required list. That single rule is what makes multiple samples worth far more than one.
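The two rules above fit in a few lines of Python. This is a minimal sketch for flat objects only, not the JSON Schema Inferencer's actual code; the names json_type and infer_flat, and the sample records, are made up for illustration.

```python
from typing import Any


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be checked before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_flat(samples: list) -> dict:
    """Infer per-field types and the required list from flat object samples."""
    seen_types: dict = {}  # key -> set of type names observed
    counts: dict = {}      # key -> number of samples containing the key
    for sample in samples:
        for key, value in sample.items():
            seen_types.setdefault(key, set()).add(json_type(value))
            counts[key] = counts.get(key, 0) + 1

    properties = {}
    for key, types in seen_types.items():
        names = set(types)
        if "number" in names:
            names.discard("integer")  # "number" already covers integers
        ordered = sorted(names)
        # One observed type stays a plain string; several become a type union.
        properties[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}

    # Required is purely a coverage check: the key appeared in every sample.
    required = [key for key in seen_types if counts[key] == len(samples)]
    return {"type": "object", "properties": properties, "required": required}


samples = [
    {"name": "Ada", "phone": "555-0100", "age": 36},
    {"name": "Grace", "age": None},
]
schema = infer_flat(samples)
print(schema["required"])                   # ['name', 'age']
print(schema["properties"]["age"]["type"])  # ['integer', 'null']
```

Here phone is described in properties but left out of required because the second record lacks it, and age becomes a union because one sample held null.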

On top of the basics, a good inferencer adds detail that a contract needs: string formats like email, uri, date, date-time, and uuid when every observed value of a field matches the pattern; small enum candidates when a field only ever holds a handful of distinct values; length and numeric ranges; and example values pulled from the data.
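As a rough sketch of how format and enum detection can work, the Python below checks whether every observed string matches a pattern and whether a field holds only a few distinct values. The function names, the deliberately loose email pattern, and the max_distinct threshold are assumptions for illustration, and only three of the formats listed above are covered.

```python
import re
from datetime import date
from typing import Optional

UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Deliberately loose: something@something.something with no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _is_date(text: str) -> bool:
    """True for YYYY-MM-DD strings that are also real calendar dates."""
    if not DATE_RE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


CHECKS = [
    ("uuid", lambda s: UUID_RE.fullmatch(s) is not None),
    ("email", lambda s: EMAIL_RE.fullmatch(s) is not None),
    ("date", _is_date),
]


def detect_format(values: list) -> Optional[str]:
    """Return a format name only if every observed value matches it."""
    if not values:
        return None
    for name, check in CHECKS:
        if all(check(v) for v in values):
            return name
    return None


def enum_candidate(values: list, max_distinct: int = 5) -> Optional[list]:
    """Suggest an enum when few distinct values repeat across the samples."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    if len(distinct) <= max_distinct and len(distinct) < len(values):
        return distinct
    return None


print(detect_format(["ada@example.com", "grace@example.com"]))  # email
print(detect_format(["ada@example.com", "not an email"]))       # None
print(enum_candidate(["active", "inactive", "active", "pending", "active"]))
# ['active', 'inactive', 'pending']
```

A single non-matching value cancels the format, which is the conservative choice: a wrong format annotation rejects valid data later, while a missing one only loosens the check.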

A worked example

Here is the kind of input you would paste. Two user records from an API, where the second one omits nickname:

[
  {
    "id": "8f14e45f-ceea-467d-9b3a-2c2b3a7e1a11",
    "email": "ada@example.com",
    "age": 36,
    "active": true,
    "nickname": "Countess",
    "roles": ["admin", "billing"]
  },
  {
    "id": "5c9d6b2a-1f3e-4a7b-8c0d-9e1f2a3b4c5d",
    "email": "grace@example.com",
    "age": 45,
    "active": false,
    "roles": ["viewer"]
  }
]

Inference walks both records and produces something close to this:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id":       { "type": "string", "format": "uuid" },
      "email":    { "type": "string", "format": "email" },
      "age":      { "type": "integer", "minimum": 36, "maximum": 45 },
      "active":   { "type": "boolean" },
      "nickname": { "type": "string" },
      "roles":    { "type": "array", "items": { "type": "string" } }
    },
    "required": ["id", "email", "age", "active", "roles"]
  }
}

Notice the work the inference did. It recognized id as a UUID and email as an email address because every value matched the format. It read age as an integer and bounded it by the values it saw. It kept nickname in properties because the first record had it, but it left nickname out of required because the second record did not. And it described roles as an array of strings by looking inside the array rather than stopping at "this is an array."

Nested objects and arrays

Real payloads are rarely flat, and this is where inference earns its keep. When a value is itself an object, the inferencer recurses: the nested object gets its own properties and required list, computed from the nested samples the same way the top level is. An address object with city, zip, and an optional unit is described as precisely as the outer record.

Arrays get the same treatment one level down. The inferencer looks at the items, and if they are objects it merges their shapes into a single items schema, marking a nested key required only when every element in every sample carries it. Mixed-type arrays collapse to a type union for the items rather than silently picking the first one. The practical payoff is that a deeply nested response, the sort that takes twenty minutes to schema by hand, comes back fully described in one pass.
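The recursion described in this section can be sketched as a single function that takes every value observed at one position and returns one schema for all of them. This is an illustrative Python sketch under simplifying assumptions (types, properties, required, and items only); type_name and infer are hypothetical names, not the tool's API.

```python
from typing import Any


def type_name(value: Any) -> str:
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: True is also an int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"


def infer(values: list) -> dict:
    """Infer one schema describing every value seen at the same position."""
    if not values:
        return {}
    kinds = {type_name(v) for v in values}
    if "number" in kinds:
        kinds.discard("integer")  # "number" already covers integers
    names = sorted(kinds)
    schema: dict = {"type": names[0] if len(names) == 1 else names}

    objects = [v for v in values if isinstance(v, dict)]
    if objects:
        keys = list(dict.fromkeys(k for obj in objects for k in obj))
        # Recurse per key, using only the objects that actually carry it.
        schema["properties"] = {
            k: infer([obj[k] for obj in objects if k in obj]) for k in keys
        }
        schema["required"] = [k for k in keys if all(k in obj for obj in objects)]

    arrays = [v for v in values if isinstance(v, list)]
    elements = [item for arr in arrays for item in arr]
    if elements:
        # Elements from every array in every sample merge into one items schema.
        schema["items"] = infer(elements)
    return schema


samples = [
    {"name": "Ada", "address": {"city": "London", "zip": "N1", "unit": "4B"},
     "tags": ["vip", 7]},
    {"name": "Grace", "address": {"city": "New York", "zip": "10001"},
     "orders": [{"id": 1, "note": "gift"}, {"id": 2}]},
]
schema = infer(samples)
print(schema["properties"]["address"]["required"])          # ['city', 'zip']
print(schema["properties"]["orders"]["items"]["required"])  # ['id']
print(schema["properties"]["tags"]["items"]["type"])        # ['integer', 'string']
```

Passing a list of observed values rather than a single value is what lets the same function serve the top level, nested objects, and array items alike.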

Turning a draft into a real contract

An inferred schema is a starting point, not a finished contract, and treating it as gospel is the most common mistake. Samples only show the fields that happened to appear. A field that is optional in reality but present in all your samples will be marked required, and a rare error-only field will be missing entirely. Enum candidates are suggestions: three observed values do not prove the set is closed.

So I treat the inferred output as a first draft and then do a short review pass:

Trim required. Remove fields you know are optional from the required list, even though every sample had them.
Confirm enums and bounds. Promote enum candidates only where the value set is genuinely closed, and replace sample-derived numeric ranges with real business limits.
Add what samples cannot show. Error shapes, pagination wrappers, and fields gated behind feature flags.
Document. Add description and title so the schema doubles as API docs.
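Part of that review pass can be recorded as code so it is repeatable when you re-run inference on fresh samples. The Python sketch below assumes the draft is an object schema like the items schema in the worked example; the review helper, the choice of age as optional, and the 0 to 150 limits are hypothetical.

```python
import copy


def review(draft: dict, optional=(), bounds=None, descriptions=None) -> dict:
    """Return a corrected copy of an inferred object schema."""
    reviewed = copy.deepcopy(draft)  # leave the inferred draft untouched
    # Drop fields known to be optional from the required list.
    reviewed["required"] = [
        key for key in reviewed.get("required", []) if key not in optional
    ]
    # Replace sample-derived ranges with real limits.
    for key, limits in (bounds or {}).items():
        reviewed["properties"][key].update(limits)
    # Add descriptions so the schema doubles as documentation.
    for key, text in (descriptions or {}).items():
        reviewed["properties"][key]["description"] = text
    return reviewed


draft = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "email": {"type": "string", "format": "email"},
        "age": {"type": "integer", "minimum": 36, "maximum": 45},
    },
    "required": ["id", "email", "age"],
}

reviewed = review(
    draft,
    optional={"age"},                               # assumed optional
    bounds={"age": {"minimum": 0, "maximum": 150}},  # assumed business limits
    descriptions={"email": "Primary contact address for the user."},
)
print(reviewed["required"])           # ['id', 'email']
print(reviewed["properties"]["age"])  # {'type': 'integer', 'minimum': 0, 'maximum': 150}
```

Keeping the corrections separate from the draft means a new inference run does not overwrite decisions you already made by hand.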

For ingestion pipelines, this draft-then-tighten loop is the fastest way to get validation in place. You start enforcing the obvious structure today and harden the edges over the following days, instead of blocking on a perfect hand-written schema.

If your stack lives in TypeScript or you validate at runtime with Zod, the inferred JSON Schema is a clean handoff point. Convert it into runtime validators with a tool like TypeScript to Zod Schema so the contract you inferred from real data becomes the contract your code enforces.

A quick checklist

Before you ship an inferred schema, run through this:

Feed it as many distinct samples as you can: more samples mean a more honest required list.
Include the awkward cases: empty arrays, nulls, missing optional fields.
Review every required entry against what you actually know.
Sanity-check formats and enums rather than trusting them blindly.
Strip secrets, tokens, and customer data before sharing the output. Inference runs in your browser, but a generated schema can still leak example values.
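For the last item, one way to remove example values before sharing is a small recursive scrub. The Python below is a sketch, not a feature of the tool: the scrub function and the sample schema are made up, and it removes only the examples keyword by default. Sample-derived enum values and numeric bounds can also reveal data, so review those by hand or pass extra keywords.

```python
import json

# Keywords whose values map names to subschemas; their keys are not keywords.
NAME_MAPS = {"properties", "patternProperties", "$defs", "definitions"}


def scrub(schema, keywords=frozenset({"examples"})):
    """Return a copy of the schema with the given keywords removed everywhere."""
    if isinstance(schema, list):
        return [scrub(item, keywords) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        if key in keywords:
            continue  # drop the keyword and the sample data it carries
        if key in NAME_MAPS and isinstance(value, dict):
            # Keep every property name, even one literally called "examples".
            cleaned[key] = {name: scrub(sub, keywords) for name, sub in value.items()}
        else:
            cleaned[key] = scrub(value, keywords)
    return cleaned


schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email",
                  "examples": ["ada@example.com"]},
        "examples": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["email"],
}

shareable = scrub(schema)
print(json.dumps(shareable, indent=2))
```

The property named examples survives because keys under properties are treated as field names rather than schema keywords.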

Schema inference will not replace the judgment that turns a structure into a real contract. What it does replace is the tedious, error-prone first hour of typing out types and properties by hand. Paste your samples, let the types and required fields fall out of the data, and spend your attention on the decisions that actually need a human.

Made by Toolora · Updated 2026-06-13