Skip to main content

JSON to Scala Case Classes: A Practical Field Guide

Turn JSON into Scala case classes with typed fields, nested child classes, Option for nullable keys, and decoders that work with circe and play-json.

Published By Li Lei
#scala #json #case-class #circe #spark

JSON to Scala Case Classes: A Practical Field Guide

Every Scala service that talks to the outside world hits the same wall: a JSON payload comes in, and someone has to turn it into types. You can pattern match a Map[String, Any] by hand, or you can write a case class and let the compiler check field names and value types for you. The second path is the one that survives a refactor. This post walks through how to do that conversion cleanly, what the tricky cases look like, and how the generated classes plug into the libraries you already use in Spark, Akka, and play.

Why case classes, not maps

A case class is the Scala unit of structured data. The moment you write one, you get typed accessors for free. Take this object:

{ "id": 1, "name": "Ada", "active": true }

That becomes:

case class Root(id: Int, name: String, active: Boolean)

Now root.id is an Int, not a runtime cast away from blowing up. The auto-generated case class gives you typed fields, a constructor, equals, hashCode, copy, and pattern matching, all without a single line you wrote by hand. Compare that to reaching into a parsed Map with m("id").asInstanceOf[Int], where a renamed key fails at runtime instead of at compile time. When you have 25 fields across a couple of nested objects, writing those classes by hand is exactly the kind of tedious, error-prone work worth handing to a generator. Paste your sample into the JSON to Scala converter, set the root name, and the class drops out ready to commit.

Nested objects become child classes

Scala case classes do not nest anonymously. If your JSON has an object inside an object, the inner one has to become its own named class. The convention the tool follows is to name the child after the parent plus the key, so there is never an ambiguous Address shared between two unrelated parents.

Here is a worked example. Input:

{
  "userId": 42,
  "profile": {
    "city": "SF",
    "followers": 9999999999
  },
  "roles": ["admin", "editor"]
}

With the root named ApiUser, the output is:

case class ApiUserProfile(city: String, followers: Long)
case class ApiUser(userId: Int, profile: ApiUserProfile, roles: List[String])

Two things to notice. First, ApiUserProfile is emitted before ApiUser, so the file reads leaf first and the references resolve cleanly, even when you paste them straight into the REPL. Second, followers came back as Long, not Int. The value 9999999999 is past the signed 32-bit range, so the inference picks the wider type. An integer inside the 32-bit range stays Int, and anything with a decimal point becomes Double. JSON has only one number type, so this split is the part you most want a tool to get right rather than guessing yourself.

Option for nullable and missing fields

This is where modeling JSON in Scala gets opinionated. The idiomatic way to represent "this key might not be there" is Option[T]. A field seen as null, or present in some array elements and missing from others, should become Option[T] so the decoder never throws on absence.

Consider an array where the elements disagree:

[
  { "sku": "A1", "discount": 10 },
  { "sku": "A2" }
]

The objects fold into one case class, and discount is marked optional because the second element does not carry it:

case class Root(sku: String, discount: Option[Int])

That single folded class is deliberate. Two near-identical objects do not produce two near-duplicate types; they collapse into one, with the union of keys, and anything not universal turns into an Option. One thing to watch: a field that only ever appears as null becomes Option[Any], because null carries no type information. If you know the real shape, paste a non-null sample so a proper child class can be inferred instead.

Wiring it into circe and play-json

A generated case class is the first half of a decoder. The second half is one line. With circe:

import io.circe.generic.semiauto._
implicit val dec = deriveDecoder[ApiUser]

With play-json:

implicit val fmt = Json.format[ApiUser]

Because nullable and missing fields already came through as Option[T], the derived decoder handles absent keys without a runtime failure. This is the round of bugs you skip: the ones where you forgot a single field was optional and a production payload without it throws three weeks later. Generate the class with Option inference on, derive the codec, and the absence is handled by the type, not by a try/catch.

Spark, Akka, and List versus Seq

In Spark, a case class is what Dataset[T] is built on. A typed Dataset[ApiUser] gives you column names the Catalyst optimizer understands and compile-time checks on your transformations, which a DataFrame of Row objects does not. Generate the case class from a sample record and you have the schema for free.

In Akka HTTP, the same class doubles as the unmarshalling target for a request entity, so a route's entity(as[CreateOrderRequest]) is type-safe end to end. Arrays default to List[T], the collection most Scala code reaches for in a case class. If your codebase prefers the immutable Seq alias because it abstracts over the concrete collection, flip the Seq toggle and every array switches to Seq[T] instead. Scalar arrays infer their element type, so ["en", "zh"] becomes List[String], while a genuinely mixed array widens to List[Any] because no single element type fits.

A note from building real models with this

I lean on this conversion most when I am integrating an upstream API I do not control. The last one had a user object with a nested settings block and an array of permission entries, and the elements of that array were not uniform: some had an expiresAt, some did not. Hand-writing the case classes meant I had to eyeball every element to spot which fields were optional, and I missed one the first time, which surfaced as a decode failure in staging. Pasting the full sample array and letting the fold mark expiresAt as Option[Long] caught it before I wrote a line of decoder code. The other thing I appreciate is that nothing leaves the tab. The JSON I paste, often a real production response, is parsed locally with the browser's JSON.parse, so I never have to think about whether a payload with internal IDs went somewhere it should not.

Where this fits in a polyglot stack

JSON-to-type generation is the same idea across languages, and a Scala backend rarely lives alone. If your frontend consumes the same payloads, the JSON to TypeScript interface generator gives you the matching shapes on the client side, so the contract stays consistent from the database row to the React component. Generate the Scala case class for the service, generate the TypeScript interface for the UI, and both halves of the wire agree on what nullable means.

The short version: a case class turns an untyped blob into something the compiler can reason about. Nested objects become named children, numbers split into Int, Long, and Double, nullable keys become Option, and circe or play-json picks it up from there. Paste a real sample, name the root, and you have a typed model before you write the decoder.


Made by Toolora · Updated 2026-06-13