
CSV to YAML: Turn a Spreadsheet Export Into a Clean Object Array

Convert CSV to YAML the right way: header row becomes keys, each data row becomes an object, and type inference keeps your numbers, booleans and nulls real.

Published By Li Lei
#csv to yaml #yaml #converter #config #seed data


A spreadsheet is a grid. YAML config is a list. The gap between those two shapes is where most ad-hoc conversion scripts go to die — someone writes a 40-line Python snippet, forgets that a cell can contain a comma, and ships a fixtures file with the columns shifted one place to the right. Converting CSV to YAML is a small job that hides a few sharp edges, and getting it right means understanding exactly what shape your data should land in.

The short version: a CSV table becomes a YAML array of objects. The header row supplies the keys. Each data row becomes one object. That single mapping is what almost every config loader, seed file and fixtures reader on the planet expects, which is why this conversion is so useful and so worth doing carefully.

One Row Becomes One Object

Here is the rule that everything else follows from. Take the header row and read off the column names. Then for each data row, pair each value with the matching column name. That row is now a YAML map. Collect every map under a sequence, and the whole file is a YAML array of objects.

Concretely, this CSV:

name,age,active
Alice,30,true
Bob,25,false

becomes this YAML:

- name: Alice
  age: 30
  active: true
- name: Bob
  age: 25
  active: false

Two data rows, two objects in the sequence. The header name,age,active became the keys on every object. Nothing magical happened — but notice age came out as the number 30, not the string "30", and active is a real boolean. That is type inference doing its job, and it is the difference between a seed file that loads clean and one that buries you in string-to-int coercions.
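The pairing rule is small enough to sketch in a few lines. This is a minimal Python illustration using the standard-library csv module, not the converter's own code (which runs as JavaScript in the browser), and the function name csv_to_objects is hypothetical. At this stage every value is still a string. Turning "30" into 30 is the separate type-inference step.

```python
import csv
import io


def csv_to_objects(text: str) -> list[dict[str, str]]:
    """Hypothetical helper: the header row supplies the keys, each data row becomes one dict."""
    # csv.reader returns blank lines as empty lists, so drop them
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # zip() pairs each value with its column name: one dict per data row
    return [dict(zip(header, row)) for row in data]


sample = "name,age,active\nAlice,30,true\nBob,25,false\n"
for obj in csv_to_objects(sample):
    print(obj)
```

One design note: zip() silently drops cells beyond the header width and omits keys for short rows, so any real converter has to choose an explicit policy for ragged rows.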

If your CSV has no header row, you do not lose the structure; you just label the columns generically. Uncheck the header switch and the row Alice,30 becomes an object with col_1: Alice and col_2: 30, with one col_N key for each cell in your widest row. Raw logs and headerless exports convert fine that way.
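Here is one way headerless mode could work, sketched in Python: generate col_1 through col_N from the widest row. The article does not say how cells missing from shorter rows are filled, so this sketch assumes they become None (YAML null). The name headerless_objects is hypothetical.

```python
import csv
import io
from typing import Optional


def headerless_objects(text: str) -> list[dict[str, Optional[str]]]:
    """Hypothetical helper: label columns col_1..col_N, N = cells in the widest row."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    width = max((len(row) for row in rows), default=0)
    keys = [f"col_{n}" for n in range(1, width + 1)]
    # Assumption: a short row gets None (YAML null) for its missing cells
    return [
        {key: row[i] if i < len(row) else None for i, key in enumerate(keys)}
        for row in rows
    ]


print(headerless_objects("Alice,30\nBob\n"))
```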

Why Spreadsheets Want to Be Config

The reason this conversion comes up constantly is that spreadsheets are where non-engineers maintain lists, and config files are where engineers consume them. A product owner keeps a Google Sheet of demo users. An ops lead tracks service replica counts in a tab so the rest of the team can edit it. Translators fill a grid with one column per language. None of those people want to touch YAML, and they shouldn't have to.

So the workflow is: they edit the sheet, you export to CSV, and you convert that CSV to YAML to drop into:

  • A Rails or Laravel seed / fixtures file — demo users with real boolean active flags and numeric ids.
  • A Helm values list or Kubernetes manifest, with services and replica counts that kubectl apply accepts because fields like replicas stayed integers instead of becoming strings.
  • A Rails / i18n locale file — a key column plus one column per language, with CJK and accented text intact.
  • A GitHub Actions matrix — paste the YAML sequence under matrix.include and each row is one matrix entry.

The same table, four destinations. The spreadsheet stays the editable source of truth and the YAML is generated, never hand-maintained. The CSV to YAML converter does the table-to-list translation in your browser so there is no script to babysit.

The Comma Trap, and How Quoting Saves You

The single most common way a CSV-to-YAML conversion goes wrong is a comma inside a cell. If a city field holds Portland, OR and that value isn't quoted, a naive parser sees two columns where you meant one, and every column after it shifts right. Your state value lands in the zip field, and the corruption is silent.

The fix is RFC 4180, the CSV convention that mainstream spreadsheet exports follow. A field wrapped in double quotes may contain the delimiter, line breaks, and quote characters, which are escaped by doubling them (""). So "Portland, OR" stays a single value, and "Quote ""Bob""" decodes to Quote "Bob". A correct converter parses that way by default: paste a genuine export and the columns line up.
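You can see the difference with Python's standard csv module, whose default "excel" dialect applies these quoting rules. The sample row is made up for illustration. A naive split on commas yields five fields, while the quote-aware reader returns the four you meant and decodes the doubled quotes.

```python
import csv
import io

row = 'Alice,"Portland, OR",97201,"Quote ""Bob"""\n'

# Naive approach: split on every comma and ignore quoting (5 fields, columns shifted)
naive = row.rstrip("\n").split(",")

# Quote-aware parsing: the default dialect honors "..." fields and "" escapes (4 fields)
parsed = next(csv.reader(io.StringIO(row)))

print(len(naive), naive)
print(len(parsed), parsed)

# A quoted field may also span lines; the reader keeps it as one value
multiline = 'note\n"first line\nsecond line"\n'
print(list(csv.reader(io.StringIO(multiline))))
```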

When the columns don't line up, the culprit is usually the delimiter. Many European exports, German ones especially, use a semicolon because the comma is the local decimal separator. If every row collapses into one giant value, switch the delimiter to semicolon or tab. The column count shown next to the output tells you instantly whether you guessed right.
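The column-count check can also be automated. The sketch below is a hypothetical heuristic, not the converter's documented logic. It tries each candidate delimiter and keeps the one that gives every row the same, largest column count. Python's built-in csv.Sniffer offers a more general take on the same idea.

```python
import csv
import io


def guess_delimiter(text: str, candidates: str = ",;\t") -> str:
    """Hypothetical heuristic: pick the delimiter giving the widest consistent column count."""
    best, best_width = ",", 1
    for delim in candidates:
        rows = [row for row in csv.reader(io.StringIO(text), delimiter=delim) if row]
        widths = {len(row) for row in rows}
        # Only a delimiter that splits every row into the same number of columns qualifies
        if len(widths) == 1:
            width = widths.pop()
            if width > best_width:
                best, best_width = delim, width
    return best


german = "name;price\nKäse;1,50\nBrot;2,10\n"
print(repr(guess_delimiter(german)))
```

With a comma, the German sample splits into one column for the header and two for each data row, so the inconsistent count rules it out and the semicolon wins.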

When to Turn Type Inference Off

Type inference is the right default. "123" becomes the number 123, "true" and "false" become booleans, and an empty cell becomes null. Your YAML carries real types, which is exactly what you want for a config that a loader will read.
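A minimal inference function might look like this. The exact rules (lowercase true/false only, plain decimal numbers, empty cell to None) are assumptions for illustration, and the converter's own patterns may differ. Running it on "007" and "1.10" also reproduces the losses described next.

```python
import re
from typing import Union

Scalar = Union[None, bool, int, float, str]

# Assumed patterns: optional sign, plain digits, optional decimal part
_INT = re.compile(r"[-+]?[0-9]+")
_FLOAT = re.compile(r"[-+]?(?:[0-9]+\.[0-9]*|\.[0-9]+)")


def infer(cell: str) -> Scalar:
    """Hypothetical type inference for a single CSV cell."""
    if cell == "":
        return None  # empty cell -> null
    if cell in ("true", "false"):
        return cell == "true"  # real boolean
    if _INT.fullmatch(cell):
        return int(cell)  # note: "007" becomes 7
    if _FLOAT.fullmatch(cell):
        return float(cell)  # note: "1.10" becomes 1.1
    return cell  # anything else stays text


row = {"name": "Alice", "age": "30", "active": "true", "zip": "007", "version": "1.10", "notes": ""}
print({key: infer(value) for key, value in row.items()})
```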

But there is one thing inference must never do: mangle a string that only looks like a number. A zip code 007 becomes 7 and loses its leading zeros. A version string 1.10 collapses to 1.1. An SKU or a phone number quietly turns into a number. For columns like those, turn inference off: every cell then stays text, and the converter quotes anything ambiguous so a value like 007 survives as "007".

The rule of thumb: inference on for genuine data (ages, prices, counts, flags), inference off for identifiers and codes. If one table mixes both kinds of column, split the identifier columns out and convert them separately.
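On the output side, "quotes anything ambiguous" can be sketched as a small emitter. This Python version is an assumption about the approach, not the converter's source. It double-quotes any string that a YAML loader might read as a number, boolean or null, including YAML 1.1 words like yes and off for older loaders. It also quotes strings that start with or contain characters YAML treats specially. The quoted form uses JSON string syntax because a JSON string is also a valid YAML double-quoted scalar.

```python
import json
import re

# Strings a YAML loader could resolve to a non-string type (YAML 1.2 core schema
# plus YAML 1.1 words such as yes/no/on/off). Deliberately over-broad.
_TYPED_LOOKING = re.compile(
    r"(?i:[-+]?[0-9][0-9_.]*(?:e[-+]?[0-9]+)?"  # 007, 1.10, 1_000, 6e3
    r"|[-+]?\.(?:[0-9]+|inf)|\.nan"  # .5, .inf, .nan
    r"|0[xo][0-9a-f]+"  # hex and octal literals
    r"|true|false|yes|no|on|off|null|~)"
)
_SPECIAL_START = tuple("-?:,[]{}#&*!|>'\"%@`")


def needs_quotes(text: str) -> bool:
    return (
        text == ""
        or text != text.strip()
        or text.startswith(_SPECIAL_START)
        or ": " in text
        or " #" in text
        or "\n" in text
        or text.endswith(":")
        or _TYPED_LOOKING.fullmatch(text) is not None
    )


def yaml_scalar(value) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)  # assumes finite floats; inf/nan would need .inf/.nan
    text = str(value)
    # ensure_ascii=False keeps CJK and accented text readable
    return json.dumps(text, ensure_ascii=False) if needs_quotes(text) else text


def to_yaml(objects: list[dict]) -> str:
    """Emit a YAML sequence of flat maps, one '- ' item per object."""
    if not objects:
        return "[]\n"
    lines = []
    for obj in objects:
        if not obj:
            lines.append("- {}")
        for i, (key, value) in enumerate(obj.items()):
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{yaml_scalar(key)}: {yaml_scalar(value)}")
    return "\n".join(lines) + "\n"


# With inference off, every cell is still a string
print(to_yaml([{"zip": "007", "version": "1.10", "city": "Portland, OR"}]))
```

Quoting too much is harmless, because a quoted string always loads as text. Quoting too little is how 007 turns back into 7.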

I learned this the hard way on a fixtures file for a payments test suite. I had a column of card BINs — the six-digit prefixes — and several of them started with a zero. I converted with inference on, the leading zeros vanished, and a dozen test cases started matching the wrong card network. It took me an embarrassingly long afternoon to trace a flaky test back to a YAML file that had silently turned 004214 into 4214. Now the first thing I check on any export with ID-shaped columns is whether inference should be off.

YAML or JSON?

Both formats describe the same data, so the choice is about who reads it. YAML wins for human-edited config — Kubernetes manifests, GitHub Actions, docker-compose, Rails locales, Ansible vars — because comments and block scalars stay readable. JSON wins for APIs and machine pipelines where you never expect a person to open the file.

The good news is you don't have to commit. The same CSV input feeds the CSV to JSON converter if you need JSON instead, and once you have YAML, the YAML formatter re-indents and validates it so what you paste into your repo is clean YAML 1.2. Convert once, keep whichever format the consumer wants.

A Quick Checklist Before You Paste

  • Is the first row really your header? If not, uncheck the header switch.
  • Did the source export with proper quoting? Spot-check a row with a comma in it.
  • Is the delimiter right? Watch the column count for the giveaway.
  • Should any column stay text? Turn inference off for codes and IDs.
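Some of these checks can be scripted. The preflight helper below is hypothetical. It reports the distinct column counts, where more than one count usually means a wrong delimiter or broken quoting. It also flags columns containing digit strings with a leading zero, the shape that inference would damage. Both heuristics are assumptions, not the converter's behavior, and the sample data is made up.

```python
import csv
import io
import re

# Digit strings with a leading zero, e.g. 007 or 004214: the shape inference damages
_LEADING_ZERO = re.compile(r"0[0-9]+")


def preflight(text: str, delimiter: str = ",") -> dict:
    """Hypothetical pre-conversion check on a CSV export."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return {"column_counts": [], "keep_as_text": []}
    header, data = rows[0], rows[1:]
    return {
        # More than one distinct count usually means a wrong delimiter or broken quoting
        "column_counts": sorted({len(row) for row in rows}),
        # Columns where at least one cell would lose leading zeros under inference
        "keep_as_text": [
            name
            for i, name in enumerate(header)
            if any(i < len(row) and _LEADING_ZERO.fullmatch(row[i]) for row in data)
        ],
    }


print(preflight("bin,label\n004214,test card A\n412345,test card B\n"))
```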

Everything runs client-side — the RFC 4180 parser, the type inference and the YAML stringifier are all JavaScript in your browser tab, with no upload and no logging. The one caveat: the share link encodes your CSV in the URL, so for a confidential table use the copy or download button instead of pasting a share link into chat. Close the tab and nothing is left behind.


Made by Toolora · Updated 2026-06-13