XML to JSON Conversion: Patterns, Gotchas, and Best Practices for Developers
A deep dive into real-world XML to JSON conversion patterns: how to handle attributes, mixed content, repeated elements, and the array ambiguity that breaks most naive converters.
XML was the lingua franca of data exchange for two decades. JSON dethroned it for most REST APIs, but XML is nowhere near dead — SOAP services, RSS/Atom feeds, SVG files, Maven POMs, Android resources, and enterprise ESBs all still speak XML. When you need to move data between these worlds, the conversion is rarely the one-liner it looks like.
This article covers the patterns that actually show up in production XML, the gotchas that silently corrupt your data, and the practices that keep you from re-discovering them the hard way.
The Array Ambiguity Problem
This is the single most common source of bugs in XML-to-JSON pipelines.
In XML, there is no distinction between "a list of one item" and "a single item that happens to be alone." Consider:
```xml
<!-- API returns one result -->
<results>
  <item>apple</item>
</results>

<!-- API returns two results -->
<results>
  <item>apple</item>
  <item>banana</item>
</results>
```
A naive converter produces different JSON shapes depending on the count:
```jsonc
// One item → object
{ "results": { "item": "apple" } }

// Two items → array
{ "results": { "item": ["apple", "banana"] } }
```
Your code does data.results.item.map(...) and it crashes on single-item responses — because .map is not a method on a string.
The fix: know your schema. If a field is sometimes plural, force it to always be an array. Tools that support explicit array paths (like the XML ⇄ JSON Converter) let you declare which elements should always coerce to arrays. For hand-rolled code, wrapping every list access in a helper such as const asArray = (val) => Array.isArray(val) ? val : [val] is the minimum defense.
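A minimal sketch of that helper, assuming the converter output has already been parsed into a plain JavaScript object. Treating a missing element as an empty list is an extra assumption made here (an empty results element often produces no item key at all); adjust it if your pipeline should fail loudly instead.

```javascript
// Normalize a converter value that may be absent, a single item, or an array.
// Assumption: a missing element (undefined/null) should read as an empty list.
function asArray(val) {
  if (val === undefined || val === null) return [];
  return Array.isArray(val) ? val : [val];
}

// The two response shapes from the example above.
const one = { results: { item: "apple" } };
const two = { results: { item: ["apple", "banana"] } };

// Safe on both shapes because the access goes through asArray.
const upper = (data) => asArray(data.results.item).map((s) => s.toUpperCase());

console.log(upper(one)); // [ 'APPLE' ]
console.log(upper(two)); // [ 'APPLE', 'BANANA' ]
```

The helper only protects the call sites that use it, which is why declaring array paths in the converter is the stronger fix when the tool allows it.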
The same ambiguity affects the reverse direction. If your JSON has "item": ["apple"], some converters serialize it as a bare <item>apple</item> with no wrapper, not <item><item>apple</item></item>, and converting that back yields a scalar again unless the path is declared as an array. Test round-trips before trusting a converter.
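To see why the reverse direction loses the array, here is a deliberately tiny JSON-to-XML serializer. It is an illustrative sketch, not any particular library's behavior: toXml and escapeXml are hypothetical names, attributes are ignored, and arrays are written as repeated sibling elements (one common convention).

```javascript
// Escape the characters that are unsafe in XML text content.
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialize a value under the given element name.
// Assumption: arrays become repeated sibling elements; attributes are not handled.
function toXml(name, value) {
  if (Array.isArray(value)) return value.map((v) => toXml(name, v)).join("");
  if (value !== null && typeof value === "object") {
    const inner = Object.entries(value).map(([k, v]) => toXml(k, v)).join("");
    return `<${name}>${inner}</${name}>`;
  }
  return `<${name}>${escapeXml(value)}</${name}>`;
}

const fromArray = toXml("results", { item: ["apple"] });
const fromScalar = toXml("results", { item: "apple" });

console.log(fromArray);                // <results><item>apple</item></results>
console.log(fromArray === fromScalar); // true
```

Both inputs produce identical XML, so nothing in the serialized document can tell a parser to restore the one-element array.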
Attribute Encoding and the @/$ Conventions
XML attributes have no direct JSON equivalent. Different converters make different choices:
| Convention | Example output |
|---|---|
| @ prefix (AWS, many libs) | {"name": {"@id": "1", "#text": "Alice"}} |
| $ prefix (some Java libs) | {"name": {"$id": "1", "_": "Alice"}} |
| Flat merge (lossy) | {"name": "Alice", "id": "1"} |
| Drop attributes | {"name": "Alice"} |
The flat merge looks tidy until you have an element with both text content and attributes — then you need a sentinel key for the text itself (#text, _, $text). "Drop attributes" is actively destructive and should only be used when you know none of the attributes carry semantic meaning.
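One practical consequence of the sentinel-key approach is that the same element can arrive as a plain string (no attributes) or as an object (attributes present). A small accessor pair hides that difference. This sketch assumes the @-prefix and #text convention from the first table row; textOf and attrOf are hypothetical helper names.

```javascript
// Read the text content of a converted element.
// Assumption: elements without attributes arrive as plain strings, elements
// with attributes arrive as objects holding the text under `textKey`.
function textOf(node, textKey = "#text") {
  if (node === undefined || node === null) return "";
  if (typeof node === "object") return String(node[textKey] ?? "");
  return String(node);
}

// Read an attribute; returns undefined when the element carries none.
function attrOf(node, name, prefix = "@") {
  if (node === null || typeof node !== "object") return undefined;
  return node[prefix + name];
}

const plain = { name: "Alice" };
const withAttr = { name: { "@id": "1", "#text": "Alice" } };

console.log(textOf(plain.name), attrOf(plain.name, "id"));       // Alice undefined
console.log(textOf(withAttr.name), attrOf(withAttr.name, "id")); // Alice 1
```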
I tested a 12 MB SOAP response from a legacy HR system against four open-source XML→JSON libraries. Two of them silently dropped namespace attributes (xmlns:xsi, xsi:type), which caused the downstream parser to mis-classify polymorphic employee records. Always check attribute coverage with a document that actually has attributes.
Mixed Content: When Text and Elements Coexist
HTML-like XML with inline markup is the hardest case:
```xml
<para>See <link href="/docs">the docs</link> for more info.</para>
```
Pure JSON has no structure for "text that wraps elements." You have two realistic options:
Option A — Flatten to a string (lossy):
```json
{ "para": "See the docs for more info." }
```
Option B — Structured array:
```json
{
  "para": [
    "See ",
    { "link": { "@href": "/docs", "#text": "the docs" } },
    " for more info."
  ]
}
```
Option B preserves the markup structure but is painful to render without dedicated code. For documentation and article content, Option A is usually "good enough." For XHTML or DocBook, Option B is necessary. Make this choice deliberately — do not let your converter pick silently.
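If you keep Option B as the stored form, Option A can be derived on demand. The sketch below assumes the exact shape shown above (strings interleaved with single-key element objects, @-prefixed attributes, #text for element text); flattenMixed is a hypothetical name, and a converter that emits a different mixed-content shape would need a different walker.

```javascript
// Collapse an Option B structure into the Option A string.
// Assumption: attributes are "@"-prefixed and contribute no visible text.
function flattenMixed(node) {
  if (typeof node === "string") return node;
  if (Array.isArray(node)) return node.map(flattenMixed).join("");
  if (node !== null && typeof node === "object") {
    return Object.entries(node)
      .filter(([key]) => !key.startsWith("@")) // skip attributes
      .map(([, value]) => flattenMixed(value))
      .join("");
  }
  return ""; // numbers, booleans, null: not expected in this shape
}

const para = [
  "See ",
  { link: { "@href": "/docs", "#text": "the docs" } },
  " for more info.",
];

console.log(flattenMixed(para)); // See the docs for more info.
```

Going the other way is not possible: once the markup is flattened, the link target is gone.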
CDATA Sections, Namespaces, and Processing Instructions
Three XML features almost nobody remembers until they break something:
CDATA: <![CDATA[x < 3 && y > 0]]> is raw text content. A good converter strips the CDATA wrapper and gives you the string x < 3 && y > 0. A bad one gives you <![CDATA[x < 3 && y > 0]]> as a literal string, which breaks any code that expects the unwrapped text (and any code that tries to parse a wrapped value as a number or boolean).
Namespaces: <soap:Body xmlns:soap="…"> becomes either {"soap:Body": …} or {"Body": …} (namespace stripped) depending on the tool. Colons in JSON keys are legal but unusual. If the consuming system does prefix-aware processing, stripping namespaces will silently break it. If it doesn't care, stripping makes the output cleaner.
Processing instructions and comments: <!-- a comment --> and <?xml-stylesheet type="text/css"?> are usually dropped. Verify this is what you want — SVG files sometimes carry meaningful PIs that drive rendering behavior.
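For the namespace case, if your converter keeps prefixes and the consumer does not need them, stripping can be done as a post-processing step that refuses to proceed when two keys would collapse into one. This is a sketch under assumptions: attributes use an @ prefix, xmlns declarations can be discarded, and stripPrefixes is a hypothetical name rather than any library's API.

```javascript
// Strip namespace prefixes from keys ("soap:Body" -> "Body"), recursively.
// Throws if two keys collapse to the same name, since that would lose data.
// Assumptions: attributes are "@"-prefixed; xmlns declarations are dropped.
function stripPrefixes(value) {
  if (Array.isArray(value)) return value.map(stripPrefixes);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    const isAttr = key.startsWith("@");
    const bare = isAttr ? key.slice(1) : key;
    if (isAttr && (bare === "xmlns" || bare.startsWith("xmlns:"))) continue;
    const local = bare.includes(":") ? bare.slice(bare.indexOf(":") + 1) : bare;
    const newKey = (isAttr ? "@" : "") + local;
    if (Object.prototype.hasOwnProperty.call(out, newKey)) {
      throw new Error(`namespace stripping collides on "${newKey}"`);
    }
    out[newKey] = stripPrefixes(child);
  }
  return out;
}

const envelope = {
  "soap:Envelope": {
    "@xmlns:soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "soap:Body": { "m:Ping": "ok" },
  },
};

console.log(JSON.stringify(stripPrefixes(envelope)));
// {"Envelope":{"Body":{"Ping":"ok"}}}
```

The collision check matters for documents that mix vocabularies, where two prefixes can legitimately share a local name.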
Practical Workflow for Real-World XML
Here is what I do when I get an unfamiliar XML document and need to produce stable JSON from it:
- Run the converter on 3–5 real samples, not just one. Look for fields that are sometimes arrays, sometimes scalars.
- Diff the outputs. A structural diff (not text diff) between two samples shows exactly which fields change shape.
- Lock array fields explicitly. Every field that ever appears more than once should be forced to an array in your schema or your post-processing code.
- Round-trip test. Convert XML → JSON → XML and diff the original. Anything that changes indicates data loss or convention mismatch.
- Validate the JSON shape. Run the output against a JSON Schema. Per a 2022 study by the Open Data Institute, 38% of XML-to-JSON migrations in public sector projects introduced schema inconsistencies that only surfaced in production.
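The structural diff in the second step can be sketched in a few lines: record the type found at every path in each sample, then report paths where the types disagree. The path notation ($ for the root, [] for array items) and the names shapeOf and diffShapes are choices made for this example, not a standard.

```javascript
// Map every path in a parsed JSON value to the set of types seen there
// ("array", "object", "string", ...). Array items share one segment, "[]".
function shapeOf(value, path = "$", out = new Map()) {
  const type = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
  if (!out.has(path)) out.set(path, new Set());
  out.get(path).add(type);
  if (type === "array") {
    value.forEach((v) => shapeOf(v, path + "[]", out));
  } else if (type === "object") {
    for (const [k, v] of Object.entries(value)) shapeOf(v, `${path}.${k}`, out);
  }
  return out;
}

// Report paths whose type differs between two samples or exists in only one.
function diffShapes(a, b) {
  const sa = shapeOf(a);
  const sb = shapeOf(b);
  const diffs = [];
  for (const path of new Set([...sa.keys(), ...sb.keys()])) {
    const ta = [...(sa.get(path) ?? [])].sort().join("|") || "(missing)";
    const tb = [...(sb.get(path) ?? [])].sort().join("|") || "(missing)";
    if (ta !== tb) diffs.push(`${path}: ${ta} vs ${tb}`);
  }
  return diffs;
}

const sample1 = { results: { item: "apple" } };
const sample2 = { results: { item: ["apple", "banana"] } };

console.log(diffShapes(sample1, sample2));
// [ '$.results.item: string vs array', '$.results.item[]: (missing) vs string' ]
```

Every path reported as a string-versus-array mismatch is a candidate for the explicit array list in the third step.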
The XML ⇄ JSON Converter handles attribute preservation, CDATA stripping, and configurable array coercion entirely in-browser — no upload, no server. For inspecting and cleaning up the resulting JSON structure, JSON Formatter lets you collapse, expand, and validate the tree before you wire it into your code.
A Real Conversion Example
Here is a minimal but representative XML from a product feed:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <product id="P001" available="true">
    <name>Ceramic Mug</name>
    <price currency="USD">12.50</price>
    <tags>
      <tag>kitchen</tag>
    </tags>
  </product>
</catalog>
```
With @-prefix attribute convention and array forcing on tag, the output should be:
```json
{
  "catalog": {
    "product": {
      "@id": "P001",
      "@available": "true",
      "name": "Ceramic Mug",
      "price": {
        "@currency": "USD",
        "#text": "12.50"
      },
      "tags": {
        "tag": ["kitchen"]
      }
    }
  }
}
```
Note "tag": ["kitchen"] — even though there is only one tag, it is an array because we declared tag as always-array. And @available is the string "true", not the boolean true — XML attributes are always strings. If your code does if (product["@available"]), it will always be truthy. Cast explicitly.
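A short sketch of what casting explicitly can look like, using a product with available="false" to make the truthiness bug visible. parseXmlBoolean is a hypothetical helper; accepting "1" and "0" alongside "true" and "false" follows the XML Schema boolean lexical forms and is an assumption about what your feed may contain.

```javascript
// Strict boolean parse for XML attribute values; anything else is an error.
function parseXmlBoolean(s) {
  if (s === "true" || s === "1") return true;
  if (s === "false" || s === "0") return false;
  throw new Error(`not an XML boolean: ${JSON.stringify(s)}`);
}

const product = {
  "@id": "P001",
  "@available": "false",
  price: { "@currency": "USD", "#text": "12.50" },
};

// The bug: any non-empty string is truthy, including "false".
console.log(Boolean(product["@available"]));         // true
console.log(parseXmlBoolean(product["@available"])); // false

// Numbers need the same treatment, with an explicit NaN check.
const price = Number(product.price["#text"]);
if (Number.isNaN(price)) throw new Error("price is not numeric");
console.log(price); // 12.5
```

For money, a decimal library or integer minor units is usually a safer target than a floating-point number; Number is used here only to keep the sketch short.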
Best Practices Summary
- Declare array fields up front; never let the converter infer from count.
- Choose and document your attribute convention (@, $, flat) before the first commit.
- Treat XML attribute values as strings; cast to number/boolean explicitly in code.
- Test with the full range of real documents, not a curated example.
- Round-trip any conversion you plan to reverse.
- Strip namespaces only when the consumer genuinely ignores them.
XML-to-JSON conversion has no universal "right" answer — the conventions are choices, not facts. Making those choices explicitly and documenting them is what separates a pipeline that runs reliably from one that surprises you six months later.
Made by Toolora · Updated 2026-07-01