JSON is not a YAML subset

People on the internet believe that JSON is a subset of YAML, and that it's safe to parse JSON using a YAML parser.

Following this advice will end badly because JSON is not a subset of YAML. It is easy to construct JSON documents that (1) fail to parse as YAML, or (2) parse successfully as YAML but yield different values than a JSON parser would. The second case is more dangerous because it's difficult to detect.

False has over "1.7e3" named fjords

YAML (infamously) allows string scalars to be unquoted. A conforming YAML parser, presented with a token known to contain a scalar value, must match that token against a set of patterns and then fall back to treating it as a string. This behavior produces surprising outcomes, and has been named The Norway Problem.

$ irb-3.1.2
require 'yaml'
=> true
YAML.load '[FI,NO,SE]'
=> ["FI", false, "SE"]
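The match-then-fall-back behavior can be sketched in a few lines of Ruby. This is a deliberately simplified illustration, not the resolver any real YAML library uses: the method name resolve_plain_scalar and the handful of patterns are assumptions made for this example, loosely modeled on the YAML 1.1 boolean, null, and integer types.

```ruby
# Hypothetical, simplified resolver for an unquoted YAML 1.1 scalar.
# Real parsers check many more patterns (floats, timestamps, and so on).
def resolve_plain_scalar(token)
  case token
  when /\A(?:yes|true|on)\z/i  then true
  when /\A(?:no|false|off)\z/i then false
  when /\A(?:~|null)\z/i       then nil
  when /\A[-+]?[0-9]+\z/       then Integer(token, 10)
  else token # no pattern matched: fall back to a string
  end
end

p %w[FI NO SE].map { |t| resolve_plain_scalar(t) }
# prints ["FI", false, "SE"]
```

The country code NO never reaches the string fallback because the boolean pattern claims it first, which is the whole problem in miniature.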

A similar issue affects JSON documents passed to a YAML parser when they contain numbers in exponential notation. The YAML 1.1 spec is stricter about the syntax of numbers than JSON: 1e2 is a valid JSON number, but YAML 1.1 requires it to be written as 1.0e+2. Because 1e2 does not match the YAML 1.1 number syntax, the YAML parser falls back to treating it as a string.

$ irb-3.1.2
require 'json'
=> true
require 'yaml'
=> true
JSON.load '{"a": 1e2}'
=> {"a"=>100.0}
YAML.load '{"a": 1e2}'
=> {"a"=>"1e2"}
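Because the mismatch is silent, one way to surface it is to compare the two parsers' results directly. The sketch below assumes Ruby's bundled json and yaml libraries; the helper name same_under_yaml? is made up for this example, and the answer for any given document depends on the YAML library version in use.

```ruby
require 'json'
require 'yaml'

# Hypothetical helper: true when the YAML parser produces the same value
# as the JSON parser for the given JSON text, false otherwise.
def same_under_yaml?(json_text)
  JSON.parse(json_text) == YAML.load(json_text)
rescue Psych::SyntaxError
  false # the YAML parser rejected the document outright
end

p same_under_yaml?('{"a": 1}')
p same_under_yaml?('{"a": 1e2}')
```

With the parser behavior shown in the transcript above, the second call reports a mismatch: the JSON parser returns the number 100.0 while the YAML parser returns the string "1e2". A check like this only covers the documents it is run against, so it is a diagnostic aid rather than a reason to trust a YAML parser with JSON input.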

YAML 1.2 won't save you

YAML 1.2 is a revision to the YAML spec that (among other goals) aims to make YAML a proper superset of JSON. To maintain backwards compatibility with existing YAML documents, the version is specified in a %YAML directive.

---
a: 1e2  # document["a"] == "1e2"
b: no   # document["b"] == false

%YAML 1.2
---
a: 1e2  # document["a"] == 100
b: no   # document["b"] == "no"

Regardless of whether YAML 1.2 has been (or will be) widely adopted, it does not help those who want to parse a JSON document with a YAML parser. JSON documents do not start with %YAML, and therefore cannot opt in to the YAML parser behavior that would permit correct parsing of JSON.
