JSON definition as ABNF

The original definition (grammar) of JSON on json.org is given in McKeeman form. Its specifications in ECMA 2009 and ECMA-404 use a “human readable” formulation, but its IETF RFCs (4627 / 7159 / 8259) use ABNF, which is attractive because common validators and parsers can process it.

For formal purposes such as defining a subset of JSON, e.g. with restricted object keys, an EBNF representation would be even nicer. ABNF, however, may have the edge over EBNF in handling characters by Unicode code points and ranges, and ABNF validators / parsers may be closer at hand. If so, here you are.

The following is the JSON grammar from the RFCs up to 8259 (see above). In the RFCs it is given in parts, interleaved with explanations in natural language, rather than in one concise “block” as given here. For clarity, some layout and comments have been slightly changed or added. Unlike in McKeeman form, a line break means OR only after a trailing /; otherwise it means AND (concatenation).

The “official” RFC grammar uses ABNF’s predefined “Core Rules” DIGIT and HEXDIG. Since not every parser seems to implement the Core Rules, they are spelled out here at the end.

Note that you may encounter even good JSON validators / parsers that reject pure strings like "xyz" and / or pure numbers like 999, although these are correct JSON according to the official definition (see above). One likely reason is that the oldest of the RFCs, 4627, still restricted a JSON text to an object or an array; RFC 7159 and RFC 8259 allow any value. For pure strings, the surrounding double quotes that JSON requires are often a problem when passing input to validators / parsers, i.e. you may have to wrap them in single quotes, e.g. '"xyz"'. Pure strings, including the empty string "", and pure numbers can occur as results of, e.g., querying a database that returns JSON. The ABNF below handles pure strings, including empty ones, and numbers correctly.
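As a quick check of the point about pure strings and numbers, here is a minimal sketch using the json module from Python's standard library, which accepts any value at the top level. The sample inputs and the names results and rejected are illustrative assumptions, not part of any RFC.

```python
import json

# Pure strings (including the empty string) and pure numbers are
# complete JSON texts; the double quotes are part of the JSON string.
samples = ['"xyz"', '""', '999', '-0.5e3']
results = {text: json.loads(text) for text in samples}

# Without the surrounding double quotes, xyz is not a JSON value at all.
try:
    json.loads('xyz')
    rejected = False
except json.JSONDecodeError:
    rejected = True

for text, value in results.items():
    print(repr(text), "->", repr(value))
print("unquoted xyz rejected:", rejected)
```

Note that the single quotes here belong to the Python string literal, not to the JSON text, which mirrors the '"xyz"' quoting mentioned above.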

JSON-text = ws value ws ;
begin-array     = ws %x5B ws  ; %x5B = [
begin-object    = ws %x7B ws  ; %x7B = {
end-array       = ws %x5D ws  ; %x5D = ]
end-object      = ws %x7D ws  ; %x7D = }
name-separator  = ws %x3A ws  ; %x3A = : colon
value-separator = ws %x2C ws  ; %x2C = , comma
ws = *(                       ; none or many of one of the following
  %x20 /                      ; Space
  %x09 /                      ; Horizontal tab
  %x0A /                      ; Line feed or New line
  %x0D                        ; Carriage return
  )
value =                       ; one of the following
  false /
  null /
  true /
  object /
  array /
  number /
  string
false = %x66.61.6c.73.65      ; charcodes for "false"
null  = %x6e.75.6c.6c         ; charcodes for "null"
true  = %x74.72.75.65         ; charcodes for "true"
object =
  begin-object
    [                         ; zero or one of all of the following
      member
      *(                      ; none or many of all of the following
        value-separator member
      )
    ]
  end-object
member = string name-separator value
array =
  begin-array
    [                         ; zero or one of all of the following
      value
      *(                      ; none or many of all of the following
        value-separator value
      )
    ]
  end-array
number =
    [                         ; zero or one of all of the following
      minus
    ]
  int                         ; mandatory
    [                         ; zero or one of all of the following
      frac
    ]
    [                         ; zero or one of all of the following
      exp
    ]
decimal-point = %x2E          ; %x2E = .
digit1-9 = %x31-39            ; Unicode range from %x31 to %x39 = 1-9
e = %x65 / %x45               ; %x65 = e, %x45 = E => exponent marker lower or upper case
exp =
  e
    [                         ; zero or one of one of the following
      minus /
      plus
    ]
  1*DIGIT                     ; one or many of this
frac =
  decimal-point
  1*DIGIT                     ; one or many of this
int = zero / ( digit1-9 *DIGIT )

minus = %x2D               ; -

plus = %x2B                ; +

zero = %x30                ; 0

string = quotation-mark *char quotation-mark

char = unescaped /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C              ; \

quotation-mark = %x22      ; "

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
DIGIT  = %x30-39              ; 0-9
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"  ; quoted literals are case-insensitive in ABNF, so a-f match too
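
If no ABNF tool is at hand, the number and string rules above can be checked by hand against a direct translation into regular expressions. The following Python sketch is only an illustration of those two rules, not a full JSON validator; the names NUMBER, STRING, is_number and is_string are assumptions made for this example.

```python
import re

# number = [ minus ] int [ frac ] [ exp ]
# int    = zero / ( digit1-9 *DIGIT )   -> no leading zeros
NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")

# string    = quotation-mark *char quotation-mark
# char      = unescaped / escape ( " \ / b f n r t / u 4HEXDIG )
# unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
STRING = re.compile(
    r'"(?:[\x20-\x21\x23-\x5B\x5D-\U0010FFFF]'
    r'|\\(?:["\\/bfnrt]|u[0-9A-Fa-f]{4}))*"'
)


def is_number(text: str) -> bool:
    """Return True if the whole text matches the number rule."""
    return NUMBER.fullmatch(text) is not None


def is_string(text: str) -> bool:
    """Return True if the whole text matches the string rule."""
    return STRING.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["0", "-12.5e+3", "01", "1.", "+1"]:
        print(candidate, is_number(candidate))
    for candidate in ['""', '"xyz"', '"a\\n"', 'xyz']:
        print(candidate, is_string(candidate))
```

The leading-zero case (01) and the trailing decimal point (1.) are the two inputs where ad hoc number checks most often disagree with the grammar, so they make useful test cases.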