Notes

Chapter 10: Processes of Perception and Analysis

Section 12: Human Thinking

Context-free languages

The set of valid expressions in a context-free language can be defined recursively by rules such as "e"  "e + e" and "e"  "( e )" that specify how one expression can be built up from sequences of literal objects or "tokens" and other expressions. (As discussed on page 939, the fact that the left-hand side contains nothing more than "e" is what makes the language context free.) To interpret or parse an expression in a context-free language one has to go backwards and find out which rules could be used to generate that expression. (For the built-in syntax of Mathematica this is achieved using ToExpression.)

It is convenient to think of expressions in a language as having forms such as s["(", "(", ")", ")"] with Attributes[s] = Flat. Then the rules for the language consisting of balanced runs of parentheses (see page 939) can be written as

{s[e]  s[e, e], s[e]  s["(", e, ")"], s[e]  s["(",")"]}

Different expressions in the language can be obtained by applying different sequences of these rules, say using (this gives so-called leftmost derivations)

Fold[# /. rules〚#2〛 &, s[e], list]

Given an expression, one can then use the following to find a list of rules that will generate it—if this exists:

Parse[rules_, expr_] := Catch[Block[{t = {}}, NestWhile[ReplaceList[#, MapIndexed[ReverseRule, rules]] &, {{expr, {}}}, (# /. {s[e], u_}  Throw[u]; # =!= {}) &];]]

ReverseRule[a_  b_, {i_}] := {___, {s[x___, b, y___], {u___}}, ___}  {s[x, a, y], {i, u}} /; FreeQ[s[x], s[a]]

In general, there will in principle be more than one such list, and to pick the appropriate list in a practical situation one normally takes the rules of the language to apply with a certain precedence—which is how, for example, x + y z comes to be interpreted in Mathematica as Plus[x, Times[y, z]] rather than Times[Plus[x, y], z]. (Note that in practice the output from a parser for a context-free language is usually represented as a tree—as in Mathematica FullForm—with each node corresponding to one rule application.)

Given only the rules for a context-free language, it is often very difficult to find out the properties of the language (compare page 944). Indeed, determining even whether two sets of rules ultimately yield the same set of expressions is in general undecidable (see page 1138).