On The Ambiguity Problem of Backus Systems

Backus [1] has developed an elegant method of defining well-formed formulas for computer languages such as ALGOL. It consists of (our notation is slightly different from that of Backus):<list><item>A finite alphabet: <italic>a</italic><subscrpt>1</subscrpt>, <italic>a</italic><subscrpt>2</subscrpt>, …, <italic>a<subscrpt>t</subscrpt></italic>; </item><item>Predicates: <italic>P</italic><subscrpt>1</subscrpt>, <italic>P</italic><subscrpt>2</subscrpt>, …, <italic>P<subscrpt>s</subscrpt></italic>; </item><item>Productions, either of the form (a) <italic>a<subscrpt>j</subscrpt></italic> ∈ <italic>P<subscrpt>i</subscrpt></italic>; or of the form (b) <italic>P</italic><subscrpt><italic>i</italic><subscrpt>1</subscrpt></subscrpt><italic>P</italic><subscrpt><italic>i</italic><subscrpt>2</subscrpt></subscrpt> … <italic>P<subscrpt>i<subscrpt>t</subscrpt></subscrpt></italic> → <italic>P<subscrpt>j</subscrpt></italic>. </item></list> A <italic>word</italic> is a finite sequence of letters from the alphabet. A production of form (a) states that a certain word (containing only one letter) belongs initially to a predicate, and a production of form (b) states that if words <italic>W</italic><subscrpt>1</subscrpt>, <italic>W</italic><subscrpt>2</subscrpt>, …, <italic>W<subscrpt>t</subscrpt></italic> belong to the predicates <italic>P</italic><subscrpt><italic>i</italic><subscrpt>1</subscrpt></subscrpt>, <italic>P</italic><subscrpt><italic>i</italic><subscrpt>2</subscrpt></subscrpt>, …, <italic>P<subscrpt>i<subscrpt>t</subscrpt></subscrpt></italic> respectively, then the concatenation <italic>W</italic><subscrpt>1</subscrpt><italic>W</italic><subscrpt>2</subscrpt> … <italic>W<subscrpt>t</subscrpt></italic> belongs to <italic>P<subscrpt>j</subscrpt></italic>. We call this a <italic>Backus system</italic>. 
A simple example of such a system is: Alphabet: <italic>a</italic>, <italic>b</italic>; Predicates: <italic>P</italic>, <italic>Q</italic>, <italic>R</italic>; Productions: <italic>a</italic> ∈ <italic>P</italic>, <italic>b</italic> ∈ <italic>Q</italic>, <italic>PQ</italic> → <italic>R</italic>, <italic>QP</italic> → <italic>R</italic>, <italic>RR</italic> → <italic>R</italic>, <italic>PRQ</italic> → <italic>R</italic>, <italic>QRP</italic> → <italic>R</italic>. Then <italic>P</italic> and <italic>Q</italic> contain only the words <italic>a</italic> and <italic>b</italic>, respectively, while <italic>R</italic> contains all nonempty words which have the same number of <italic>a</italic>'s and <italic>b</italic>'s. In the above example, <italic>abab</italic> belongs to <italic>R</italic> and can be produced in two ways. Namely, since <italic>ab</italic> ∈ <italic>R</italic> and <italic>RR</italic> → <italic>R</italic>, we have <italic>abab</italic> ∈ <italic>R</italic>; alternatively, since <italic>ba</italic> ∈ <italic>R</italic> and <italic>PRQ</italic> → <italic>R</italic>, we again have <italic>abab</italic> ∈ <italic>R</italic>. We call a Backus system <italic>ambiguous</italic> if one of its predicates contains a word which can be produced in more than one way. Since, in practice, the meaning of a word is determined by the way it is produced, an ambiguous Backus system must be avoided. 
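The two productions of <italic>abab</italic> can be checked mechanically. The following is a minimal Python sketch, not part of the original paper, that counts the distinct ways a word can be produced in the example system. The names `LETTERS`, `PRODUCTIONS`, `count_derivations` and `count_splits` are illustrative, and the sketch assumes that every production of form (b) has at least two predicates on its left side, so that the recursion always works on strictly shorter words.

```python
from functools import lru_cache

# Example system from the text: letters a, b; predicates P, Q, R.
# Productions of form (a): letter -> predicates it belongs to initially.
LETTERS = {"a": ["P"], "b": ["Q"]}
# Productions of form (b): (left-hand predicates, resulting predicate).
# Assumption: every left-hand side has length >= 2.
PRODUCTIONS = [
    (("P", "Q"), "R"),
    (("Q", "P"), "R"),
    (("R", "R"), "R"),
    (("P", "R", "Q"), "R"),
    (("Q", "R", "P"), "R"),
]


@lru_cache(maxsize=None)
def count_derivations(word: str, pred: str) -> int:
    """Number of distinct ways `word` can be produced as a member of `pred`."""
    total = 0
    # Form (a): a one-letter word that belongs to `pred` initially.
    if len(word) == 1 and pred in LETTERS.get(word, []):
        total += 1
    # Form (b): split the word according to each production yielding `pred`.
    for left, target in PRODUCTIONS:
        if target == pred:
            total += count_splits(word, left)
    return total


@lru_cache(maxsize=None)
def count_splits(word: str, preds: tuple) -> int:
    """Ways to cut `word` into nonempty pieces, the k-th piece produced in preds[k]."""
    if len(preds) == 1:
        return count_derivations(word, preds[0])
    total = 0
    # Leave at least one letter for each of the remaining predicates.
    for cut in range(1, len(word) - len(preds) + 2):
        head = count_derivations(word[:cut], preds[0])
        if head:
            total += head * count_splits(word[cut:], preds[1:])
    return total


if __name__ == "__main__":
    print(count_derivations("abab", "R"))  # 2, matching the two productions above
    print(count_derivations("ab", "R"))    # 1
```

A count greater than one for some word and predicate witnesses ambiguity. Such a search can only confirm ambiguity for the particular words it tries; it is not a decision procedure for the system as a whole.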
As the following example illustrates, ALGOL 60 [3] is ambiguous: if <italic>B</italic> ∧ <italic>C</italic> then for <italic>I</italic> := 1 step 1 until <italic>N</italic> do if <italic>D</italic> ∨ <italic>E</italic> then <italic>A</italic>[<italic>I</italic>] := 0 else <italic>K</italic> := <italic>K</italic> + 1; <italic>K</italic> := <italic>K</italic> - 1 In fact, both for <italic>I</italic> := 1 step 1 until <italic>N</italic> do if <italic>D</italic> ∨ <italic>E</italic> then <italic>A</italic>[<italic>I</italic>] := 0 and for <italic>I</italic> := 1 step 1 until <italic>N</italic> do if <italic>D</italic> ∨ <italic>E</italic> then <italic>A</italic>[<italic>I</italic>] := 0 else <italic>K</italic> := <italic>K</italic> + 1 are valid for statements of ALGOL 60. Combining the first with if <italic>B</italic> ∧ <italic>C</italic> then … else <italic>K</italic> := <italic>K</italic> + 1; or the second with if <italic>B</italic> ∧ <italic>C</italic> then … gives rise to the above example, and these two methods of construction correspond to the two possible meanings of the example. D. Dahm and H. Trotter, in a private communication, have raised the question: “Does there exist an algorithm to determine whether a Backus system is ambiguous?” We call this the <italic>ambiguity problem</italic>. The purpose of this paper is to show that no such algorithm exists, i.e., that the ambiguity problem is unsolvable. We first define a <italic>normal system</italic>. 
It consists of:<list><item>A finite alphabet: <italic>a</italic><subscrpt>1</subscrpt>, <italic>a</italic><subscrpt>2</subscrpt>, …, <italic>a<subscrpt>t</subscrpt></italic>; </item><item>A finite collection of ordered pairs: (<italic>g</italic><subscrpt>1</subscrpt>, <italic>ḡ</italic><subscrpt>1</subscrpt>), (<italic>g</italic><subscrpt>2</subscrpt>, <italic>ḡ</italic><subscrpt>2</subscrpt>), …, (<italic>g<subscrpt>r</subscrpt></italic>, <italic>ḡ<subscrpt>r</subscrpt></italic>), where the <italic>g<subscrpt>i</subscrpt></italic> and <italic>ḡ<subscrpt>i</subscrpt></italic> are words. </item><item>An axiom <italic>A</italic> which is some fixed word. </item></list> If <italic>U</italic> and <italic>V</italic> are words, we say <italic>U</italic> → <italic>V</italic> if <italic>U</italic> is of the form <italic>gP</italic> and <italic>V</italic> is of the form <italic>Pḡ</italic> where (<italic>g</italic>, <italic>ḡ</italic>) is one of the ordered pairs. We also write, in this case, <italic>g<subscrpt>i</subscrpt>P</italic> → <italic>Pḡ<subscrpt>i</subscrpt></italic>. Also, if <italic>U</italic><subscrpt>1</subscrpt>, <italic>U</italic><subscrpt>2</subscrpt>, …, <italic>U<subscrpt>n</subscrpt></italic> are words with <italic>U<subscrpt>i</subscrpt></italic> → <italic>U</italic><subscrpt><italic>i</italic>+1</subscrpt>, 1 ≦ <italic>i</italic> ≦ <italic>n</italic>-1, then <italic>U</italic><subscrpt>1</subscrpt> → <italic>U<subscrpt>n</subscrpt></italic>, and we say <italic>U<subscrpt>n</subscrpt></italic> is derived from <italic>U</italic><subscrpt>1</subscrpt>. The words which may be derived from the axiom <italic>A</italic> are called <italic>theorems</italic>. A normal system is called <italic>undecidable</italic> if there does not exist an algorithm for determining whether a word is a theorem of the system. It is implicit in [2, sec. 
6.5] that there exists an undecidable normal system, which we denote by <italic>NS</italic>, with the property that in each ordered pair (<italic>g</italic>, <italic>ḡ</italic>), the words <italic>g</italic> and <italic>ḡ</italic> have no common letters. LEMMA. <italic>If U and V are words of NS, then U → V if and only if there exist indices j</italic><subscrpt>1</subscrpt>, <italic>j</italic><subscrpt>2</subscrpt>, …, <italic>j<subscrpt>m</subscrpt> such that</italic> <italic>Uḡ</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt> … <italic>ḡ<subscrpt>j<subscrpt>m</subscrpt></subscrpt></italic> = <italic>g</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>g</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt> … <italic>g<subscrpt>j<subscrpt>m</subscrpt></subscrpt>V</italic>. PROOF. Suppose the equality holds. As <italic>g</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt> and <italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt> have no common letters, <italic>U</italic> is of the form <italic>g</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>R</italic><subscrpt>1</subscrpt>; let <italic>U</italic><subscrpt>1</subscrpt> = <italic>R</italic><subscrpt>1</subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt>. Then we have <italic>U</italic> → <italic>U</italic><subscrpt>1</subscrpt> and <italic>U</italic><subscrpt>1</subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt> … <italic>ḡ<subscrpt>j<subscrpt>m</subscrpt></subscrpt></italic> = <italic>g</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt><italic>g</italic><subscrpt><italic>j</italic><subscrpt>3</subscrpt></subscrpt> … <italic>g<subscrpt>j<subscrpt>m</subscrpt></subscrpt>V</italic>. 
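To make the single production step and the equality in the lemma concrete, here is a minimal Python sketch on a toy system. The ordered pairs in `PAIRS` are hypothetical and are not the system <italic>NS</italic>; they were chosen only so that the two words of each pair share no letters. The helper names `step`, `derive` and `lemma_equality` are illustrative, and indices are 0-based as usual in Python.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (first word, second word) of an ordered pair

# Hypothetical toy pairs; within each pair the two words share no letters.
PAIRS: List[Pair] = [("ab", "c"), ("c", "a")]


def step(word: str, pair: Pair) -> Optional[str]:
    """One production: strip the pair's first word from the front of `word`
    and append the pair's second word at the end. None if it does not apply."""
    first, second = pair
    if not word.startswith(first):
        return None
    return word[len(first):] + second


def derive(word: str, pairs: List[Pair], indices: List[int]) -> Optional[str]:
    """Apply the productions selected by `indices`, in order."""
    current: Optional[str] = word
    for j in indices:
        current = step(current, pairs[j])
        if current is None:
            return None
    return current


def lemma_equality(u: str, v: str, pairs: List[Pair], indices: List[int]) -> bool:
    """Check: u + (second words, in order) == (first words, in order) + v."""
    left = u + "".join(pairs[j][1] for j in indices)
    right = "".join(pairs[j][0] for j in indices) + v
    return left == right


if __name__ == "__main__":
    v = derive("abc", PAIRS, [0, 1])                 # abc -> cc -> ca
    print(v)                                         # ca
    print(lemma_equality("abc", v, PAIRS, [0, 1]))   # True: abcca == abcca
```

This only illustrates the equality along one derivation in a toy system; it is not a proof, and it says nothing about deciding whether a given word is a theorem.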
Proceeding inductively, we obtain a sequence of words, <italic>U</italic>, <italic>U</italic><subscrpt>1</subscrpt>, <italic>U</italic><subscrpt>2</subscrpt>, …, <italic>U<subscrpt>m</subscrpt></italic> = <italic>V</italic> with <italic>U</italic> → <italic>U</italic><subscrpt>1</subscrpt> → … → <italic>U<subscrpt>m</subscrpt></italic>; hence <italic>U</italic> → <italic>V</italic>. Conversely, if <italic>U</italic> → <italic>V</italic>, then there exist words <italic>U</italic><subscrpt>0</subscrpt>, <italic>U</italic><subscrpt>1</subscrpt>, …, <italic>U<subscrpt>m</subscrpt></italic> with <italic>U</italic><subscrpt>0</subscrpt> = <italic>U</italic> and <italic>U<subscrpt>m</subscrpt></italic> = <italic>V</italic>, and indices <italic>j</italic><subscrpt>1</subscrpt>, <italic>j</italic><subscrpt>2</subscrpt>, …, <italic>j<subscrpt>m</subscrpt></italic> such that <italic>U</italic><subscrpt><italic>i</italic>-1</subscrpt><italic>ḡ<subscrpt>j<subscrpt>i</subscrpt></subscrpt></italic> = <italic>g<subscrpt>j<subscrpt>i</subscrpt></subscrpt></italic><italic>U<subscrpt>i</subscrpt></italic>, 1 ≦ <italic>i</italic> ≦ <italic>m</italic>. Then <italic>U</italic><subscrpt>0</subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt> = <italic>g</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>U</italic><subscrpt>1</subscrpt>, hence <italic>U</italic><subscrpt>0</subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt> = <italic>g</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>U</italic><subscrpt>1</subscrpt><italic>ḡ</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt> = <italic>g</italic><subscrpt><italic>j</italic><subscrpt>1</subscrpt></subscrpt><italic>g</italic><subscrpt><italic>j</italic><subscrpt>2</subscrpt></subscrpt><italic>U</italic><subscrpt>2</subscrpt>. By induction the proof is complete. THEOREM. <italic>The ambiguity

[2]  Martin D. Davis, et al. Computability and Unsolvability, 1959, McGraw-Hill Series in Information Processing and Computers.

[3]  Friedrich L. Bauer, et al. Report on the algorithmic language ALGOL 60, 1960, Commun. ACM.