LR Parsing for Conjunctive Grammars

The Generalized LR parsing algorithm for context-free grammars, introduced by Tomita in 1986, is a polynomial-time implementation of nondeterministic LR parsing. It uses a graph-structured stack to represent, at a single computation step, the contents of the nondeterministic parser's pushdown for all possible branches of computation. It was developed specifically for practical parsing tasks arising in computational linguistics, and it has indeed proved well suited to natural language processing. Conjunctive grammars extend context-free grammars by allowing an explicit intersection operation within grammar rules. This paper develops a new LR-style parsing algorithm for these grammars based on the same idea of a graph-structured pushdown, where the simultaneous existence of several paths in the graph is used to perform the intersection operation. The underlying finite automata are treated in the most general way. Instead of proving the algorithm correct for one particular method of constructing automata, the paper defines a wide class of automata usable with a given grammar. This class includes not only the traditional LR(k) automata but also, for instance, a trivial automaton with a single reachable state. A modification of the SLR(k) table construction method that exploits specific properties of conjunctive grammars is given as one possible way of constructing automata for use with the algorithm. It is shown that the algorithm is applicable to any conjunctive grammar and can be implemented to run in at most cubic time in the length of the input. Additionally, the algorithm can be made to work in linear time for the Boolean closure of the family of deterministic context-free languages.
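The intersection operation that distinguishes conjunctive grammars can be made concrete with a small recognizer. The sketch below is *not* the LR algorithm described above. It is a deliberately naive tabular recognizer that computes, for every substring, the set of nonterminals deriving it, and a rule fires only when *all* of its conjuncts derive the same substring. Each span is closed under a least-fixed-point iteration, so empty rules and left recursion are handled. The grammar encoding, the names `recognize` and `EXAMPLE`, and the sample grammar for { aⁿbⁿcⁿ : n ≥ 0 } are illustrative assumptions. The code makes no attempt at the cubic bound stated above.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding: each nonterminal maps to a list of rules, each rule is a
# list of conjuncts, and each conjunct is a tuple of symbols. Symbols that are
# not keys of the grammar are treated as single-character terminals.
Conjunct = Tuple[str, ...]
Grammar = Dict[str, List[List[Conjunct]]]

# Hypothetical example for { a^n b^n c^n : n >= 0 }:
#   S -> A B & D C   intersects  a^m b^n c^n  with  a^n b^n c^m
EXAMPLE: Grammar = {
    "S": [[("A", "B"), ("D", "C")]],
    "A": [[("a", "A")], [()]],
    "B": [[("b", "B", "c")], [()]],
    "D": [[("a", "D", "b")], [()]],
    "C": [[("c", "C")], [()]],
}


def recognize(grammar: Grammar, start: str, w: str) -> bool:
    n = len(w)
    # table[(i, j)] holds the nonterminals known to derive w[i:j].
    table: Dict[Tuple[int, int], Set[str]] = {}

    def sym_derives(x: str, p: int, k: int) -> bool:
        if x in grammar:  # nonterminal: consult the table
            return x in table.get((p, k), set())
        return k == p + 1 and w[p] == x  # terminal: match one character

    def seq_derives(seq: Conjunct, i: int, j: int) -> bool:
        # Positions reachable in w[i:j] after reading a prefix of seq.
        positions = {i}
        for x in seq:
            positions = {k for p in positions for k in range(p, j + 1)
                         if sym_derives(x, p, k)}
            if not positions:
                return False
        return j in positions

    # Shorter spans are finished before longer ones. Within one span, iterate
    # until no nonterminal is added (a least fixed point).
    for length in range(n + 1):
        for i in range(n - length + 1):
            j = i + length
            table[(i, j)] = set()
            changed = True
            while changed:
                changed = False
                for a, rules in grammar.items():
                    if a in table[(i, j)]:
                        continue
                    # A rule applies only if every conjunct derives w[i:j].
                    if any(all(seq_derives(c, i, j) for c in rule)
                           for rule in rules):
                        table[(i, j)].add(a)
                        changed = True
    return start in table[(0, n)]
```

For instance, `recognize(EXAMPLE, "S", "aabbcc")` accepts. `recognize(EXAMPLE, "S", "aabbc")` rejects: the conjunct `A B` fails on that string, so the rule for `S` cannot fire even though neither conjunct alone describes the language.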