An Efficient Context-Free Parsing Algorithm for Natural Languages

This thesis introduces an efficient context-free parsing algorithm and emphasizes its practical value in natural language processing. In theoretical worst-case analysis, the algorithm can take more than O(n^3) time, but only with kinds of context-free grammars that are very unlikely to appear in natural languages. As far as practical natural language processing is concerned, on the other hand, the algorithm seems more efficient than any existing algorithm, including Earley's algorithm. Experiments with several English grammars and sample sentences show that our algorithm is 5 to 10 times faster than Earley's standard algorithm. The parsing algorithm can be viewed as an extended LR parsing algorithm which embodies the concept of a "graph-structured stack." Unlike standard LR parsing, the algorithm is capable of handling arbitrary non-cyclic context-free grammars, including ambiguous grammars, with little loss of LR efficiency. In particular, if the grammar is "close" to LR, most of the LR parsing efficiency can be preserved. Natural language grammars are, fortunately, considerably "close" to LR compared with other general context-free grammars. The algorithm is an all-path parsing algorithm; it produces all possible parse trees (a parse forest) in an efficient representation called a "shared-packed forest." This thesis also shows that Earley's forest representation has a defect and that his representation cannot be used in natural language processing. The last chapters of the thesis suggest practical applications of the algorithm. A concept of left-to-right on-line parsing is introduced, taking advantage of the fact that our algorithm parses a sentence strictly from left to right. Several benefits of on-line parsing are described, and its application to user-friendly natural language interfaces is discussed.
This thesis also proposes a technique for disambiguating a sentence from its shared-packed forest representation by interactively asking the user questions. Finally, a personal/interactive machine translation system is suggested.
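
To make the idea of a "shared-packed forest" concrete, the following is a minimal illustrative sketch in Python, not the data structure defined in the thesis. It assumes a toy grammar and the example sentence "I saw a man with a telescope"; the names `Node`, `alternatives`, and `count_trees` are hypothetical and chosen only for this illustration. Sub-trees that occur in several parses are stored once (sharing), and alternative derivations of the same phrase are stored as alternatives of a single node (packing).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A forest node (hypothetical layout, for illustration only).

    `alternatives` holds one tuple of child nodes per derivation of this
    phrase. A leaf has no alternatives; a locally ambiguous phrase has
    more than one (this is the "packing").
    """
    label: str
    alternatives: list = field(default_factory=list)


def count_trees(node, memo=None):
    """Count the parse trees represented below `node` without enumerating them."""
    if memo is None:
        memo = {}
    key = id(node)
    if key in memo:          # shared sub-forest: reuse the earlier result
        return memo[key]
    if not node.alternatives:
        total = 1            # a leaf stands for exactly one (trivial) tree
    else:
        total = 0
        for children in node.alternatives:
            product = 1
            for child in children:
                product *= count_trees(child, memo)
            total += product
    memo[key] = total
    return total


# Leaves (assumed pre-built constituents of the toy example).
np_i = Node("NP: I")
v_saw = Node("V: saw")
np_man = Node("NP: a man")
pp_tel = Node("PP: with a telescope")

# Reading 1: the PP modifies the noun phrase "a man".
np_man_pp = Node("NP", [(np_man, pp_tel)])
# Reading 2: the PP modifies the verb phrase "saw a man".
vp_saw_man = Node("VP", [(v_saw, np_man)])

# One packed VP node holds both readings; np_man and pp_tel are shared.
vp = Node("VP", [(v_saw, np_man_pp), (vp_saw_man, pp_tel)])
s = Node("S", [(np_i, vp)])

print(count_trees(s))  # prints 2
```

In this sketch the two readings of the sentence differ only inside the packed `VP` node, so the rest of the structure is stored once. Under the same assumptions, asking the user which alternative of a packed node is intended would correspond to the interactive disambiguation mentioned above.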
