A Modified Earley Parser for Huge Natural Language Grammars

For almost half a century, the Earley parser has been used to parse context-free grammars, and it is considered a touchstone algorithm in the history of parsing. It is also known, however, for being expensive in both time and memory. For huge context-free grammars its performance suffers, because its time complexity depends on the number of rules in the grammar as well as on the input length. The time complexity of the original Earley parser is O(R²N³), where N is the string length and R is the number of rules. In this paper, we aim to improve the time and memory performance of the Earley parser for grammars with a large number of rules. In our approach, we represent the rules as a radix tree instead of the list representation used in the original Earley parser. We have tested our algorithm on rule sets of different sizes, up to 200,000 rules, all learned by an example-based machine translation system. According to our evaluation results, our modified parser has a time bound of O(log(R)N³) and uses 20% less memory than the original Earley parser.
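To make the radix tree representation concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that rules are keyed by their right-hand-side symbol sequences, so that rules sharing a prefix share a path in the tree. The names `Node`, `insert`, and `lookup`, and the toy grammar, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Outgoing edges keyed by the first symbol of the edge label:
    # symbol -> (label as a tuple of symbols, child Node).
    edges: dict = field(default_factory=dict)
    # Left-hand sides of rules whose right-hand side ends at this node.
    lhs: list = field(default_factory=list)


def common_prefix_len(a, b):
    """Length of the longest common prefix of two symbol tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, lhs, rhs):
    """Insert the rule lhs -> rhs, splitting an edge where prefixes diverge."""
    node, rest = root, tuple(rhs)
    while rest:
        entry = node.edges.get(rest[0])
        if entry is None:
            # No edge starts with this symbol: hang the remainder on a new leaf.
            leaf = Node()
            node.edges[rest[0]] = (rest, leaf)
            node, rest = leaf, ()
            break
        label, child = entry
        k = common_prefix_len(label, rest)
        if k < len(label):
            # Only part of the edge matches: split it at position k.
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            node.edges[rest[0]] = (label[:k], mid)
            child = mid
        node, rest = child, rest[k:]
    node.lhs.append(lhs)


def lookup(root, rhs):
    """Return the left-hand sides of all rules whose right-hand side is rhs."""
    node, rest = root, tuple(rhs)
    while rest:
        entry = node.edges.get(rest[0])
        if entry is None:
            return []
        label, child = entry
        if rest[:len(label)] != label:
            return []
        node, rest = child, rest[len(label):]
    return list(node.lhs)


# Toy grammar (hypothetical): two NP rules share the prefix "Det".
root = Node()
insert(root, "S", ["NP", "VP"])
insert(root, "NP", ["Det", "N"])
insert(root, "NP", ["Det", "Adj", "N"])
insert(root, "VP", ["V", "NP"])
```

With a list representation, finding the rules that match a symbol sequence means scanning every rule. In this sketch, a lookup follows a single path whose length is bounded by the length of the right-hand side, and the shared prefix `Det` is stored only once.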
