Are Very Large Context-Free Grammars Tractable?

In this paper, we present a method which, in practice, makes it possible to use parsers for languages defined by very large context-free grammars (over a million symbol occurrences). The idea is to split the parsing process into two passes. The first pass computes a sub-grammar: a specialized part of the large grammar, selected by the input text and by various filtering strategies. The second pass is a traditional parser which works with the sub-grammar and the input text. This approach is validated by practical experiments performed with an Earley-like parser running on a test set with two large context-free grammars.
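The two-pass idea can be illustrated with a minimal sketch. The abstract does not say which filtering strategies the paper uses, so the filter below is an assumption: it drops productions that mention a word absent from the input, then removes non-productive and unreachable productions. The names `Production`, `filter_grammar`, `earley_recognize`, and the toy `GRAMMAR` are hypothetical, and the recognizer is a textbook Earley-style recognizer that assumes a grammar without empty productions; it is not the parser used in the experiments.

```python
from collections import namedtuple

# Hypothetical representation: rhs is a tuple of symbols; terminals are words.
Production = namedtuple("Production", ["lhs", "rhs"])

def filter_grammar(productions, start, tokens):
    """First pass (assumed strategy): keep productions usable on `tokens`."""
    nonterminals = {p.lhs for p in productions}
    words = set(tokens)
    # Lexical filter: drop productions using a terminal absent from the input.
    kept = [p for p in productions
            if all(s in nonterminals or s in words for s in p.rhs)]
    # Fixpoint: find nonterminals that can still derive a terminal string.
    productive, changed = set(), True
    while changed:
        changed = False
        for p in kept:
            if p.lhs not in productive and all(
                    s in productive or s not in nonterminals for s in p.rhs):
                productive.add(p.lhs)
                changed = True
    kept = [p for p in kept
            if all(s in productive or s not in nonterminals for s in p.rhs)]
    # Keep only productions reachable from the start symbol.
    reachable, agenda = {start}, [start]
    while agenda:
        a = agenda.pop()
        for p in kept:
            if p.lhs == a:
                for s in p.rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        agenda.append(s)
    return [p for p in kept if p.lhs in reachable]

def earley_recognize(productions, start, tokens):
    """Second pass: Earley-style recognizer (assumes no empty productions)."""
    by_lhs = {}
    for p in productions:
        by_lhs.setdefault(p.lhs, []).append(p)
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]  # items: (production, dot, origin)
    for p in by_lhs.get(start, []):
        chart[0].add((p, 0, 0))
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            prod, dot, origin = agenda.pop()
            new, target = [], i
            if dot == len(prod.rhs):  # completer
                new = [(q, d + 1, o) for (q, d, o) in chart[origin]
                       if d < len(q.rhs) and q.rhs[d] == prod.lhs]
            elif prod.rhs[dot] in by_lhs:  # predictor
                new = [(q, 0, i) for q in by_lhs[prod.rhs[dot]]]
            elif i < n and tokens[i] == prod.rhs[dot]:  # scanner
                new, target = [(prod, dot + 1, origin)], i + 1
            for item in new:
                if item not in chart[target]:
                    chart[target].add(item)
                    if target == i:
                        agenda.append(item)
    return any(p.lhs == start and d == len(p.rhs) and o == 0
               for (p, d, o) in chart[n])

# Toy grammar, purely illustrative (real grammars are vastly larger).
GRAMMAR = [
    Production("S", ("NP", "VP")), Production("NP", ("Det", "N")),
    Production("VP", ("V", "NP")), Production("Det", ("the",)),
    Production("N", ("cat",)), Production("N", ("dog",)),
    Production("N", ("fish",)), Production("V", ("sees",)),
    Production("V", ("eats",)),
]

tokens = "the cat sees the dog".split()
sub_grammar = filter_grammar(GRAMMAR, "S", tokens)      # pass 1
accepted = earley_recognize(sub_grammar, "S", tokens)   # pass 2
```

In this sketch the sub-grammar depends on the input sentence, so it is recomputed for each input; the hoped-for gain is that the second pass only ever consults the productions that survive the filter.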
