Efficient parsing with Linear Context-Free Rewriting Systems

Previous work on treebank parsing with discontinuous constituents using Linear Context-Free Rewriting Systems (LCFRS) has been limited to sentences of up to 30 words because of computational complexity. There have been some results on binarizing an LCFRS in a manner that minimizes parsing complexity, but the present work shows that parsing long sentences with such an optimally binarized grammar remains infeasible. Instead, we introduce a technique that removes this length restriction while maintaining respectable accuracy. The resulting parser has been applied to a discontinuous treebank with favorable results.
