Treebanks and Mild Context-Sensitivity

Some treebanks, such as the German TIGER/NeGra treebanks, represent discontinuous elements directly, i.e. their trees contain crossing edges, but the context-free grammars extracted from them make no use of this information. In this paper, we present a method for extracting mildly context-sensitive grammars, namely simple range concatenation grammars (RCGs), from such treebanks. We also present a measure of the degree of a treebank’s mild context-sensitivity and compare it to similar measures used in non-projective dependency parsing. Finally, we compare our work to discontinuous phrase structure grammar (DPSG).
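
The extraction idea can be illustrated with a minimal sketch. The paper gives no code, so everything below is an assumption: a hypothetical `Node` class in which each preterminal stores its sentence position, so that a discontinuous constituent is simply a node whose yield is not a contiguous set of positions. Each maximal contiguous range of a node's yield becomes one argument of its RCG predicate, so the predicate's arity equals the node's fan-out. This follows the general construction for reading simple RCG (equivalently, LCFRS) clauses off discontinuous trees, but it is not claimed to be the exact procedure of this paper. As a stand-in for the treebank measure, the hypothetical `max_fan_out` reports the largest fan-out of any node; fan-out minus one corresponds to the gap degree used for non-projective dependency structures.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; preterminals carry a word and its sentence position."""
    label: str
    children: List["Node"] = field(default_factory=list)
    pos: Optional[int] = None
    word: Optional[str] = None


def yield_of(node: Node) -> set:
    """Set of sentence positions dominated by node (may be non-contiguous)."""
    if node.pos is not None:
        return {node.pos}
    positions = set()
    for child in node.children:
        positions |= yield_of(child)
    return positions


def blocks(positions) -> List[Tuple[int, int]]:
    """Split positions into maximal contiguous ranges (start, end), both inclusive."""
    ranges: List[Tuple[int, int]] = []
    for p in sorted(positions):
        if ranges and p == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], p)
        else:
            ranges.append((p, p))
    return ranges


def fan_out(node: Node) -> int:
    """Number of ranges in the node's yield, i.e. the arity of its RCG predicate."""
    return len(blocks(yield_of(node)))


def nodes(tree: Node) -> Iterator[Node]:
    yield tree
    for child in tree.children:
        yield from nodes(child)


def max_fan_out(trees) -> int:
    """Assumed treebank measure: maximal fan-out of any node (gap degree + 1)."""
    return max(fan_out(n) for t in trees for n in nodes(t))


def extract_clauses(node: Node, clauses: Optional[List[str]] = None) -> List[str]:
    """Read off one simple-RCG clause per node, top-down."""
    if clauses is None:
        clauses = []
    if node.pos is not None:
        # Preterminal: POS(word) -> ε
        clauses.append(f"{node.label}({node.word}) -> ε")
        return clauses
    var_at = {}  # start position of a child range -> (variable, end position)
    rhs = []
    for child in node.children:
        args = []
        for start, end in blocks(yield_of(child)):
            var = f"X{len(var_at)}"
            var_at[start] = (var, end)
            args.append(var)
        rhs.append(f"{child.label}({', '.join(args)})")
    # Each range of the parent is the concatenation of the child ranges it covers.
    lhs = []
    for start, end in blocks(yield_of(node)):
        parts, p = [], start
        while p <= end:
            var, child_end = var_at[p]
            parts.append(var)
            p = child_end + 1
        lhs.append(" ".join(parts))
    clauses.append(f"{node.label}({', '.join(lhs)}) -> {' '.join(rhs)}")
    for child in node.children:
        extract_clauses(child, clauses)
    return clauses


# Simplified, hypothetical tree for "Darüber muss nachgedacht werden":
# the VP dominates positions 0 and 2, crossing the edge to "muss" (1).
vp = Node("VP", [Node("PROAV", pos=0, word="Darüber"),
                 Node("VVPP", pos=2, word="nachgedacht")])
s = Node("S", [vp, Node("VMFIN", pos=1, word="muss"),
               Node("VAINF", pos=3, word="werden")])

for clause in extract_clauses(s):
    print(clause)
print("max fan-out:", max_fan_out([s]))
```

For this toy tree the VP has fan-out 2 and receives the two-argument clause VP(X0, X1) -> PROAV(X0) VVPP(X1), while the root clause S(X0 X2 X1 X3) -> VP(X0, X1) VMFIN(X2) VAINF(X3) interleaves the VP's two ranges with the verbs. The tree is an illustration, not an actual TIGER or NeGra annotation.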
