Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures

We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Both discontinuous phrase structures and non-projective dependency structures are covered. Technically, hybrid grammars are related to synchronous grammars: one grammar component generates linear structures and another generates hierarchical structures. Coupling the lexical elements of the two components yields discontinuous structures. We characterize several types of hybrid grammars and discuss grammar induction from treebanks. The main advantage over existing frameworks is that hybrid grammars separate the discontinuity of the desired structures from the time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. The reported experimental results confirm this, showing wide variation in running time, accuracy, and frequency of parse failures.
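As an informal illustration of the coupling idea in the abstract, the following Python sketch pairs a linear structure (a list of words) with a hierarchical structure (a tree) by attaching a string position to each leaf. It is not the formalism defined in the paper: the `Node` class, the `positions` and `is_discontinuous` helpers, and the example sentence are all assumptions introduced here, and the sketch only shows how a discontinuous constituent can arise once tree leaves are tied to positions in a string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; a leaf carries the position of its coupled word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    position: Optional[int] = None  # index into the word list, leaves only


def positions(node: Node) -> List[int]:
    """Collect, in sorted order, the string positions coupled to leaves under node."""
    found = [] if node.position is None else [node.position]
    for child in node.children:
        found.extend(positions(child))
    return sorted(found)


def is_discontinuous(node: Node) -> bool:
    """A node is discontinuous if its coupled positions leave a gap in the string."""
    pos = positions(node)
    return any(b - a > 1 for a, b in zip(pos, pos[1:]))


# Linear component (assumed example sentence).
words = ["A", "hearing", "is", "scheduled", "on", "the", "issue", "today"]


def leaf(i: int) -> Node:
    """Build a leaf coupled to word i."""
    return Node(words[i], position=i)


# Hierarchical component: the NP spans "A hearing ... on the issue",
# the VP spans "is scheduled ... today".
pp_node = Node("PP", [leaf(4), leaf(5), leaf(6)])
np_node = Node("NP", [leaf(0), leaf(1), pp_node])
vp_node = Node("VP", [leaf(2), leaf(3), leaf(7)])
s_node = Node("S", [np_node, vp_node])

for node in (s_node, np_node, vp_node, pp_node):
    print(node.label, positions(node), is_discontinuous(node))
```

In this toy setup the string and the tree are built separately, and discontinuity appears only through the position links between them, which loosely mirrors the separation of linear and hierarchical components described above.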
