Generic refinement of expressive grammar formalisms with an application to discontinuous constituent parsing

We formulate a generalization of the split/merge algorithm of Petrov et al. (2006) for interpreted regular tree grammars (Koller and Kuhlmann, 2011), which capture a large class of grammar formalisms. We evaluate its effectiveness empirically on the task of discontinuous constituent parsing with two mildly context-sensitive grammar formalisms: linear context-free rewriting systems (Vijay-Shanker et al., 1987) and hybrid grammars (Nederhof and Vogler, 2014).

[1]  Xiaochang Peng,et al.  A Synchronous Hyperedge Replacement Grammar based approach for AMR parsing , 2015, CoNLL.

[2]  Mark Johnson,et al.  Efficient techniques for parsing with tree automata , 2016, ACL.

[3]  Wolfgang Maier,et al.  Discontinuous Incremental Shift-reduce Parsing , 2015, ACL.

[4]  Joseph Le Roux,et al.  Efficient Discontinuous Phrase-Structure Parsing via the Generalized Maximum Spanning Arborescence , 2017, EMNLP.

[5]  Mark-Jan Nederhof,et al.  Squibs and Discussions: Weighted Deductive Parsing and Knuth’s Algorithm , 2003, CL.

[6]  Donald E. Knuth,et al.  A Generalization of Dijkstra's Algorithm , 1977, Inf. Process. Lett.

[7]  Walter S. Brainerd,et al.  Tree Generating Regular Systems , 1969, Inf. Control.

[8]  David Chiang,et al.  Better k-best Parsing , 2005, IWPT.

[9]  Heiko Vogler,et al.  Hybrid Grammars for Discontinuous Parsing , 2014, COLING.

[10]  Steve Young,et al.  Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .

[11]  Alexander Koller,et al.  A Generalized View on Parsing and Translation , 2011, IWPT.

[12]  Wolfgang Lezius,et al.  TIGER: Linguistic Interpretation of a German Corpus , 2004 .

[13]  Reut Tsarfaty,et al.  Introducing the SPMRL 2014 Shared Task on Parsing Morphologically-rich Languages , 2014 .

[14]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[15]  Laura Kallmeyer,et al.  Data-Driven Parsing with Probabilistic Linear Context-Free Rewriting Systems , 2010, COLING.

[16]  Francis Ferraro,et al.  Toward Tree Substitution Grammars with Latent Annotations , 2012, HLT-NAACL 2012.

[17]  Giorgio Satta,et al.  Kullback-Leibler Distance between Probabilistic Context-Free Grammars and Probabilistic Finite Automata , 2004, COLING.

[18]  André F. T. Martins,et al.  Parsing as Reduction , 2015, ACL.

[19]  Vladimir Solmon,et al.  The estimation of stochastic context-free grammars using the Inside-Outside algorithm , 2003 .

[20]  Christoph Teichmann,et al.  Adaptive Importance Sampling from Finite State Automata , 2016 .

[21]  Heiko Vogler,et al.  General binarization for parsing and translation , 2013, ACL.

[22]  Pierre Nugues,et al.  A High-Performance Syntactic and Semantic Dependency Parser , 2010, COLING.

[23]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[24]  Christoph Teichmann,et al.  Coarse-To-Fine Parsing for Expressive Grammar Formalisms , 2017, IWPT.

[25]  Jason Eisner,et al.  Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing , 2017, TACL.

[26]  Laura Kallmeyer,et al.  PLCFRS Parsing of English Discontinuous Constituents , 2011, IWPT.

[27]  Timm Lichte,et al.  Discontinuous parsing with continuous trees , 2016 .

[28]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[29]  Jun'ichi Tsujii,et al.  Probabilistic CFG with Latent Annotations , 2005, ACL.

[30]  Heiko Vogler,et al.  Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures , 2017, Computational Linguistics.

[31]  Joakim Nivre,et al.  Parsing Discontinuous Phrase Structure with Grammatical Functions , 2008, GoTAL.

[32]  Heiko Vogler,et al.  EM-Training for Weighted Aligned Hypergraph Bimorphisms , 2016, ACL 2016.

[33]  Dan Klein,et al.  Improved Inference for Unlexicalized Parsing , 2007, NAACL.

[34]  Jan Maluszynski,et al.  Relating Logic Programs and Attribute Grammars , 1985, J. Log. Program..

[35]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[36]  Milos Stanojevic,et al.  Neural Discontinuous Constituency Parsing , 2017, EMNLP.

[37]  David J. Weir,et al.  Characterizing Structural Descriptions Produced by Various Grammatical Formalisms , 1987, ACL.

[38]  Maximin Coavoux,et al.  Incremental Discontinuous Phrase Structure Parsing with the GAP Transition , 2017, EACL.

[39]  Daniel Götzmann.  Multiple Context-Free Grammars , 2007 .

[40]  Rens Bod,et al.  Data-Oriented Parsing with Discontinuous Constituents and Function Tags , 2016, J. Lang. Model..

[41]  Wolfgang Maier and Anders Søgaard.  Treebanks and Mild Context-Sensitivity , 2008 .

[42]  Hiroyuki Shindo,et al.  Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing , 2012, ACL.

[43]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.