Max-Margin Incremental CCG Parsing

Incremental syntactic parsing has been an active research area both for cognitive scientists trying to model human sentence processing and for NLP researchers attempting to combine incremental parsing with language modelling for ASR and MT. Most effort has gone into designing the right transition mechanism, and much less into the question of what a probabilistic model for such transition-based parsers should look like. The highly incremental transition mechanism of a recently proposed CCG parser, when trained in a straightforward locally normalised discriminative fashion, performs very poorly on the English CCGbank. We identify three biases as the causes of this problem: label bias, exposure bias and imbalanced probabilities bias. Known techniques for tackling these biases improve results, but they still do not make the parser state of the art. Instead, we tackle all three biases at the same time using an improved version of beam search optimisation that minimises all beam search violations rather than only the biggest one. The new incremental parser gives better results than all previously published incremental CCG parsers, and even outperforms some widely used non-incremental CCG parsers.
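The contrast between minimising only the biggest violation and minimising all violations can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the margin of 1.0, and the choice of the lowest-scoring beam item as the rival at each step are assumptions made for illustration, and the scores are plain numbers rather than model outputs.

```python
from typing import List, Sequence


def step_violations(gold_scores: Sequence[float],
                    beam_scores: Sequence[Sequence[float]],
                    margin: float = 1.0) -> List[float]:
    """Hinge violation at every step of the derivation.

    Assumption (illustrative): at step t the gold prefix should beat the
    lowest-scoring item kept in the beam by at least `margin`.
    """
    violations = []
    for gold, beam in zip(gold_scores, beam_scores):
        rival = min(beam)  # weakest item that still survived in the beam
        violations.append(max(0.0, margin - gold + rival))
    return violations


def max_violation_loss(gold_scores, beam_scores, margin: float = 1.0) -> float:
    """Loss that penalises only the single biggest violation."""
    return max(step_violations(gold_scores, beam_scores, margin))


def all_violations_loss(gold_scores, beam_scores, margin: float = 1.0) -> float:
    """Loss that penalises every violation along the derivation."""
    return sum(step_violations(gold_scores, beam_scores, margin))


# Hypothetical scores for a three-step derivation with a beam of size two.
gold = [2.0, 1.0, 0.5]
beams = [[1.5, 0.5], [2.0, 1.5], [3.0, 0.0]]
biggest = max_violation_loss(gold, beams)
total = all_violations_loss(gold, beams)
```

In this toy case the second and third steps both violate the margin. The max-violation loss only receives a training signal from the second step, while the summed loss receives one from both, which is the intuition behind minimising all violations.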
