Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training

We propose a discriminatively trained recurrent neural network (RNN) that predicts the actions for a fast and accurate shift-reduce dependency parser. The RNN uses its output-dependent model structure to compute hidden vectors that encode the preceding partial parse, and uses them to estimate probabilities of parser actions. Unlike a similar previous generative model (Henderson and Titov, 2010), the RNN is trained discriminatively to optimize a fast beam search. This beam search prunes after each shift action, so we add a correctness probability to each shift action and train this score to discriminate between correct and incorrect sequences of parser actions. We also speed up parsing time by caching computations for frequent feature combinations, including during training, giving us both faster training and a form of backoff smoothing. The resulting parser is over 35 times faster than its generative counterpart with nearly the same accuracy, producing state-of-art dependency parsing results while requiring minimal feature engineering.

[1]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[2]  Ivan Titov,et al.  Multilingual Joint Parsing of Syntactic and Semantic Dependencies with a Latent Variable Model , 2013, CL.

[3]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[4]  Rajat Raina,et al.  Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[5]  Andrew Y. Ng,et al.  Parsing with Compositional Vector Grammars , 2013, ACL.

[6]  Stephen Clark,et al.  Syntactic Processing Using the Generalized Perceptron and Beam Search , 2011, CL.

[7]  Richard Johansson,et al.  The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages , 2009, CoNLL Shared Task.

[8]  Ivan Titov,et al.  A Latent Variable Model of Synchronous Parsing for Syntactic and Semantic Dependencies , 2008, CoNLL.

[9]  Ivan Titov,et al.  A Latent Variable Model for Generative Dependency Parsing , 2007, Trends in Parsing Technology.

[10]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[11]  Brian Roark,et al.  Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[12]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[13]  James Henderson,et al.  Discriminative Training of a Neural Network Statistical Parser , 2004, ACL.

[14]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[15]  Yang Guo,et al.  Structured Perceptron with Inexact Search , 2012, NAACL.

[16]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[17]  Ivan Titov,et al.  Fast and Robust Multilingual Dependency Parsing with a Generative Latent Variable Model , 2007, EMNLP.

[18]  Joakim Nivre,et al.  Memory-Based Dependency Parsing , 2004, CoNLL.

[19]  Mihai Surdeanu,et al.  Ensemble Models for Dependency Parsing: Cheap and Good? , 2010, HLT-NAACL.

[20]  Geoffrey Zweig,et al.  Recurrent neural networks for language understanding , 2013, INTERSPEECH.

[21]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[22]  Daniel Marcu,et al.  Learning as search optimization: approximate large margin methods for structured prediction , 2005, ICML.

[23]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[24]  Ivan Titov,et al.  Incremental Sigmoid Belief Networks for Grammar Learning , 2010, J. Mach. Learn. Res..

[25]  Richard M. Schwartz,et al.  Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[26]  Joakim Nivre,et al.  Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines , 2006, CoNLL.

[27]  James Henderson Inducing History Representations for Broad Coverage Statistical Parsing , 2003, HLT-NAACL.

[28]  Ronan Collobert,et al.  Deep Learning for Efficient Discriminative Parsing , 2011, AISTATS.

[29]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..