Training with Exploration Improves a Greedy Stack LSTM Parser

We adapt the greedy Stack-LSTM dependency parser of Dyer et al. (2015) to support a training-with-exploration procedure that uses dynamic oracles (Goldberg and Nivre, 2013) instead of assuming an error-free action history. This form of training, which accounts for the model's own predictions at training time, improves parsing accuracy for both English and Chinese and yields very strong results for both languages. We also discuss the modifications needed to make training with exploration work well for a probabilistic neural network.
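The idea of training with exploration can be sketched for a single parser step as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the loss is the negative log of the probability mass the model assigns to the actions a dynamic oracle considers optimal from the current (possibly erroneous) configuration, and that the next action is sampled from the model's own distribution, optionally sharpened by an exponent. The function name `exploration_step`, the `alpha` parameter, and the action labels are hypothetical.

```python
import math
import random


def exploration_step(probs, zero_cost, rng, alpha=1.0):
    """Return (loss, next_action) for one training step.

    probs     -- dict mapping each legal action to its model probability
    zero_cost -- non-empty set of actions a (hypothetical) dynamic oracle
                 marks as optimal from the current configuration
    rng       -- random.Random instance used for sampling
    alpha     -- sharpening exponent for sampling (an assumption here)
    """
    # Loss: negative log of the total probability on oracle-optimal actions.
    mass = sum(p for action, p in probs.items() if action in zero_cost)
    loss = -math.log(mass)

    # Exploration: follow the model's own distribution, not the oracle,
    # so later steps are trained on states the model actually reaches.
    actions = list(probs)
    weights = [probs[action] ** alpha for action in actions]
    next_action = rng.choices(actions, weights=weights, k=1)[0]
    return loss, next_action


if __name__ == "__main__":
    rng = random.Random(0)
    probs = {"SHIFT": 0.5, "LEFT-ARC": 0.25, "RIGHT-ARC": 0.25}
    loss, action = exploration_step(probs, {"SHIFT", "LEFT-ARC"}, rng)
    print(loss, action)
```

In a full parser the sampled action would be applied to the configuration and the oracle queried again from the resulting state. With a static oracle, by contrast, the next state would always come from the gold action sequence.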

[1] Eric P. Xing, et al. Polyhedral outer approximations with application to natural language parsing, 2009, ICML '09.

[2] Daniel Marcu, et al. Learning as search optimization: approximate large margin methods for structured prediction, 2005, ICML.

[3] Daniel Jurafsky, et al. A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005, 2005, IJCNLP.

[4] Ashish Vaswani, et al. Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States, 2016, Transactions of the Association for Computational Linguistics.

[5] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.

[6] Danqi Chen, et al. A Fast and Accurate Dependency Parser using Neural Networks, 2014, EMNLP.

[7] Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.

[8] Joakim Nivre, et al. Pseudo-Projective Dependency Parsing, 2005, ACL.

[9] Yuji Matsumoto, et al. Statistical Dependency Analysis with Support Vector Machines, 2003, IWPT.

[10] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2015, ICLR.

[11] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.

[12] Joakim Nivre, et al. A Dynamic Oracle for Arc-Eager Dependency Parsing, 2012, COLING.

[13] John Langford, et al. Learning to Search Better than Your Teacher, 2015, ICML.

[14] Mark Johnson, et al. Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training, 2000, ACL.

[15] Christopher D. Manning, et al. Generating Typed Dependency Parses from Phrase Structure Parses, 2006, LREC.

[16] James Henderson, et al. Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training, 2015, CoNLL.

[17] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[18] Slav Petrov, et al. Globally Normalized Transition-Based Neural Networks, 2016, ACL.

[19] Giorgio Satta, et al. Dynamic Programming Algorithms for Transition-Based Dependency Parsers, 2011, ACL.

[20] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[21] Gülsen Eryigit, et al. Transition-based Dependency DAG Parsing Using Dynamic Oracles, 2015, ACL.

[22] Carlos Gómez-Rodríguez, et al. An Efficient Dynamic Oracle for Unrestricted Non-Projective Parsing, 2015, ACL.

[23] Joakim Nivre, et al. Algorithms for Deterministic Incremental Dependency Parsing, 2008, CL.

[24] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[25] Joakim Nivre, et al. An Efficient Algorithm for Projective Dependency Parsing, 2003, IWPT.

[26] Mark Johnson, et al. Joint Incremental Disfluency Detection and Dependency Parsing, 2014, TACL.

[27] Joakim Nivre, et al. Incrementality in Deterministic Dependency Parsing, 2004.

[28] Yang Liu, et al. Minimum Risk Training for Neural Machine Translation, 2015, ACL.

[29] Noah A. Smith, et al. Transition-Based Dependency Parsing with Stack Long Short-Term Memory, 2015, ACL.

[30] Fernando Pereira, et al. Structured Learning with Approximate Inference, 2007, NIPS.

[31] Giorgio Satta, et al. A Tabular Method for Dynamic Oracles in Transition-Based Parsing, 2014, TACL.

[32] Joakim Nivre, et al. Training Deterministic Parsers with Non-Deterministic Oracles, 2013, TACL.

[33] Yoav Goldberg, et al. Dynamic-oracle Transition-based Parsing with Calibrated Probabilistic Output, 2013, IWPT.

[34] Stephen Clark, et al. A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing, 2008, EMNLP.

[35] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.

[36] Mark Johnson, et al. A Non-Monotonic Arc-Eager Transition System for Dependency Parsing, 2013, CoNLL.

[37] Thorsten Joachims, et al. Training structural SVMs when exact inference is intractable, 2008, ICML '08.

[38] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, 2005, ACL.

[39] Joakim Nivre, et al. Non-Deterministic Oracles for Unrestricted Non-Projective Transition-Based Dependency Parsing, 2015, IWPT.

[40] He He, et al. Imitation Learning by Coaching, 2012, NIPS.

[41] Andreas Vlachos, et al. An investigation of imitation learning algorithms for structured prediction, 2012, EWRL.

[42] John Langford, et al. Search-based structured prediction, 2009, Machine Learning.

[43] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[44] Richard Johansson, et al. The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages, 2009, CoNLL Shared Task.

[45] Giorgio Satta, et al. A Polynomial-Time Dynamic Oracle for Non-Projective Dependency Parsing, 2014, EMNLP.