Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations

We present a simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs (BiLSTMs). Each sentence token is associated with a BiLSTM vector representing the token in its sentential context, and feature vectors are constructed by concatenating a few BiLSTM vectors. The BiLSTM is trained jointly with the parser objective, resulting in very effective feature extractors for parsing. We demonstrate the effectiveness of the approach by applying it to a greedy transition-based parser as well as to a globally optimized graph-based parser. The resulting parsers have very simple architectures, and match or surpass the state-of-the-art accuracies on English and Chinese.

[1]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[2]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[3]  Jason Eisner,et al.  Three New Probabilistic Models for Dependency Parsing: An Exploration , 1996, COLING.

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[6]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[7]  Joakim Nivre,et al.  Incrementality in Deterministic Dependency Parsing , 2004 .

[8]  Ben Taskar,et al.  Learning structured prediction models: a large margin approach , 2005, ICML.

[9]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[10]  Fernando Pereira,et al.  Discriminative learning and spanning tree algorithms for dependency parsing , 2006 .

[11]  Fu Jie Huang,et al.  A Tutorial on Energy-Based Learning , 2006 .

[12]  Joakim Nivre,et al.  Algorithms for Deterministic Incremental Dependency Parsing , 2008, CL.

[13]  Xavier Carreras,et al.  Simple Semi-supervised Dependency Parsing , 2008, ACL.

[14]  Alex Graves,et al.  Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[15]  Stephen Clark,et al.  A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing , 2008, EMNLP.

[16]  Eric P. Xing,et al.  Concise Integer Linear Programming Formulations for Dependency Parsing , 2009, ACL.

[17]  Joakim Nivre,et al.  Dependency Parsing , 2009, Lang. Linguistics Compass.

[18]  Kenji Sagae,et al.  Dynamic Programming for Linear-Time Incremental Parsing , 2010, ACL.

[19]  Ivan Titov,et al.  A Latent Variable Model for Generative Dependency Parsing , 2007, Trends in Parsing Technology.

[20]  Michael Collins,et al.  Efficient Third-Order Dependency Parsers , 2010, ACL.

[21]  Giorgio Satta,et al.  Dynamic Programming Algorithms for Transition-Based Dependency Parsers , 2011, ACL.

[22]  Joakim Nivre,et al.  Transition-based Dependency Parsing with Rich Non-local Features , 2011, ACL.

[23]  Joakim Nivre,et al.  A Dynamic Oracle for Arc-Eager Dependency Parsing , 2012, COLING.

[24]  Noah A. Smith,et al.  Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers , 2013, ACL.

[25]  Joakim Nivre,et al.  Training Deterministic Parsers with Non-Deterministic Oracles , 2013, TACL.

[26]  Phong Le,et al.  The Inside-Outside Recursive Neural Network model for Dependency Parsing , 2014, EMNLP.

[27]  Claire Cardie,et al.  Opinion Mining with Deep Recurrent Neural Networks , 2014, EMNLP.

[28]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[29]  Regina Barzilay,et al.  Low-Rank Tensors for Scoring Dependency Structures , 2014, ACL.

[30]  Baobao Chang,et al.  An Effective Neural Network Model for Graph-based Dependency Parsing , 2015, ACL.

[31]  Slav Petrov,et al.  Structured Training for Neural Network Transition-Based Parsing , 2015, ACL.

[32]  Fei-Fei Li,et al.  Visualizing and Understanding Recurrent Networks , 2015, ArXiv.

[33]  Geoffrey E. Hinton,et al.  Grammar as a Foreign Language , 2014, NIPS.

[34]  Kyunghyun Cho,et al.  Natural Language Understanding with Distributed Representation , 2015, ArXiv.

[35]  Xuanjing Huang,et al.  A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network , 2015, ACL.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Noah A. Smith,et al.  Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[38]  Hal Daumé,et al.  Deep Unordered Composition Rivals Syntactic Methods for Text Classification , 2015, ACL.

[39]  Amir Globerson,et al.  Template Kernels for Dependency Parsing , 2015, HLT-NAACL.

[40]  Ashish Vaswani,et al.  Supertagging With LSTMs , 2016, NAACL.

[41]  Eric Nichols,et al.  Named Entity Recognition with Bidirectional LSTM-CNNs , 2015, TACL.

[42]  Guillaume Lample,et al.  Neural Architectures for Named Entity Recognition , 2016, NAACL.

[43]  James Cross,et al.  Incremental Parsing with Minimal Features Using Bi-Directional LSTM , 2016, ACL.

[44]  Luke S. Zettlemoyer,et al.  LSTM CCG Parsing , 2016, NAACL.

[45]  Yoav Goldberg,et al.  A Primer on Neural Network Models for Natural Language Processing , 2015, J. Artif. Intell. Res..

[46]  Noah A. Smith,et al.  Training with Exploration Improves a Greedy Stack LSTM Parser , 2016, EMNLP.

[47]  Eliyahu Kiperwasser,et al.  Easy-First Dependency Parsing with Hierarchical Tree LSTMs , 2016, TACL.

[48]  Philipp Koehn,et al.  Synthesis Lectures on Human Language Technologies , 2016 .

[49]  M. Blanchette Dynamic programming algorithms , 2017 .