Globally Normalized Transition-Based Neural Networks

We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.

[1]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.

[2]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[3]  Dan Klein,et al.  Neural CRF Parsing , 2015, ACL.

[4]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[5]  Thierry Artières,et al.  Neural conditional random fields , 2010, AISTATS.

[6]  Jian Peng,et al.  Conditional Neural Fields , 2009, NIPS.

[7]  Noah A. Smith,et al.  Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers , 2013, ACL.

[8]  Geoffrey E. Hinton,et al.  Grammar as a Foreign Language , 2014, NIPS.

[9]  James Henderson Inducing History Representations for Broad Coverage Statistical Parsing , 2003, HLT-NAACL.

[10]  Joakim Nivre,et al.  Non-Projective Dependency Parsing in Expected Linear Time , 2009, ACL.

[11]  Taro Watanabe,et al.  Transition-based Neural Constituent Parsing , 2015, ACL.

[12]  Joakim Nivre,et al.  Training Deterministic Parsers with Non-Deterministic Oracles , 2013, TACL.

[13]  Brian Roark,et al.  Incremental Parsing with the Perceptron Algorithm , 2004, ACL.

[14]  Joakim Nivre,et al.  A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing , 2012, EMNLP.

[15]  Wang Ling,et al.  Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.

[16]  Slav Petrov,et al.  Overview of the 2012 Shared Task on Parsing the Web , 2012 .

[17]  Yoshua Bengio,et al.  Global training of document processing systems using graph transformer networks , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[19]  John Thickstun,et al.  CONDITIONAL RANDOM FIELDS , 2016 .

[20]  Slav Petrov,et al.  Improved Transition-Based Parsing and Tagging with Neural Networks , 2015, EMNLP.

[21]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[22]  Richard Johansson,et al.  The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages , 2009, CoNLL Shared Task.

[23]  Regina Barzilay,et al.  Low-Rank Tensors for Scoring Dependency Structures , 2014, ACL.

[24]  Noah A. Smith,et al.  Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs , 2015, EMNLP.

[25]  Wei Xu,et al.  Bidirectional LSTM-CRF Models for Sequence Tagging , 2015, ArXiv.

[26]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[27]  Slav Petrov,et al.  Structured Training for Neural Network Transition-Based Parsing , 2015, ACL.

[28]  Yue Zhang,et al.  A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing , 2015, ACL.

[29]  James Henderson,et al.  Discriminative Training of a Neural Network Statistical Parser , 2004, ACL.

[30]  Mitchell P. Marcus,et al.  OntoNotes: The 90% Solution , 2006, NAACL.

[31]  Fernando Pereira,et al.  Relating Probabilistic Grammars and Automata , 1999, ACL.

[32]  James Henderson,et al.  Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training , 2015, CoNLL.

[33]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[35]  Hao Zhang,et al.  Enforcing Structural Diversity in Cube-pruned Dependency Parsing , 2014, ACL.

[36]  Yann LeCun,et al.  Graph transformer networks for image recognition , 2005 .

[37]  Dan Klein,et al.  Structure compilation: trading structure for features , 2008, ICML '08.

[38]  Geoffrey J. Gordon,et al.  No-Regret Reductions for Imitation Learning and Structured Prediction , 2010, ArXiv.

[39]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[40]  Noah A. Smith,et al.  Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[41]  Josef van Genabith,et al.  QuestionBank: Creating a Corpus of Parse-Annotated Questions , 2006, ACL.

[42]  Noah A. Smith,et al.  Weighted and Probabilistic Context-Free Grammars Are Equally Expressive , 2007, CL.

[43]  Joakim Nivre,et al.  Inductive Dependency Parsing , 2006, Text, speech and language technology.

[44]  Geoffrey Zweig,et al.  Recurrent conditional random field for language understanding , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[45]  Ashish Vaswani,et al.  Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States , 2016, Transactions of the Association for Computational Linguistics.

[46]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[47]  Lukasz Kaiser,et al.  Sentence Compression by Deletion with LSTMs , 2015, EMNLP.

[48]  Wei Xu,et al.  End-to-end learning of semantic role labeling using recurrent neural networks , 2015, ACL.

[49]  Zhiyi Chi,et al.  Statistical Properties of Probabilistic Context-Free Grammars , 1999, CL.