Paper information - Globally Normalized Transition-Based Neural Networks

Globally Normalized Transition-Based Neural Networks

We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models. We discuss the importance of global as opposed to local normalization: a key insight is that the label bias problem implies that globally normalized models can be strictly more expressive than locally normalized models.
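The difference between local and global normalization can be made concrete with a small sketch. The following is only a toy illustration, not the model from the paper: the two-action transition system, the three-step horizon, and the hand-picked `score` function are all assumptions made for this example. A locally normalized model applies a softmax over the actions at every step, while a globally normalized model divides the exponentiated total score of a sequence by a single partition function over all complete sequences.

```python
import itertools
import math

DECISIONS = (0, 1)  # hypothetical two-action transition system
NUM_STEPS = 3       # hypothetical fixed sequence length


def score(history, decision):
    """Toy scoring function; all values are made up for illustration."""
    if not history:
        return 1.0 if decision == 0 else 0.5
    if history[-1] == 0:
        # After action 0, both continuations receive low scores.
        return 0.2 if decision == 0 else 0.1
    # After action 1, both continuations receive high scores.
    return 2.0 if decision == 0 else 1.5


def sequence_score(seq):
    """Sum of per-step scores along a complete decision sequence."""
    return sum(score(seq[:j], seq[j]) for j in range(len(seq)))


def local_prob(seq):
    """Product of per-step softmax probabilities (local normalization)."""
    prob = 1.0
    for j, decision in enumerate(seq):
        history = seq[:j]
        z_step = sum(math.exp(score(history, a)) for a in DECISIONS)
        prob *= math.exp(score(history, decision)) / z_step
    return prob


def global_prob(seq, all_seqs):
    """Exponentiated sequence score over one partition function (global normalization)."""
    z = sum(math.exp(sequence_score(s)) for s in all_seqs)
    return math.exp(sequence_score(seq)) / z


# Brute-force enumeration is feasible only because the example is tiny.
all_seqs = list(itertools.product(DECISIONS, repeat=NUM_STEPS))
local = {s: local_prob(s) for s in all_seqs}
glob = {s: global_prob(s, all_seqs) for s in all_seqs}

best_local = max(all_seqs, key=local.get)
best_global = max(all_seqs, key=glob.get)
print("best under local normalization: ", best_local)
print("best under global normalization:", best_global)
```

With these made-up scores both distributions sum to one, yet they rank sequences differently: the locally normalized model prefers (0, 1, 0), while the globally normalized model prefers (1, 1, 0). The per-step softmax discards the overall magnitude of the scores at each step, so the uniformly high scores that follow action 1 cannot compensate for its lower first-step probability. This is related to, though much simpler than, the label bias argument the abstract refers to.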

[1] John Langford, et al. Search-based structured prediction, 2009, Machine Learning.

[2] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[3] Dan Klein, et al. Neural CRF Parsing, 2015, ACL.

[4] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[5] Thierry Artières, et al. Neural conditional random fields, 2010, AISTATS.

[6] Jian Peng, et al. Conditional Neural Fields, 2009, NIPS.

[7] Noah A. Smith, et al. Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers, 2013, ACL.

[8] Geoffrey E. Hinton, et al. Grammar as a Foreign Language, 2014, NIPS.

[9] James Henderson. Inducing History Representations for Broad Coverage Statistical Parsing, 2003, HLT-NAACL.

[10] Joakim Nivre, et al. Non-Projective Dependency Parsing in Expected Linear Time, 2009, ACL.

[11] Taro Watanabe, et al. Transition-based Neural Constituent Parsing, 2015, ACL.

[12] Joakim Nivre, et al. Training Deterministic Parsers with Non-Deterministic Oracles, 2013, TACL.

[13] Brian Roark, et al. Incremental Parsing with the Perceptron Algorithm, 2004, ACL.

[14] Joakim Nivre, et al. A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing, 2012, EMNLP.

[15] Wang Ling, et al. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation, 2015, EMNLP.

[16] Slav Petrov, et al. Overview of the 2012 Shared Task on Parsing the Web, 2012.

[17] Yoshua Bengio, et al. Global training of document processing systems using graph transformer networks, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[19] John Thickstun, et al. CONDITIONAL RANDOM FIELDS, 2016.

[20] Slav Petrov, et al. Improved Transition-Based Parsing and Tagging with Neural Networks, 2015, EMNLP.

[21] Jason Weston, et al. Natural Language Processing (Almost) from Scratch, 2011, J. Mach. Learn. Res.

[22] Richard Johansson, et al. The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages, 2009, CoNLL Shared Task.

[23] Regina Barzilay, et al. Low-Rank Tensors for Scoring Dependency Structures, 2014, ACL.

[24] Noah A. Smith, et al. Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs, 2015, EMNLP.

[25] Wei Xu, et al. Bidirectional LSTM-CRF Models for Sequence Tagging, 2015, ArXiv.

[26] Christopher D. Manning, et al. Generating Typed Dependency Parses from Phrase Structure Parses, 2006, LREC.

[27] Slav Petrov, et al. Structured Training for Neural Network Transition-Based Parsing, 2015, ACL.

[28] Yue Zhang, et al. A Neural Probabilistic Structured-Prediction Model for Transition-Based Dependency Parsing, 2015, ACL.

[29] James Henderson, et al. Discriminative Training of a Neural Network Statistical Parser, 2004, ACL.

[30] Mitchell P. Marcus, et al. OntoNotes: The 90% Solution, 2006, NAACL.

[31] Fernando Pereira, et al. Relating Probabilistic Grammars and Automata, 1999, ACL.

[32] James Henderson, et al. Incremental Recurrent Neural Network Dependency Parser with Search-based Discriminative Training, 2015, CoNLL.

[33] Vibhav Vineet, et al. Conditional Random Fields as Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[34] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[35] Hao Zhang, et al. Enforcing Structural Diversity in Cube-pruned Dependency Parsing, 2014, ACL.

[36] Yann LeCun, et al. Graph transformer networks for image recognition, 2005.

[37] Dan Klein, et al. Structure compilation: trading structure for features, 2008, ICML '08.

[38] Geoffrey J. Gordon, et al. No-Regret Reductions for Imitation Learning and Structured Prediction, 2010, ArXiv.

[39] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.

[40] Noah A. Smith, et al. Transition-Based Dependency Parsing with Stack Long Short-Term Memory, 2015, ACL.

[41] Josef van Genabith, et al. QuestionBank: Creating a Corpus of Parse-Annotated Questions, 2006, ACL.

[42] Noah A. Smith, et al. Weighted and Probabilistic Context-Free Grammars Are Equally Expressive, 2007, CL.

[43] Joakim Nivre, et al. Inductive Dependency Parsing, 2006, Text, speech and language technology.

[44] Geoffrey Zweig, et al. Recurrent conditional random field for language understanding, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[45] Ashish Vaswani, et al. Efficient Structured Inference for Transition-Based Parsing with Neural Networks and Error States, 2016, Transactions of the Association for Computational Linguistics.

[46] Danqi Chen, et al. A Fast and Accurate Dependency Parser using Neural Networks, 2014, EMNLP.

[47] Lukasz Kaiser, et al. Sentence Compression by Deletion with LSTMs, 2015, EMNLP.

[48] Wei Xu, et al. End-to-end learning of semantic role labeling using recurrent neural networks, 2015, ACL.

[49] Zhiyi Chi, et al. Statistical Properties of Probabilistic Context-Free Grammars, 1999, CL.