论文信息 - End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF - 字舞流文

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing. In this paper, we introduce a novel neutral network architecture that benefits from both word- and character-level representations automatically, by using combination of bidirectional LSTM, CNN and CRF. Our system is truly end-to-end, requiring no feature engineering or data pre-processing, thus making it applicable to a wide range of sequence labeling tasks. We evaluate our system on two data sets for two sequence labeling tasks --- Penn Treebank WSJ corpus for part-of-speech (POS) tagging and CoNLL 2003 corpus for named entity recognition (NER). We obtain state-of-the-art performance on both the two data --- 97.55\% accuracy for POS tagging and 91.21\% F1 for NER.

Eduard H. Hovy | Xuezhe Ma | E. Hovy | Xuezhe Ma

[1] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[2] Erik F. Tjong Kim Sang,et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[3] Cícero Nogueira dos Santos,et al. Boosting Named Entity Recognition with Neural Character Embeddings , 2015, NEWS@ACL.

[4] Hwee Tou Ng,et al. Named Entity Recognition: A Maximum Entropy Approach Using Global Information , 2002, COLING.

[5] Eric Nichols,et al. Named Entity Recognition with Bidirectional LSTM-CNNs , 2015, TACL.

[6] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[7] Christopher D. Manning. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? , 2011, CICLing.

[8] Cícero Nogueira dos Santos,et al. Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.

[9] Harm de Vries,et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization. , 2015 .

[10] Hai Zhao,et al. Probabilistic Models for High-Order Projective Dependency Parsing , 2015, ArXiv.

[11] Noah A. Smith,et al. Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[12] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[13] Koby Crammer,et al. Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[14] Eduard H. Hovy,et al. Efficient Inner-to-outer Greedy Algorithm for Higher-order Labeled Dependency Parsing , 2015, EMNLP.

[15] Dan Roth,et al. Design Challenges and Misconceptions in Named Entity Recognition , 2009, CoNLL.

[16] Dekang Lin,et al. Phrase Clustering for Discriminative Learning , 2009, ACL.

[17] Giorgio Satta,et al. Guided Learning for Bidirectional Sequence Classification , 2007, ACL.

[18] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[20] Wei Xu,et al. Bidirectional LSTM-CRF Models for Sequence Tagging , 2015, ArXiv.

[21] Tong Zhang,et al. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[22] Rich Caruana,et al. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.

[23] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[24] Eric P. Xing,et al. Harnessing Deep Neural Networks with Logic Rules , 2016, ACL.

[25] Christoph Goller,et al. Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[26] Yoshua Bengio,et al. Equilibrated adaptive learning rates for non-convex optimization , 2015, NIPS.

[27] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[28] Anders Søgaard,et al. Semi-supervised condensed nearest neighbor for part-of-speech tagging , 2011, ACL.

[29] Eduard H. Hovy,et al. Unsupervised Ranking Model for Entity Coreference Resolution , 2016, NAACL.

[30] Nanyun Peng,et al. Improving Named Entity Recognition for Chinese Social Media with Word Segmentation Representation Learning , 2016, ACL.

[31] Andrew McCallum,et al. Lexicon Infused Phrase Embeddings for Named Entity Resolution , 2014, CoNLL.

[32] Alexandre Allauzen,et al. Non-lexical neural architecture for fine-grained POS Tagging , 2015, EMNLP.

[33] Fei Xia,et al. Unsupervised Dependency Parsing with Transferring Distribution via Parallel Guidance and Entropy Regularization , 2014, ACL.

[34] Zaiqing Nie,et al. Joint Entity Recognition and Disambiguation , 2015, EMNLP.

[35] Guillaume Lample,et al. Neural Architectures for Named Entity Recognition , 2016, NAACL.

[36] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[37] Erik F. Tjong Kim Sang,et al. Representing Text Chunks , 1999, EACL.

[38] Vincent Ng,et al. Supervised Noun Phrase Coreference Research: The First Fifteen Years , 2010, ACL.

[39] Danqi Chen,et al. A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[40] Joakim Nivre,et al. Deterministic Dependency Parsing of English Text , 2004, COLING.

[41] Nanyun Peng,et al. Named Entity Recognition for Chinese Social Media with Jointly Trained Embeddings , 2015, EMNLP.

[42] Dan Klein,et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[43] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[44] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[45] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[46] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[47] Lluís Màrquez i Villodre,et al. SVMTool: A general POS Tagger Generator Based on Support Vector Machines , 2004, LREC.

[48] Wojciech Zaremba,et al. An Empirical Exploration of Recurrent Network Architectures , 2015, ICML.

[49] Xu Sun,et al. Structure Regularization for Structured Prediction , 2014, NIPS.

[50] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[51] Wang Ling,et al. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.

[52] Tong Zhang,et al. Named Entity Recognition through Classifier Combination , 2003, CoNLL.

[53] Jürgen Schmidhuber,et al. Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[54] Hai Zhao,et al. Fourth-Order Dependency Parsing , 2012, COLING.

[55] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[56] Yung-Chun Chang,et al. Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization , 2015, Journal of Cheminformatics.

[57] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[58] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.

[59] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[60] Ruslan Salakhutdinov,et al. Multi-Task Cross-Lingual Sequence Tagging from Scratch , 2016, ArXiv.

[61] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[62] Michael Collins,et al. Efficient Third-Order Dependency Parsers , 2010, ACL.