Robust Lexical Features for Improved Neural Network Named-Entity Recognition

Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers. In this work, we show that this is unfair: lexical features are actually quite useful. We propose to embed words and entity types into a low-dimensional vector space that we train on annotated data produced by distant supervision from Wikipedia. From this, we compute, offline, a feature vector representing each word. When used with a vanilla recurrent neural network model, this representation yields substantial improvements. We establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while matching state-of-the-art performance with an F1 score of 91.73 on the over-studied CONLL-2003 dataset.
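The offline step described in the abstract can be illustrated with a minimal sketch. It assumes that words and entity types already live in one shared vector space, and that each dimension of a word's feature vector is the cosine similarity between the word and one entity type. The abstract does not spell out this choice, so it is an assumption here, and the embeddings, type names, and function names below are hypothetical toy values rather than anything taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexical_features(word_vec, type_vecs, type_names):
    """One feature per entity type: similarity of the word to that type.

    ASSUMPTION: cosine similarity is used as the per-type score.
    """
    return [cosine(word_vec, type_vecs[name]) for name in type_names]


# Hypothetical toy embeddings in a shared 3-dimensional space.
TYPE_NAMES = ["PERSON", "LOCATION", "ORGANIZATION"]
TYPE_VECS = {
    "PERSON": [1.0, 0.0, 0.0],
    "LOCATION": [0.0, 1.0, 0.0],
    "ORGANIZATION": [0.0, 0.0, 1.0],
}
WORD_VECS = {
    "paris": [0.0, 2.0, 0.0],
    "obama": [3.0, 0.0, 0.0],
}

# Computed once, offline; a tagger would then look up each word's vector
# and concatenate it with its other input features.
FEATURE_TABLE = {
    word: lexical_features(vec, TYPE_VECS, TYPE_NAMES)
    for word, vec in WORD_VECS.items()
}
```

Because the table is built ahead of time, using it at tagging time reduces to a dictionary lookup per token, for example `FEATURE_TABLE["paris"]`, which in this toy setup scores highest on the `LOCATION` dimension.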

Philippe Langlais | Abbas Ghaddar
