Toward Mention Detection Robustness with Recurrent Neural Networks

One of the key challenges in natural language processing (NLP) is to achieve good performance across application domains and languages. In this work, we investigate the robustness of mention detection systems, one of the fundamental tasks in information extraction, via recurrent neural networks (RNNs). The advantage of RNNs over traditional approaches is their capacity to capture long-range context and to implicitly adapt word embeddings, trained on a large corpus, into task-specific representations while preserving the original semantic generalization that remains useful across domains. Our systematic evaluation of RNN architectures demonstrates that RNNs not only outperform the best reported systems (up to 9% relative error reduction) in the general setting but also achieve state-of-the-art performance in the cross-domain setting for English. For other languages, RNNs are significantly better than traditional methods on the related task of named entity recognition for Dutch (up to 22% relative error reduction).
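
To make the core idea concrete, below is a minimal sketch (in PyTorch) of a bidirectional RNN sequence tagger whose embedding layer is initialized from pretrained word vectors and left trainable, so the embeddings can be adapted to the mention detection task during training. This is an illustration of the general approach described in the abstract, not the authors' exact architecture; the names MentionTagger, num_tags, and pretrained_vectors are hypothetical.

```python
# A minimal sketch, assuming PyTorch: a bidirectional GRU tagger over
# pretrained word embeddings that are fine-tuned (not frozen) during training.
import torch
import torch.nn as nn


class MentionTagger(nn.Module):  # hypothetical class name
    def __init__(self, pretrained_vectors, hidden_size, num_tags):
        super().__init__()
        # Initialize from embeddings trained on a large corpus; freeze=False
        # lets training adapt them into a task-specific representation.
        self.embed = nn.Embedding.from_pretrained(pretrained_vectors, freeze=False)
        self.rnn = nn.GRU(pretrained_vectors.size(1), hidden_size,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token mention-tag scores
        states, _ = self.rnn(self.embed(token_ids))
        return self.proj(states)


# Usage sketch with random stand-ins for pretrained vectors (10k-word vocab).
vectors = torch.randn(10000, 300)
model = MentionTagger(vectors, hidden_size=128, num_tags=9)
scores = model(torch.randint(0, 10000, (2, 20)))  # shape: (2, 20, 9)
```

The bidirectional recurrence is what gives the tagger access to long-range left and right context when deciding each token's mention label.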
