Deep Active Learning for Named Entity Recognition

Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model, consisting of convolutional character and word encoders and a long short-term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than the best-performing models. We carry out incremental active learning during the training process and are able to nearly match state-of-the-art performance with just 25% of the original training data.
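
To make the architecture concrete, the following is a minimal PyTorch sketch of a CNN-CNN-LSTM tagger: a character-level CNN and a word-level CNN as encoders, and an LSTM tag decoder that conditions on the previously predicted tag. All class names, hyperparameters (embedding sizes, filter counts, kernel widths), and the greedy decoding scheme are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
# Sketch only: dimensions and decoding details are assumptions, not the paper's settings.
import torch
import torch.nn as nn

class CharCNNEncoder(nn.Module):
    """Builds a word representation from its characters with 1-D convolutions."""
    def __init__(self, n_chars, char_emb_dim=25, n_filters=50, kernel_size=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_emb_dim, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, chars):                       # chars: (batch, seq_len, max_word_len)
        b, s, w = chars.shape
        x = self.char_emb(chars.view(b * s, w))     # (b*s, w, char_emb_dim)
        x = self.conv(x.transpose(1, 2))            # (b*s, n_filters, w)
        x = torch.relu(x).max(dim=2).values         # max-pool over character positions
        return x.view(b, s, -1)                     # (batch, seq_len, n_filters)

class CNNCNNLSTMTagger(nn.Module):
    """Char-CNN + word-CNN encoders feeding an LSTM tag decoder (greedy decoding)."""
    def __init__(self, n_words, n_chars, n_tags, word_emb_dim=100, char_filters=50,
                 word_filters=200, tag_emb_dim=50, hidden_dim=200, kernel_size=3):
        super().__init__()
        self.char_enc = CharCNNEncoder(n_chars, n_filters=char_filters)
        self.word_emb = nn.Embedding(n_words, word_emb_dim, padding_idx=0)
        self.word_conv = nn.Conv1d(word_emb_dim + char_filters, word_filters,
                                   kernel_size, padding=kernel_size // 2)
        # The decoder consumes the encoder feature at position t together with
        # the embedding of the previously predicted tag.
        self.tag_emb = nn.Embedding(n_tags + 1, tag_emb_dim)   # +1 for a start-of-sequence tag
        self.decoder = nn.LSTM(word_filters + tag_emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_tags)
        self.n_tags = n_tags

    def forward(self, words, chars):
        feats = torch.cat([self.word_emb(words), self.char_enc(chars)], dim=-1)
        enc = torch.relu(self.word_conv(feats.transpose(1, 2))).transpose(1, 2)
        b, s, _ = enc.shape
        prev_tag = torch.full((b,), self.n_tags, dtype=torch.long, device=words.device)
        state, logits = None, []
        for t in range(s):                           # greedy left-to-right tag decoding
            step_in = torch.cat([enc[:, t], self.tag_emb(prev_tag)], dim=-1).unsqueeze(1)
            out, state = self.decoder(step_in, state)
            step_logits = self.out(out.squeeze(1))
            logits.append(step_logits)
            prev_tag = step_logits.argmax(dim=-1)
        return torch.stack(logits, dim=1)            # (batch, seq_len, n_tags)
```

In the active learning setup described above, a model of this kind would be retrained incrementally as newly selected sentences are labeled, which is where the lightweight convolutional encoders keep the cost of repeated retraining manageable.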
