Active Discriminative Text Representation Learning

We propose a new active learning (AL) method for text classification with convolutional neural networks (CNNs). In AL, one selects the instances to be manually labeled with the aim of maximizing model performance with minimal effort. Neural models capitalize on word embeddings as representations (features), tuning these to the task at hand. We argue that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations). This is in contrast to traditional AL approaches (e.g., entropy-based uncertainty sampling), which specify higher-level objectives. We propose a simple approach for sentence classification that selects instances containing words whose embeddings are likely to be updated with the greatest magnitude, thereby rapidly learning discriminative, task-specific embeddings. We extend this approach to document classification by jointly considering: (1) the expected changes to the constituent word representations; and (2) the model's current overall uncertainty regarding the instance. The relative emphasis placed on these criteria is governed by a stochastic process that favors selecting instances likely to improve representations at the outset of learning, and then shifts toward general uncertainty sampling as AL progresses. Empirical results show that our method outperforms baseline AL approaches on both sentence and document classification tasks. We also show that, as expected, the method quickly learns discriminative word embeddings. To the best of our knowledge, this is the first work on AL addressing neural models for text classification.
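
The abstract only sketches the selection criterion at a high level. Below is a minimal Python sketch of how such a criterion might be scored, assuming that for each unlabeled instance the CNN has already produced a class-posterior vector and, for each possible label, the norm of the gradient with respect to the embedding parameters. The function names, the `decay` schedule, and the toy numbers are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np


def egl_score(probs, grad_norms):
    """Expected magnitude of the update to the embeddings if this instance
    were labeled: the per-label gradient norm marginalized over the model's
    current posterior."""
    return float(np.sum(probs * grad_norms))


def entropy_score(probs):
    """Predictive entropy of the current model for this instance
    (standard uncertainty sampling)."""
    eps = 1e-12
    return float(-np.sum(probs * np.log(probs + eps)))


def select_instance(probs_list, grad_norms_list, round_t, decay=0.9, rng=None):
    """Pick one unlabeled instance, stochastically mixing the two criteria:
    early AL rounds favor the embedding-update (EGL-style) score, later
    rounds favor entropy-based uncertainty."""
    rng = rng or np.random.default_rng()
    p_egl = decay ** round_t  # emphasis on representation learning decays over rounds
    use_egl = rng.random() < p_egl
    scores = [
        egl_score(p, g) if use_egl else entropy_score(p)
        for p, g in zip(probs_list, grad_norms_list)
    ]
    return int(np.argmax(scores))


# Toy usage: 3 candidate instances, 2 classes (illustrative values only).
probs_list = [np.array([0.9, 0.1]), np.array([0.55, 0.45]), np.array([0.7, 0.3])]
grad_norms_list = [np.array([0.2, 3.0]), np.array([1.0, 1.1]), np.array([0.5, 0.4])]
print(select_instance(probs_list, grad_norms_list, round_t=0))
```

In this sketch, the posterior-weighted gradient norm plays the role of the "expected change to the constituent word representations," while the Bernoulli draw with a decaying parameter captures the stochastic shift toward plain uncertainty sampling as labeling progresses.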
