Weakly-Supervised Text Classification Using Label Names Only

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans, by contrast, can classify documents without seeing any labeled examples, relying only on a small set of words that describe the categories. In this paper, we explore the potential of using only the label name of each class to train classification models on unlabeled data, without any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets, covering topic and sentiment classification, without using any labeled documents; it learns from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.
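To make step (1) concrete, the sketch below shows one simple way a pre-trained masked language model can suggest words semantically related to a label name: mask an occurrence of the label name in context and read off the model's top predictions for that position. This is an illustrative approximation, not the paper's exact procedure; it assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, and a hypothetical helper name top_replacements.

```python
# Hedged sketch of step (1): querying a pre-trained MLM for words that could
# stand in for a label name in context. Assumes Hugging Face `transformers`;
# `top_replacements` and the example sentence are illustrative, not from the
# paper's released code.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def top_replacements(sentence: str, label_name: str, k: int = 10):
    """Return the MLM's top-k predictions at the label name's position,
    treated here as candidate category-related words."""
    # Replace the first occurrence of the label name with the [MASK] token.
    masked = sentence.replace(label_name, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    # Locate the masked position in the tokenized input.
    mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    # Top-k vocabulary predictions at the masked position.
    top_ids = logits[0, mask_idx[0]].topk(k).indices
    return tokenizer.convert_ids_to_tokens(top_ids.tolist())

# Example: words the model considers interchangeable with "sports" in context.
print(top_replacements("The local team won the sports championship.", "sports"))
```

In practice, predictions like these would be aggregated over many occurrences of the label name in the unlabeled corpus to build a category vocabulary, which then supports finding category-indicative words (step 2) and self-training (step 3).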
