Semi-Supervised Text Classification with Balanced Deep Representation Distributions

Semi-Supervised Text Classification (SSTC) methods mainly work in the spirit of self-training: they initialize a deep classifier by training on labeled texts, and then alternately predict pseudo-labels for unlabeled texts and retrain the classifier on the mixture of labeled and pseudo-labeled texts. Naturally, their performance is largely determined by the accuracy of the pseudo-labels for unlabeled texts. Unfortunately, the pseudo-labels are often inaccurate due to the margin bias problem, which is caused by large differences between the representation distributions of different labels in SSTC. To alleviate this problem, we apply an angular margin loss and perform Gaussian linear transformations to achieve balanced label angle variances, i.e., the variance of the angles between texts within the same label and the corresponding label vector. Constraining all label angle variances to be balanced yields more accurate pseudo-labels, where the variances are estimated over both labeled and pseudo-labeled texts during the self-training loops. With this insight, we propose a novel SSTC method, namely Semi-Supervised Text Classification with Balanced Deep Representation Distributions (STC-BDD). To evaluate STC-BDD, we compare it against state-of-the-art SSTC methods. Empirical results demonstrate the effectiveness of STC-BDD, especially when labeled texts are scarce.
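To make the balancing step concrete, below is a minimal PyTorch sketch of the core idea, not the authors' implementation. It assumes per-label running estimates of angle means and standard deviations (`angle_mean`, `angle_std`, names ours) maintained over labeled and pseudo-labeled texts, and a shared target standard deviation (`target_std`) that every label's angle distribution is mapped onto via a Gaussian linear transformation before the angular margin logits are computed.

```python
import torch
import torch.nn.functional as F

def balanced_angular_logits(features, label_vectors, angle_mean, angle_std,
                            target_std, scale=30.0):
    """Sketch: balance label angle variances before an angular margin loss.

    features:      (batch, dim) text representations from the encoder
    label_vectors: (num_labels, dim) one weight vector per label
    angle_mean:    (num_labels,) running mean of label angles, estimated
                   over both labeled and pseudo-labeled texts
    angle_std:     (num_labels,) running std of label angles
    target_std:    scalar shared std that all labels are mapped onto
    """
    # Angle between each text and each label vector (cosine -> angle).
    cos = F.normalize(features, dim=1) @ F.normalize(label_vectors, dim=1).t()
    theta = torch.acos(cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7))  # (batch, num_labels)
    # Gaussian linear transformation: standardize each label's angle
    # distribution, then rescale to the shared target std, so all labels
    # end up with (approximately) the same angle variance.
    theta_balanced = (theta - angle_mean) * (target_std / angle_std) + angle_mean
    # Smaller balanced angle -> larger logit; cross-entropy over these
    # logits then plays the role of the angular margin loss.
    return scale * torch.cos(theta_balanced)
```

In each self-training loop, one would compute a cross-entropy loss over these logits for labeled texts (with gold labels) and for unlabeled texts (with their current pseudo-labels), re-estimating `angle_mean` and `angle_std` from the mixture of both after every loop.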
