MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

This paper presents MixText, a semi-supervised learning method for text classification that uses our newly designed data augmentation method, TMix. TMix creates a large number of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled, and augmented data, MixText significantly outperforms current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited. We have publicly released our code at this https URL.
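To make the hidden-space interpolation concrete, below is a minimal PyTorch sketch of a TMix-style mixing step. This is not the released implementation: the toy Transformer encoder, the layer at which mixing happens, and the Beta(alpha, alpha) prior on the mixing ratio are illustrative assumptions.

```python
# Minimal sketch of TMix-style hidden-space interpolation (illustrative, not
# the authors' released code). Assumes both batches are padded to equal length.
import numpy as np
import torch
import torch.nn as nn


class TMixEncoder(nn.Module):
    """Toy Transformer encoder whose hidden states can be mixed mid-stack."""

    def __init__(self, vocab_size=30522, d_model=256, n_layers=6, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x, x2=None, lam=None, mix_layer=3):
        # x, x2: token-id tensors of shape (batch, seq_len)
        h = self.embed(x)
        h2 = self.embed(x2) if x2 is not None else None
        for i, layer in enumerate(self.layers):
            if h2 is not None and i == mix_layer:
                h = lam * h + (1.0 - lam) * h2   # interpolate in hidden space
                h2 = None                        # continue with the mixed sequence only
            h = layer(h)
            if h2 is not None:
                h2 = layer(h2)
        return self.classifier(h.mean(dim=1))    # mean-pool tokens, then classify


def tmix_loss(model, x1, y1, x2, y2, alpha=0.75, mix_layer=3):
    """Mix two batches in hidden space; targets are mixed with the same lambda."""
    lam = float(np.random.beta(alpha, alpha))
    lam = max(lam, 1.0 - lam)                    # keep the mix closer to batch 1
    logits = model(x1, x2=x2, lam=lam, mix_layer=mix_layer)
    log_probs = torch.log_softmax(logits, dim=-1)
    y_mix = lam * y1 + (1.0 - lam) * y2          # y1, y2: one-hot or soft labels
    return -(y_mix * log_probs).sum(dim=-1).mean()
```

In MixText, the same interpolation is applied across labeled, unlabeled, and augmented (e.g., back-translated) examples, with guessed low-entropy labels for the unlabeled data serving as the soft targets above.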
