Semisupervised Text Classification by Variational Autoencoder

Semisupervised text classification has attracted much attention from the research community. In this paper, a novel model, the semisupervised sequential variational autoencoder (SSVAE), is proposed to tackle this problem. By treating the categorical label of unlabeled data as a discrete latent variable, the model maximizes the variational evidence lower bound of the data likelihood, which implicitly infers the underlying label distribution of the unlabeled data. Our analysis indicates that the autoregressive nature of the sequential decoder is the crucial issue that renders the vanilla model ineffective. To remedy this, two types of decoders are investigated and empirically verified within the SSVAE framework. In addition, a reweighting approach is proposed to circumvent the credit assignment problem that arises during reconstruction, which further improves performance on sparse text data. Experimental results show that our method significantly improves classification accuracy compared with existing methods.
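For concreteness, the objective sketched below is the standard semisupervised VAE framework (the M2 model of Kingma et al., 2014, "Semi-supervised Learning with Deep Generative Models") on which the SSVAE builds; the sequential decoder variants and the reconstruction reweighting proposed in this paper modify the reconstruction term and are not reproduced here. For a labeled pair $(x, y)$ with continuous latent variable $z$, the evidence lower bound is

$$-\mathcal{L}(x, y) = \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(x \mid y, z)\right] + \log p(y) - \mathrm{KL}\!\left(q_\phi(z \mid x, y) \,\|\, p(z)\right),$$

and for an unlabeled $x$ the categorical label is treated as a discrete latent variable and marginalized out under the classifier $q_\phi(y \mid x)$:

$$-\mathcal{U}(x) = \sum_{y} q_\phi(y \mid x)\left(-\mathcal{L}(x, y)\right) + \mathcal{H}\!\left(q_\phi(y \mid x)\right).$$

The full training objective combines both bounds with an explicit classification loss on the labeled set $\mathcal{D}_L$, weighted by a hyperparameter $\alpha$:

$$\mathcal{J} = \sum_{(x, y) \in \mathcal{D}_L} \mathcal{L}(x, y) + \sum_{x \in \mathcal{D}_U} \mathcal{U}(x) + \alpha \sum_{(x, y) \in \mathcal{D}_L} \left[-\log q_\phi(y \mid x)\right].$$

Maximizing the unlabeled bound $-\mathcal{U}(x)$ with respect to $q_\phi(y \mid x)$ is what implicitly shapes the label distribution on unlabeled data: the classifier is rewarded for assigning probability to labels under which the decoder reconstructs $x$ well, which is also why a decoder that can reconstruct $x$ while ignoring $y$ leaves the classifier untrained.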
