Learning a Deep Hybrid Model for Semi-Supervised Text Classification

We present a novel fine-tuning algorithm in a deep hybrid architecture for semi-supervised text classification. During each increment of the online learning process, a bottom-up generative learning pass is followed by the fine-tuning algorithm, which serves as a top-down mechanism for pseudo-jointly modifying the model parameters. The resulting model, trained under what we call the Bottom-Up-Top-Down learning algorithm, is shown to outperform a variety of competitive models and baselines across a wide range of splits between supervised and unsupervised training data.
