Online Semi-Supervised Learning with Deep Hybrid Boltzmann Machines and Denoising Autoencoders

Two novel deep hybrid architectures, the Deep Hybrid Boltzmann Machine and the Deep Hybrid Denoising Auto-encoder, are proposed for handling semi-supervised learning problems. The models combine experts that model relevant distributions at different levels of abstraction to improve overall predictive performance on discriminative tasks. Theoretical motivations and algorithms for joint learning for each are presented. We apply the new models to the domain of data-streams in work towards life-long learning. The proposed architectures show improved performance compared to a pseudo-labeled, drop-out rectifier network.

[1]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[2]  Pedro M. Domingos,et al.  Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[3]  David Reitter,et al.  Learning a Deep Hybrid Model for Semi-Supervised Text Classification , 2015, EMNLP.

[4]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[5]  Razvan Pascanu,et al.  Learning Algorithms for the Classification Restricted Boltzmann Machine , 2012, J. Mach. Learn. Res..

[6]  Dong-Hyun Lee,et al.  Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks , 2013 .

[7]  Tom Minka,et al.  Principled Hybrids of Generative and Discriminative Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Honglak Lee,et al.  Online Incremental Feature Learning with Denoising Autoencoders , 2012, AISTATS.

[9]  Marc'Aurelio Ranzato,et al.  Semi-supervised learning of compact document representations with deep networks , 2008, ICML '08.

[10]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[11]  Yoshua Bengio,et al.  Difference Target Propagation , 2014, ECML/PKDD.

[12]  Radford M. Neal Annealed importance sampling , 1998, Stat. Comput..

[13]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[14]  Jason Weston,et al.  Deep learning via semi-supervised embedding , 2008, ICML '08.

[15]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[16]  João Gama,et al.  On evaluating stream learning algorithms , 2012, Machine Learning.

[17]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[18]  Geoffrey E. Hinton,et al.  A New Learning Algorithm for Mean Field Boltzmann Machines , 2002, ICANN.

[19]  Yoshua Bengio,et al.  How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation , 2014, ArXiv.

[20]  Yadong Mu,et al.  Supervised deep learning with auxiliary networks , 2014, KDD.

[21]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[22]  Pascal Vincent,et al.  The Manifold Tangent Classifier , 2011, NIPS.

[23]  Yoshua Bengio,et al.  Multi-Prediction Deep Boltzmann Machines , 2013, NIPS.

[24]  Yann Ollivier,et al.  Layer-wise learning of deep generative models , 2012, ArXiv.

[25]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[26]  Tapani Raiko,et al.  Learning Deep Belief Networks from Non-stationary Streams , 2012, ICANN.

[27]  Xinyun Chen Under Review as a Conference Paper at Iclr 2017 Delving into Transferable Adversarial Ex- Amples and Black-box Attacks , 2016 .

[28]  Thomas Seidl,et al.  MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering , 2010, WAPA.

[29]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[30]  Geoffrey E. Hinton,et al.  The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[31]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[32]  Yoshua Bengio,et al.  Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[33]  David Reitter,et al.  Online Learning of Deep Hybrid Architectures for Semi-supervised Categorization , 2015, ECML/PKDD.

[34]  R. Horton Rules and representations , 1993, The Lancet.

[35]  Hugo Larochelle,et al.  Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.

[36]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.