Learning Algorithms for the Classification Restricted Boltzmann Machine

Recent developments have demonstrated the capacity of restricted Boltzmann machines (RBMs) to be powerful generative models, able to extract useful features from input data or to construct deep artificial neural networks. In such settings, the RBM only provides a preprocessing step or an initialization for some other model, instead of acting as a complete supervised model in its own right. In this paper, we argue that RBMs can provide a self-contained framework for developing competitive classifiers. We study the Classification RBM (ClassRBM), a variant of the RBM adapted to the classification setting. We investigate different strategies for training the ClassRBM and show that competitive classification performance can be reached when appropriately combining discriminative and generative training objectives. Since training according to the generative objective requires the computation of a generally intractable gradient, we also compare different approaches to estimating this gradient and address the issue of obtaining such a gradient for problems with very high-dimensional inputs. Finally, we describe how to adapt the ClassRBM to two special cases of classification problems, namely semi-supervised and multitask learning.
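
To make the ClassRBM concrete, the sketch below computes the exact class posterior p(y|x), which is tractable because the binary hidden units can be summed out analytically. This is a minimal illustrative sketch, not the paper's reference implementation: the parameter names (W, U, b_hid, b_class) and shapes are assumptions chosen for clarity.

```python
import numpy as np

def classrbm_predict_proba(x, W, U, b_hid, b_class):
    """Exact class posterior p(y|x) of a Classification RBM (sketch).

    Assumed model: binary hidden units h, a one-hot class variable y, and
        p(y|x) proportional to exp(b_class[y]) * prod_j (1 + exp(b_hid[j] + U[j, y] + W[j] @ x)).

    Assumed shapes (illustrative):
        x       : (n_visible,)            input vector
        W       : (n_hidden, n_visible)   input-to-hidden weights
        U       : (n_hidden, n_classes)   class-to-hidden weights
        b_hid   : (n_hidden,)             hidden biases
        b_class : (n_classes,)            class biases
    """
    # Hidden-unit pre-activations for every candidate class y.
    act = b_hid[:, None] + U + (W @ x)[:, None]            # (n_hidden, n_classes)
    # log prod_j (1 + exp(act_jy)) computed stably as a sum of softplus terms.
    log_scores = b_class + np.sum(np.logaddexp(0.0, act), axis=0)
    # Normalize over classes (log-sum-exp trick).
    log_scores -= log_scores.max()
    probs = np.exp(log_scores)
    return probs / probs.sum()

# Toy usage with random parameters.
rng = np.random.default_rng(0)
n_visible, n_hidden, n_classes = 10, 6, 3
probs = classrbm_predict_proba(
    rng.integers(0, 2, n_visible).astype(float),
    rng.normal(size=(n_hidden, n_visible)),
    rng.normal(size=(n_hidden, n_classes)),
    rng.normal(size=n_hidden),
    rng.normal(size=n_classes),
)
print(probs)  # sums to 1 over the classes
```

Because p(y|x) is available in closed form, the discriminative objective (the negative log of this posterior) can be optimized by exact gradient descent; only the generative objective on p(y, x) requires the approximate gradient estimators compared in the paper.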
