Multi-Prediction Deep Boltzmann Machines

We introduce the multi-prediction deep Boltzmann machine (MP-DBM). The MP-DBM can be seen as a single probabilistic model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent nets that share parameters and approximately solve different inference problems. Prior methods of training DBMs either do not perform well on classification tasks or require an initial learning pass that trains the DBM greedily, one layer at a time. The MP-DBM does not require greedy layerwise pretraining, and outperforms the standard DBM at classification, classification with missing inputs, and mean field prediction tasks.
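To make the multi-prediction criterion concrete, the sketch below illustrates the idea for a two-layer binary DBM: a random mask splits the visible units into observed and to-be-predicted sets, mean-field inference on the unobserved units is unrolled for a fixed number of steps, and the loss is the cross-entropy on the masked-out units. This is a minimal illustration, not the authors' implementation; all names (W1, W2, b0, b1, b2, n_steps) are hypothetical, and the updates follow the standard mean-field fixed-point equations for a binary DBM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_prediction_inference(v, mask, W1, b1, W2, b2, b0, n_steps=10):
    """Unrolled mean-field inference for one multi-prediction problem.

    v    : visible vector (values where mask is False are ignored)
    mask : boolean array; True = observed/clamped, False = to be predicted
    W1   : (n_visible, n_hidden1) weights, W2 : (n_hidden1, n_hidden2) weights
    Returns the mean-field estimates (v_hat, h1, h2).
    """
    v_hat = np.where(mask, v, 0.5)       # one simple choice: init missing units at 0.5
    h2 = np.zeros(b2.shape)
    for _ in range(n_steps):
        # Mean-field fixed-point updates for a two-layer binary DBM.
        h1 = sigmoid(v_hat @ W1 + h2 @ W2.T + b1)
        h2 = sigmoid(h1 @ W2 + b2)
        v_pred = sigmoid(h1 @ W1.T + b0)
        v_hat = np.where(mask, v, v_pred)  # keep observed units clamped
    return v_hat, h1, h2

def mp_loss(v, mask, params):
    """Cross-entropy measured only on the units the model was asked to predict."""
    W1, b1, W2, b2, b0 = params
    v_hat, _, _ = multi_prediction_inference(v, mask, W1, b1, W2, b2, b0)
    miss = ~mask
    eps = 1e-7
    return -np.sum(miss * (v * np.log(v_hat + eps)
                           + (1 - v) * np.log(1 - v_hat + eps)))
```

In training, gradients of this loss would be taken through the unrolled mean-field steps (backpropagation through inference, which an autodiff framework handles automatically), and a fresh random mask is drawn for each example. Each mask then defines a different inference problem solved by the same unrolled network, which matches the abstract's view of the MP-DBM as a family of recurrent nets with shared parameters.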
