The Spike-and-Slab RBM and Extensions to Discrete and Sparse Data Distributions

The spike-and-slab restricted Boltzmann machine (ssRBM) is defined to have both a real-valued “slab” variable and a binary “spike” variable associated with each unit in the hidden layer. The model uses its slab variables to model the conditional covariance of the observation-thought to be important in capturing the statistical properties of natural images. In this paper, we present the canonical ssRBM framework together with some extensions. These extensions highlight the flexibility of the spike-and-slab RBM as a platform for exploring more sophisticated probabilistic models of high dimensional data in general and natural image data in particular. Here, we introduce the subspace-ssRBM focused on the task of learning invariant features. We highlight the behaviour of the ssRBM and its extensions through experiments with the MNIST digit recognition task and the CIFAR-10 object classification task.

[1]  Andrew Y. Ng,et al.  The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization , 2011, ICML.

[2]  R. Fergus,et al.  Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Geoffrey E. Hinton,et al.  Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Bruno A. Olshausen,et al.  Learning Horizontal Connections in a Sparse Coding Model of Natural Images , 2007, NIPS.

[5]  Geoffrey E. Hinton A Practical Guide to Training Restricted Boltzmann Machines , 2012, Neural Networks: Tricks of the Trade.

[6]  Jörg Lücke,et al.  A Closed-Form EM Algorithm for Sparse Coding , 2011 .

[7]  Radford M. Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[8]  Yoshua Bengio,et al.  Suitability of V1 Energy Models for Object Classification , 2011, Neural Computation.

[9]  Geoffrey E. Hinton,et al.  On deep generative models with applications to recognition , 2011, CVPR 2011.

[10]  Brian Sallans,et al.  A Hierarchical Community of Experts , 1999, Learning in Graphical Models.

[11]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[12]  Tapani Raiko,et al.  Parallel tempering is efficient for learning restricted Boltzmann machines , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[13]  Teuvo Kohonen,et al.  Emergence of invariant-feature detectors in the adaptive-subspace self-organizing map , 1996, Biological Cybernetics.

[14]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[15]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[16]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[17]  Yoshua Bengio,et al.  Texture Modeling with Convolutional Spike-and-Slab RBMs and Deep Extensions , 2012, AISTATS.

[18]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[19]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[20]  Katherine A. Heller,et al.  Bayesian and L1 Approaches to Sparse Unsupervised Learning , 2011, ICML 2012.

[21]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[22]  Christopher K. I. Williams,et al.  Multiple Texture Boltzmann Machines , 2012, AISTATS.

[23]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[24]  E. Oja,et al.  Independent Component Analysis , 2013 .

[25]  T. J. Mitchell,et al.  Bayesian Variable Selection in Linear Regression , 1988 .

[26]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[27]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[28]  A. Krizhevsky Convolutional Deep Belief Networks on CIFAR-10 , 2010 .

[29]  Yoshua Bengio,et al.  Large-Scale Feature Learning With Spike-and-Slab Sparse Coding , 2012, ICML.

[30]  Yoshua Bengio,et al.  Unsupervised Models of Images by Spikeand-Slab RBMs , 2011, ICML.

[31]  Geoffrey E. Hinton,et al.  Learning Sparse Topographic Representations with Products of Student-t Distributions , 2002, NIPS.

[32]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[33]  Seungjin Choi,et al.  Independent Component Analysis , 2009, Handbook of Natural Computing.

[34]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[35]  Pascal Vincent,et al.  Tempered Markov Chain Monte Carlo for training of Restricted Boltzmann Machines , 2010, AISTATS.

[36]  Quoc V. Le,et al.  Tiled convolutional neural networks , 2010, NIPS.

[37]  Geoffrey E. Hinton,et al.  Generating more realistic images using gated MRF's , 2010, NIPS.

[38]  Guillermo Sapiro,et al.  Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations , 2009, NIPS.

[39]  Andrew Y. Ng,et al.  Selecting Receptive Fields in Deep Networks , 2011, NIPS.

[40]  Ruslan Salakhutdinov,et al.  Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.

[41]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[42]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[43]  Geoffrey E. Hinton,et al.  Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.

[44]  Stefan Schaal,et al.  Proc. Advances in Neural Information Processing Systems (NIPS '08) , 2008 .

[45]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[46]  Tong Zhang,et al.  Improved Local Coordinate Coding using Local Tangents , 2010, ICML.

[47]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[48]  Christopher K. I. Williams,et al.  Products of Gaussians and Probabilistic Minor Component Analysis , 2002, Neural Computation.

[49]  Miguel Lázaro-Gredilla,et al.  Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning , 2011, NIPS.

[50]  Ilya Sutskever,et al.  Parallelizable Sampling of Markov Random Fields , 2010, AISTATS.

[51]  L. Younes On the convergence of markovian stochastic algorithms with rapidly decreasing ergodicity rates , 1999 .