Sparse Group Restricted Boltzmann Machines

Since learning in Boltzmann machines is typically quite slow, connections within the hidden layer are usually removed, yielding restricted Boltzmann machines (RBMs). The states of the hidden units nevertheless remain statistically dependent. Based on this observation, we propose an l1/l2 regularizer on the activation probabilities of the hidden units of an RBM to capture these local dependencies. The regularizer not only encourages many groups of hidden units to be inactive given the observed data but also makes the hidden units within a group compete with each other to model the data, so it yields sparsity at both the group level and the level of individual hidden units. We call RBMs trained with this regularizer sparse group RBMs (SGRBMs). We apply SGRBMs to model patches of natural images, handwritten digits and OCR English letters, and, to show that SGRBMs learn more discriminative features, we use them to pretrain deep networks for classification tasks. Furthermore, we show that the regularizer can also be applied to deep Boltzmann machines, leading to sparse group deep Boltzmann machines. On the MNIST data set, a two-layer sparse group deep Boltzmann machine achieves an error rate of 0.84%, which is, to our knowledge, the best published result on the permutation-invariant version of the task.
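To make the regularizer concrete, the following is a minimal NumPy sketch of an l1/l2 (group-sparse) penalty on the hidden activation probabilities of an RBM. It assumes hidden units are partitioned into equal, non-overlapping groups; the names (W, b_hidden, group_size, lambda_reg) and the eps smoothing term are illustrative choices, not taken from the paper, and the gradient shown is only the piece with respect to the activation probabilities, to be combined with the usual contrastive-divergence update via the chain rule.

```python
# Sketch only: l1/l2 penalty over groups of RBM hidden activation probabilities.
# Assumes equal-sized, non-overlapping groups of hidden units.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_sparse_penalty(v, W, b_hidden, group_size, eps=1e-8):
    """Return the l1/l2 penalty (sum over groups of the l2 norm of that
    group's hidden activation probabilities) and its gradient w.r.t. the
    activation probabilities."""
    p = sigmoid(v @ W + b_hidden)            # P(h_j = 1 | v), shape (n_hidden,)
    groups = p.reshape(-1, group_size)       # one row per non-overlapping group
    group_norms = np.sqrt((groups ** 2).sum(axis=1) + eps)
    penalty = group_norms.sum()              # l1 across groups, l2 within each group
    # d penalty / d p_j = p_j / ||p_group||: whole groups are pushed toward
    # zero, while units inside an active group compete with each other.
    grad_p = (groups / group_norms[:, None]).reshape(-1)
    return penalty, grad_p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.integers(0, 2, size=36).astype(float)    # one binary visible vector
    W = 0.01 * rng.standard_normal((36, 16))
    b_hidden = np.zeros(16)
    penalty, grad_p = group_sparse_penalty(v, W, b_hidden, group_size=4)
    print(penalty, grad_p.shape)
```

In training, one would scale this penalty by a hypothetical weight lambda_reg and subtract its gradient with respect to the weights (obtained from grad_p through the sigmoid) from the contrastive-divergence update.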
