Representational Power of Restricted Boltzmann Machines and Deep Belief Networks

Deep belief networks (DBNs) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton, Osindero, and Teh (2006) along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a restricted Boltzmann machine (RBM), used to represent one layer of the model. Restricted Boltzmann machines are interesting because inference in them is easy and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units to an RBM yields strictly improved modeling power, and a second theorem shows that RBMs are universal approximators of discrete distributions. We then study whether adding layers to a DBN strictly increases its representational power. This analysis suggests a new and less greedy criterion for training RBMs within DBNs.
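
For reference, a standard parameterization of a binary RBM underlying these results is sketched below; the notation (weights W, visible biases b, hidden biases c) is an assumption, since the abstract does not fix one. The model is defined by an energy function, and the bipartite connectivity between visible units v and hidden units h makes the conditionals factorize, which is the sense in which inference in an RBM is easy:

E(\mathbf{v}, \mathbf{h}) = -\mathbf{b}^\top \mathbf{v} - \mathbf{c}^\top \mathbf{h} - \mathbf{h}^\top W \mathbf{v},
\qquad p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\, e^{-E(\mathbf{v}, \mathbf{h})},

p(h_j = 1 \mid \mathbf{v}) = \operatorname{sigm}\Big(c_j + \sum_i W_{ji} v_i\Big),
\qquad p(v_i = 1 \mid \mathbf{h}) = \operatorname{sigm}\Big(b_i + \sum_j W_{ji} h_j\Big).

Sampling h given v (and vice versa) thus reduces to one matrix-vector product followed by elementwise sigmoids; a DBN stacks such layers, with the top two layers forming an RBM and the lower layers forming a directed belief network.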

[1] Miklós Ajtai. Σ¹₁-Formulae on Finite Structures. Annals of Pure and Applied Logic, 1983.

[2] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines. Cognitive Science, 1985.

[3] J. Håstad. Computational Limitations of Small-Depth Circuits, 1987.

[4] Kurt Hornik, et al. Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, 1989.

[5] Eric Allender, et al. Circuit Complexity before the Dawn of the New Millennium. FSTTCS, 1996.

[6] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[7] Geoffrey E. Hinton, et al. Exponential Family Harmoniums with an Application to Information Retrieval. NIPS, 2004.

[8] Gerald Tesauro. Practical Issues in Temporal Difference Learning. Machine Learning, 1992.

[9] Pascal Vincent, et al. Non-Local Manifold Parzen Windows. NIPS, 2005.

[10] Nicolas Le Roux, et al. The Curse of Highly Variable Functions for Local Kernel Machines. NIPS, 2005.

[11] Miguel Á. Carreira-Perpiñán, et al. On Contrastive Divergence Learning. AISTATS, 2005.

[12] Yoshua Bengio, et al. Greedy Layer-Wise Training of Deep Networks. NIPS, 2006.

[13] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.

[14] Geoffrey E. Hinton, Simon Osindero, and Yee Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[15] Marc'Aurelio Ranzato, et al. Efficient Learning of Sparse Representations with an Energy-Based Model. NIPS, 2006.

[16] Max Welling, et al. Products of Experts, 2007.

[17] Jason Weston, et al. Large-Scale Kernel Machines, 2007.

[18] Yoshua Bengio, et al. Scaling Learning Algorithms towards AI, 2007.

[19] Geoffrey E. Hinton, et al. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. AISTATS, 2007.

[20] Geoffrey E. Hinton. Reducing the Dimensionality of Data with Neural Networks, 2008.

[21] Josephine Yu, et al. An Implicitization Challenge for Binary Factor Analysis. Journal of Symbolic Computation, 2010.