Representational Power of Restricted Boltzmann Machines and Deep Belief Networks

Deep belief networks (DBNs) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton, Osindero, and Teh (2006) along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a restricted Boltzmann machine (RBM), used to represent one layer of the model. Restricted Boltzmann machines are interesting because inference in them is easy and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units to an RBM yields strictly improved modeling power, and a second theorem shows that RBMs are universal approximators of discrete distributions. We then study whether adding layers to a DBN strictly increases its representational power. This analysis suggests a new and less greedy criterion for training RBMs within DBNs.
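
For reference, a standard parameterization of a binary RBM underlying these results is sketched below; the notation (weights W, visible biases b, hidden biases c) is an assumption, since the abstract does not fix one. The model is defined by an energy function, and the bipartite connectivity between visible units v and hidden units h makes the conditionals factorize, which is the sense in which inference in an RBM is easy:

E(\mathbf{v}, \mathbf{h}) = -\mathbf{b}^\top \mathbf{v} - \mathbf{c}^\top \mathbf{h} - \mathbf{h}^\top W \mathbf{v},
\qquad p(\mathbf{v}, \mathbf{h}) = \frac{1}{Z}\, e^{-E(\mathbf{v}, \mathbf{h})},

p(h_j = 1 \mid \mathbf{v}) = \operatorname{sigm}\Big(c_j + \sum_i W_{ji} v_i\Big),
\qquad p(v_i = 1 \mid \mathbf{h}) = \operatorname{sigm}\Big(b_i + \sum_j W_{ji} h_j\Big).

Sampling h given v (and vice versa) thus reduces to one matrix-vector product followed by elementwise sigmoids; a DBN stacks such layers, with the top two layers forming an RBM and the lower layers forming a directed belief network.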

[1] Miklós Ajtai. Σ¹₁-Formulae on Finite Structures. Annals of Pure and Applied Logic, 1983.

[2] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines. Cognitive Science, 1985.

[3] J. Håstad. Computational Limitations of Small-Depth Circuits, 1987.

[4] Kurt Hornik, et al. Multilayer Feedforward Networks Are Universal Approximators. Neural Networks, 1989.

[5] Eric Allender, et al. Circuit Complexity before the Dawn of the New Millennium. FSTTCS, 1996.

[6] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 2002.

[7] Geoffrey E. Hinton, et al. Exponential Family Harmoniums with an Application to Information Retrieval. NIPS, 2004.

[8] Gerald Tesauro. Practical Issues in Temporal Difference Learning. Machine Learning, 1992.

[9] Pascal Vincent, et al. Non-Local Manifold Parzen Windows. NIPS, 2005.

[10] Nicolas Le Roux, et al. The Curse of Highly Variable Functions for Local Kernel Machines. NIPS, 2005.

[11] Miguel Á. Carreira-Perpiñán, et al. On Contrastive Divergence Learning. AISTATS, 2005.

[12] Yoshua Bengio, et al. Greedy Layer-Wise Training of Deep Networks. NIPS, 2006.

[13] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.

[14] Geoffrey E. Hinton, Simon Osindero, and Yee Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[15] Marc'Aurelio Ranzato, et al. Efficient Learning of Sparse Representations with an Energy-Based Model. NIPS, 2006.

[16] Max Welling, et al. Products of Experts, 2007.

[17] Jason Weston, et al. Large-Scale Kernel Machines, 2007.

[18] Yoshua Bengio, et al. Scaling Learning Algorithms towards AI, 2007.

[19] Geoffrey E. Hinton, et al. Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure. AISTATS, 2007.

[20] Geoffrey E. Hinton. Reducing the Dimensionality of Data with Neural Networks, 2008.

[21] Josephine Yu, et al. An Implicitization Challenge for Binary Factor Analysis. Journal of Symbolic Computation, 2010.