Paper information - Bounding the Bias of Contrastive Divergence Learning

Optimization based on k-step contrastive divergence (CD) has become a common way to train restricted Boltzmann machines (RBMs). The k-step CD is a biased estimator of the log-likelihood gradient relying on Gibbs sampling. We derive a new upper bound for this bias. Its magnitude depends on k, the number of variables in the RBM, and the maximum change in energy that can be produced by changing a single variable. The last reflects the dependence on the absolute values of the RBM parameters. The magnitude of the bias is also affected by the distance in variation between the modeled distribution and the starting distribution of the Gibbs chain.
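To make the estimator described above concrete, the following is a minimal NumPy sketch of a k-step CD gradient estimate for a binary RBM. It is an illustration under stated assumptions, not the authors' code: the function name `cd_k_gradient`, the parameter names `W`, `b`, `c`, and the energy convention E(v, h) = -vᵀWh - bᵀv - cᵀh are all chosen here for the example. The Gibbs chain starts at the training data, runs for k steps, and the difference between the data statistics and the chain-end statistics stands in for the true log-likelihood gradient, which is where the bias comes from.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))


def cd_k_gradient(v0, W, b, c, k, rng):
    """Sketch of a k-step contrastive divergence gradient estimate.

    Assumes a binary RBM with energy E(v, h) = -v^T W h - b^T v - c^T h.
    v0:  (batch, n_visible) array of binary training vectors
    W:   (n_visible, n_hidden) weight matrix
    b:   (n_visible,) visible biases
    c:   (n_hidden,) hidden biases
    k:   number of Gibbs steps
    rng: a numpy.random.Generator
    Returns estimates of the gradient with respect to W, b, and c.
    """
    # Positive phase: hidden unit probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)

    # Negative phase: run a Gibbs chain for k steps, starting at the data.
    v, ph = v0, ph0
    for _ in range(k):
        h = (rng.random(ph.shape) < ph).astype(v0.dtype)   # sample h ~ p(h | v)
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(v0.dtype)   # sample v ~ p(v | h)
        ph = sigmoid(v @ W + c)

    # Data statistics minus chain-end statistics, averaged over the batch.
    n = v0.shape[0]
    dW = (v0.T @ ph0 - v.T @ ph) / n
    db = (v0 - v).mean(axis=0)
    dc = (ph0 - ph).mean(axis=0)
    return dW, db, dc
```

Calling `cd_k_gradient(v0, W, b, c, k=1, rng=np.random.default_rng(0))` gives the common CD-1 estimate. Larger k lets the chain move further from the starting distribution toward the model distribution, which matches the abstract's statement that the magnitude of the bias depends on k.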

Christian Igel | Asja Fischer
