Information Geometry of Contrastive Divergence

The contrastive divergence (CD) method proposed by Hinton finds an approximate solution to maximum likelihood estimation for complex probability models. It is known empirically that the CD method gives high-quality estimates in a short computation time. In this paper, we use information geometry to give an intuitive explanation of why the CD method approximates the maximum likelihood solution well. We further propose an improved method that is consistent with maximum likelihood (or MAP) estimation, whereas the CD method is biased in general.
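For readers unfamiliar with CD, the following is a minimal sketch of a standard CD-k gradient estimate. It is not the improved method proposed in this paper. The model (a binary restricted Boltzmann machine without bias terms), the function name `cd_k_gradient`, and all parameter names are assumptions chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient(W, v_data, k=1, rng=None):
    """Estimate the CD-k gradient for a bias-free binary RBM (illustrative assumption).

    W:      (n_visible, n_hidden) weight matrix.
    v_data: (n_samples, n_visible) array of 0/1 values.
    Returns an array with the same shape as W.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden activation probabilities given the data.
    h_prob_data = sigmoid(v_data @ W)
    v, h_prob = v_data, h_prob_data
    # Negative phase: k steps of block Gibbs sampling, started at the data
    # rather than run until the chain reaches equilibrium.
    for _ in range(k):
        h = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_prob = sigmoid(h @ W.T)
        v = (rng.random(v_prob.shape) < v_prob).astype(float)
        h_prob = sigmoid(v @ W)
    n = v_data.shape[0]
    # Difference between data statistics and k-step sample statistics.
    return (v_data.T @ h_prob_data - v.T @ h_prob) / n

# Usage: one gradient-ascent step on toy data.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((4, 3))
v_data = rng.integers(0, 2, size=(8, 4)).astype(float)
grad = cd_k_gradient(W, v_data, k=1, rng=rng)
W_new = W + 0.1 * grad
```

Because the Gibbs chain is stopped after k steps, the negative-phase statistics come from a distribution that generally differs from the model distribution, which is the source of the bias mentioned above.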
