CD notes

Contrastive divergence is an approximate maximum-likelihood (ML) learning algorithm proposed by Hinton (2001). The Hinton network is a deterministic mapping from an observable space x of dimension D to an energy function E(x; w) parameterised by w. The energy defines a probability via the Boltzmann distribution:

P(x \mid w) = \frac{\exp[-E(x; w)]}{Z(w)} \qquad (1)

Z(w) = \int \mathrm{d}^D x \, \exp[-E(x; w)] \qquad (2)

Z is the normalising constant, or partition function, which is very hard to evaluate. (Note that models whose energy is a sum of terms correspond to products of probability distributions, since \exp[-\sum_m E_m(x; w)] = \prod_m \exp[-E_m(x; w)], so products of experts fall naturally into this framework. Undirected graphical models are another example, in which the energy is specified through non-directional compatibilities.) Differentiating the log-likelihood of the parameters we have:
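As a concrete illustration (not part of the original notes), the Python sketch below defines a toy quadratic energy on binary vectors and evaluates the Boltzmann probability of equation (1) by brute-force summation for Z(w). The specific energy form and the parameter names W and b are illustrative assumptions; the 2^D terms in the sum are exactly why Z(w) becomes intractable for realistic D.

```python
import itertools
import numpy as np

def energy(x, W, b):
    # Toy quadratic energy E(x; w) with parameters w = (W, b) (assumed form).
    return -(x @ W @ x + b @ x)

def partition_function(W, b):
    # Z(w) = sum over all 2^D binary configurations of exp[-E(x; w)]  (eq. 2,
    # with the integral replaced by a sum for discrete x). This enumeration
    # is what becomes infeasible as D grows.
    D = len(b)
    return sum(np.exp(-energy(np.array(x, dtype=float), W, b))
               for x in itertools.product([0, 1], repeat=D))

def prob(x, W, b):
    # Boltzmann distribution: P(x | w) = exp[-E(x; w)] / Z(w)  (eq. 1).
    return np.exp(-energy(x, W, b)) / partition_function(W, b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 5                                    # small enough to enumerate 2^D states
    W = rng.normal(scale=0.1, size=(D, D))
    b = rng.normal(scale=0.1, size=D)
    x = rng.integers(0, 2, size=D).astype(float)
    print("P(x | w) =", prob(x, W, b))
```

Doubling D to 10 already requires 1024 terms in Z, and D = 100 requires ~1e30, which is why learning methods such as contrastive divergence avoid computing Z(w) explicitly.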