Failures of the One-Step Learning Algorithm

The Hinton network (Hinton, 2001, personal communication) is a deterministic mapping from an observable space x to an energy function E(x; w), parameterized by parameters w. The energy defines a probability P(x|w) = exp(−E(x; w))/Z(w). A maximum likelihood learning algorithm for this density model takes steps Δw ∝ −⟨g⟩₀ + ⟨g⟩∞, where ⟨g⟩₀ is the average of the gradient g = ∂E/∂w evaluated at points x drawn from the data density, and ⟨g⟩∞ is the average gradient for points x drawn from P(x|w). If T is a Markov chain in x-space that has P(x|w) as its unique invariant density, then we can approximate ⟨g⟩∞ by taking the data points x and hitting each of them I times with T, where I is a large integer. In the one-step learning algorithm of Hinton (2001), we set I to 1. In this paper I give examples of models E(x; w) and Markov chains T for which the true likelihood is unimodal in the parameters, but the one-step algorithm does not necessarily converge to the maximum likelihood parameters. It is hoped that these negative examples will help pin down the conditions for the one-step algorithm to be a correctly convergent algorithm.
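To make the update concrete, the following is a minimal sketch of the one-step rule in Python. It is not taken from the paper: the quadratic energy E(x; w) = w x²/2 (a zero-mean Gaussian with precision w), the Metropolis chain standing in for T, and all function names (energy, grad_energy, metropolis_step, one_step_update) are illustrative assumptions chosen so that ⟨g⟩₀ and ⟨g⟩_I are easy to compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy energy model: E(x; w) = w * x^2 / 2, so P(x|w) is a
# zero-mean Gaussian with precision w (for w > 0).
def energy(x, w):
    return 0.5 * w * x**2

def grad_energy(x, w):
    # g = dE/dw evaluated at x (independent of w for this toy model)
    return 0.5 * x**2

def metropolis_step(x, w, step=1.0):
    # One sweep of a Markov chain T whose invariant density is P(x|w):
    # Gaussian proposal, Metropolis accept with prob min(1, e^{E(x)-E(x')}).
    proposal = x + step * rng.standard_normal(x.shape)
    accept = rng.random(x.shape) < np.exp(energy(x, w) - energy(proposal, w))
    return np.where(accept, proposal, x)

def one_step_update(data, w, eta=0.01, I=1):
    # Dw = eta * (-<g>_0 + <g>_I): hit each data point I times with T.
    # I = 1 gives the one-step algorithm; large I approximates <g>_inf
    # and hence the maximum likelihood gradient step.
    g0 = grad_energy(data, w).mean()   # <g>_0 : average over the data
    x = data.copy()
    for _ in range(I):
        x = metropolis_step(x, w)
    gI = grad_energy(x, w).mean()      # <g>_I : average after I chain steps
    return w + eta * (-g0 + gI)

# Usage: data with variance 0.25, so the ML precision is w = 4.
data = 0.5 * rng.standard_normal(1000)
w = 1.0
for _ in range(2000):
    w = one_step_update(data, w, I=1)
```

For this toy model ⟨g⟩₀ = ⟨x²⟩_data/2 and ⟨g⟩∞ = 1/(2w), so the maximum likelihood fixed point is w = 1/⟨x²⟩_data; whether the I = 1 iteration shares that fixed point depends on the chain T, which is exactly the question the paper's counterexamples probe.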
