Herding Dynamic Weights for Partially Observed Random Field Models

Learning the parameters of a (potentially partially observed) random field model is intractable in general. Instead of focusing on a single optimal parameter value, we propose to treat the parameters as dynamical quantities. We introduce an algorithm that generates complex dynamics for the parameters and for the state vectors, both visible and hidden. We show that, under certain conditions, averages computed over trajectories of the proposed dynamical system converge to averages computed over the data. Our "herding dynamics" is fully deterministic and does not require expensive operations such as exponentiation.
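To make the idea of weights as dynamical quantities concrete, the following is a minimal sketch of a herding-style update for the simpler, fully observed case only. The abstract does not spell out the update rule, so everything here is an assumption made for illustration: the feature map (first- and second-order moments of a +/-1 vector), the toy data set, the initialization of the weights at the data average, and the names `features` and `herd`. Each step picks the state that maximizes the inner product of the weights with the features, then moves the weights by the data-average features minus the features of the chosen state. Hidden variables, which the full method also handles, are left out.

```python
from itertools import product


def features(s):
    """Illustrative feature map: the entries of s and all pairwise products."""
    pairs = [s[i] * s[j] for i in range(len(s)) for j in range(i + 1, len(s))]
    return list(s) + pairs


def herd(data, n_steps):
    """Run a fully observed herding-style recursion (a sketch, not the paper's code).

    Returns the data-average features and the average features of the
    deterministically generated state sequence.
    """
    dim = len(data[0])
    states = list(product((-1, 1), repeat=dim))   # enumerate the tiny state space
    feats = {s: features(s) for s in states}
    n_feat = len(feats[states[0]])

    # Averages computed over the data.
    target = [sum(features(x)[k] for x in data) / len(data) for k in range(n_feat)]

    w = list(target)                  # assumed initialization
    running = [0.0] * n_feat
    for _ in range(n_steps):
        # Maximization step: no exponentiation, no random numbers.
        s = max(states, key=lambda c: sum(wk * fk for wk, fk in zip(w, feats[c])))
        f = feats[s]
        # Weight update: data average minus features of the selected state.
        w = [wk + tk - fk for wk, tk, fk in zip(w, target, f)]
        running = [r + fk for r, fk in zip(running, f)]

    trajectory_avg = [r / n_steps for r in running]
    return target, trajectory_avg


# Hypothetical toy data: five observations of three +/-1 variables.
data = [(1, 1, -1), (1, -1, -1), (-1, 1, 1), (1, 1, 1), (1, 1, -1)]
target, trajectory_avg = herd(data, 5000)
gap = max(abs(t - a) for t, a in zip(target, trajectory_avg))
print("largest moment mismatch:", gap)
```

In this sketch the mismatch between the two averages equals the total change in the weights divided by the number of steps, so it shrinks as long as the weights stay bounded. Exhaustive enumeration of states is only feasible for toy problems; a larger model would need a different maximization routine.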
