Particle Filtered MCMC-MLE with Connections to Contrastive Divergence

Learning undirected graphical models such as Markov random fields is an important machine learning task with applications in many domains. Since exact learning of these models is usually intractable, various approximate techniques have been developed, such as contrastive divergence (CD) and Markov chain Monte Carlo maximum likelihood estimation (MCMC-MLE). In this paper, we introduce particle filtered MCMC-MLE, a sampling-importance-resampling version of MCMC-MLE with additional MCMC rejuvenation steps. We also describe a unified view of (1) MCMC-MLE, (2) our particle filtering approach, and (3) a stochastic approximation procedure known as persistent contrastive divergence (PCD), showing how these approaches relate to each other and discussing the relative merits of each. Empirical results on various undirected models demonstrate that the proposed particle filtering technique can significantly outperform MCMC-MLE and, in certain cases, is faster than PCD.
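To make the mechanics concrete, the following is a minimal NumPy sketch of the particle filtering idea for a small fully visible Boltzmann machine. It is not the paper's implementation: the choice of model, the function names (pf_mcmc_mle, gibbs_sweep, unnorm_logp), the ESS-based resampling trigger, and all parameter values are illustrative assumptions. The sketch reweights a fixed set of particles by unnormalized likelihood ratios as the parameters move during gradient ascent, and resamples and rejuvenates them with a few Gibbs sweeps whenever the effective sample size collapses.

    import numpy as np

    # Illustrative sketch (not the paper's code): particle filtered MCMC-MLE
    # for a fully visible Boltzmann machine with f(x; W, b) = 0.5 x'Wx + b'x.

    def unnorm_logp(X, W, b):
        # Unnormalized log-probability of each row of X.
        return 0.5 * np.einsum('ni,ij,nj->n', X, W, X) + X @ b

    def gibbs_sweep(X, W, b, rng):
        # One systematic Gibbs sweep over all units of every particle
        # (W is symmetric with zero diagonal).
        n, d = X.shape
        for j in range(d):
            p = 1.0 / (1.0 + np.exp(-(X @ W[:, j] + b[j])))
            X[:, j] = (rng.random(n) < p).astype(float)
        return X

    def pf_mcmc_mle(data, n_particles=100, n_iters=500, lr=0.05,
                    ess_frac=0.5, rejuv_sweeps=2, seed=0):
        rng = np.random.default_rng(seed)
        n, d = data.shape
        W, b = np.zeros((d, d)), np.zeros(d)
        # Particles drawn exactly from the initial model (uniform when W = b = 0).
        X = (rng.random((n_particles, d)) < 0.5).astype(float)
        logw = np.zeros(n_particles)     # log importance weights
        f_ref = unnorm_logp(X, W, b)     # f at the parameters last applied to the weights
        stats_W, stats_b = data.T @ data / n, data.mean(axis=0)  # empirical statistics
        for _ in range(n_iters):
            w = np.exp(logw - logw.max()); w /= w.sum()
            # Likelihood gradient: data statistics minus weighted model statistics.
            W += lr * (stats_W - (X * w[:, None]).T @ X)
            np.fill_diagonal(W, 0.0)
            b += lr * (stats_b - w @ X)
            # Reweight particles toward the new parameters; unknown partition
            # functions cancel under self-normalization, and the increments
            # telescope back to the parameters at the last rejuvenation.
            f_new = unnorm_logp(X, W, b)
            logw += f_new - f_ref
            f_ref = f_new
            # Resample and rejuvenate when the effective sample size collapses.
            w = np.exp(logw - logw.max()); w /= w.sum()
            if 1.0 / np.sum(w ** 2) < ess_frac * n_particles:
                X = X[rng.choice(n_particles, size=n_particles, p=w)].copy()
                logw[:] = 0.0
                for _ in range(rejuv_sweeps):
                    X = gibbs_sweep(X, W, b, rng)
                f_ref = unnorm_logp(X, W, b)
        return W, b

    # Toy usage: fit to synthetic binary data.
    toy = (np.random.default_rng(1).random((200, 5)) < 0.3).astype(float)
    W_hat, b_hat = pf_mcmc_mle(toy)

The ESS-triggered resample-and-rejuvenate step is the standard resample-move heuristic from the sequential Monte Carlo literature; other triggers or rejuvenation kernels could be substituted without changing the overall scheme.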
