A Contrastive Divergence for Combining Variational Inference and MCMC

We develop a method to combine Markov chain Monte Carlo (MCMC) and variational inference (VI), leveraging the advantages of both inference approaches. Specifically, we improve the variational distribution by running a few MCMC steps. To make inference tractable, we introduce the variational contrastive divergence (VCD), a new divergence that replaces the standard Kullback-Leibler (KL) divergence used in VI. The VCD captures a notion of discrepancy between the initial variational distribution and its improved version (obtained after running the MCMC steps), and it converges asymptotically to the symmetrized KL divergence between the variational distribution and the posterior of interest. The VCD objective can be optimized efficiently with respect to the variational parameters via stochastic optimization. We show experimentally that optimizing the VCD leads to better predictive performance on two latent variable models: logistic matrix factorization and variational autoencoders (VAEs).
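As a minimal sketch (the precise definition is not reproduced in the abstract and is assumed here), let $q_\theta(z)$ denote the variational distribution and $q^{(t)}_\theta(z)$ its improved version, obtained by passing samples from $q_\theta$ through $t$ steps of an MCMC kernel whose stationary distribution is the posterior $p(z \mid x)$. A divergence consistent with the properties stated above can be written as

$$\mathcal{L}_{\mathrm{VCD}}(\theta) \;=\; \mathrm{KL}\big(q_\theta(z)\,\|\,p(z \mid x)\big) \;-\; \mathrm{KL}\big(q^{(t)}_\theta(z)\,\|\,p(z \mid x)\big) \;+\; \mathrm{KL}\big(q^{(t)}_\theta(z)\,\|\,q_\theta(z)\big).$$

Under this form, the quantity is non-negative, vanishes when the MCMC steps leave $q_\theta$ unchanged, and, because $q^{(t)}_\theta(z) \to p(z \mid x)$ as $t \to \infty$, it converges to the symmetrized KL divergence $\mathrm{KL}(q_\theta \,\|\, p) + \mathrm{KL}(p \,\|\, q_\theta)$. The intractable $\log p(x)$ terms cancel between the first two KL terms, which is what makes stochastic estimation and optimization of the objective with respect to the variational parameters $\theta$ feasible.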
