Unbiased Gradient Estimation for Variational Auto-Encoders using Coupled Markov Chains

The variational auto-encoder (VAE) is a deep latent variable model built from two neural networks in an autoencoder-like architecture; one of them parameterizes the model's likelihood. Fitting its parameters via maximum likelihood is challenging because computing the likelihood involves an intractable integral over the latent space; the VAE is therefore trained by maximizing a variational lower bound instead. Here, we develop a maximum likelihood training scheme for VAEs by introducing unbiased gradient estimators of the log-likelihood. We obtain the unbiased estimators by augmenting the latent space with a set of importance samples, similarly to the importance weighted auto-encoder (IWAE), and then constructing a Markov chain Monte Carlo (MCMC) coupling procedure on this augmented space. We provide the conditions under which the estimators can be computed in finite time and have finite variance. We demonstrate experimentally that VAEs fitted with unbiased estimators exhibit better predictive performance on three image datasets.
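The gap between the log-likelihood and its variational lower bounds, which motivates the unbiased estimators, can be seen on a toy problem. The sketch below is not the coupling procedure of the paper; it only compares the standard lower bound and an IWAE-style bound with K importance samples against the exact log-likelihood. The model (z ~ N(0, 1), x | z ~ N(z, 1), so that p(x) = N(0, 2)), the deliberately imperfect proposal, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def importance_weighted_bound(x, K, n_rep, rng, q_coef=0.4, q_var=1.0):
    """Monte Carlo average of log((1/K) * sum_k w_k) over n_rep replicates.

    Hypothetical toy setup: prior N(0, 1), likelihood N(z, 1), and a
    mismatched proposal q(z | x) = N(q_coef * x, q_var).
    K = 1 gives the standard variational lower bound (ELBO).
    """
    q_mean = q_coef * x
    z = q_mean + np.sqrt(q_var) * rng.standard_normal((n_rep, K))
    # log importance weights: log p(z) + log p(x | z) - log q(z | x)
    log_w = (log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
             - log_normal(z, q_mean, q_var))
    # numerically stable log-mean-exp over the K samples of each replicate
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.mean(np.exp(log_w - m), axis=1))
    return log_mean_w.mean()

rng = np.random.default_rng(0)
x = 1.5
true_ll = log_normal(x, 0.0, 2.0)  # exact log p(x) for this toy model
elbo = importance_weighted_bound(x, K=1, n_rep=20000, rng=rng)
iwae = importance_weighted_bound(x, K=50, n_rep=20000, rng=rng)
print(f"log p(x) = {true_ll:.4f}, ELBO = {elbo:.4f}, IWAE(K=50) = {iwae:.4f}")
```

Both bounds sit below the exact log-likelihood in expectation, and the bound tightens as K grows; for any finite K the bound, and hence its gradient, remains a biased target for maximum likelihood.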
