Transport Score Climbing: Variational Inference Using Forward KL and Adaptive Neural Transport

Variational inference often minimizes the “reverse” Kullback-Leibler (KL) divergence KL(q || p) from the approximate distribution q to the posterior p. Recent work studies the “forward” KL divergence KL(p || q), which, unlike the reverse KL, does not lead to variational approximations that underestimate uncertainty. This paper introduces Transport Score Climbing (TSC), a method that optimizes KL(p || q) using Hamiltonian Monte Carlo (HMC) and a novel adaptive transport map. The transport map improves the HMC trajectory by acting as a change of variables between the latent-variable space and a warped space. TSC uses HMC samples to dynamically train the transport map while optimizing KL(p || q). TSC leverages a synergy: a better transport map leads to better HMC sampling, which in turn leads to a better transport map. We demonstrate TSC on synthetic and real data, and find that it achieves competitive performance when training variational autoencoders on large-scale data.
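
Because the abstract compresses the whole algorithm into a few sentences, the sketch below illustrates one possible instantiation of a TSC-style loop under strong simplifying assumptions: the transport map is a diagonal affine map T(u) = mu + sigma * u standing in for the paper's neural transport map, the variational approximation q is the pushforward of a standard normal through T, and the target posterior is a toy Gaussian known up to a constant. HMC runs in the warped u-space, and the same HMC samples drive a score-climbing update of the map, i.e., stochastic gradient ascent on E_p[log q]. All function names, step sizes, and learning rates are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):
    # Toy unnormalized target: standard Gaussian shifted to mean 1.
    return -0.5 * np.sum((z - 1.0) ** 2)

def grad_log_p(z):
    return -(z - 1.0)

def warped_logp_and_grad(u, mu, log_sigma):
    # Change of variables z = T(u) = mu + exp(log_sigma) * u; the warped
    # log-density picks up the log-Jacobian term sum(log_sigma).
    sigma = np.exp(log_sigma)
    z = mu + sigma * u
    lp = log_p(z) + np.sum(log_sigma)
    return lp, grad_log_p(z) * sigma  # chain rule through the diagonal map

def hmc_step(u, mu, log_sigma, eps=0.1, n_leapfrog=10):
    # One HMC transition targeting the warped posterior (leapfrog + MH test).
    r = rng.standard_normal(u.shape)
    lp, g = warped_logp_and_grad(u, mu, log_sigma)
    h0 = lp - 0.5 * np.sum(r ** 2)
    u_prop = u.copy()
    r_prop = r + 0.5 * eps * g              # initial half step for momentum
    for i in range(n_leapfrog):
        u_prop = u_prop + eps * r_prop      # full position step
        lp, g = warped_logp_and_grad(u_prop, mu, log_sigma)
        if i < n_leapfrog - 1:
            r_prop = r_prop + eps * g       # full momentum step
    r_prop = r_prop + 0.5 * eps * g         # final half step for momentum
    h1 = lp - 0.5 * np.sum(r_prop ** 2)
    return u_prop if np.log(rng.uniform()) < h1 - h0 else u

def score_climbing_update(z, mu, log_sigma, lr=0.02):
    # Forward-KL (score-climbing) update: ascend E_p[log q(z)], using the
    # HMC sample z as an approximate draw from the posterior p.
    eps_z = (z - mu) / np.exp(log_sigma)
    grad_mu = eps_z / np.exp(log_sigma)     # d log q / d mu
    grad_ls = eps_z ** 2 - 1.0              # d log q / d log_sigma
    return mu + lr * grad_mu, log_sigma + lr * grad_ls

# TSC-style loop: the chain is warm-started from its previous state, and the
# same map both defines q and preconditions HMC.
d = 2
mu, log_sigma = np.zeros(d), np.zeros(d)
u = rng.standard_normal(d)
for t in range(3000):
    u = hmc_step(u, mu, log_sigma)
    z = mu + np.exp(log_sigma) * u          # push the sample back to z-space
    mu, log_sigma = score_climbing_update(z, mu, log_sigma)
    u = (z - mu) / np.exp(log_sigma)        # keep the chain state fixed in z-space
print(mu, np.exp(log_sigma))                # should approach mean 1, scale 1
```

On this toy target the parameters drift toward the true posterior mean and scale. The design choice the sketch mirrors is the synergy described above: a single map both defines the variational approximation and warps the space that HMC explores, and it is adapted online from the chain's own samples.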
