Learning Deep Generative Models with Annealed Importance Sampling

Variational inference (VI) and Markov chain Monte Carlo (MCMC) are the two main approximate approaches to learning deep generative models by maximizing the marginal likelihood. In this paper, we propose using annealed importance sampling (AIS) for learning deep generative models. Our approach bridges VI and MCMC: it generalizes VI methods such as variational auto-encoders (VAEs) and importance weighted auto-encoders (IWAE), as well as the MCMC method of Hoffman (2017). It also offers insight into why running multiple short MCMC chains helps in learning deep generative models. Through experiments, we show that our approach yields better density models than IWAE and can effectively trade computation for model accuracy without increasing memory cost.
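
To make the idea concrete, below is a minimal sketch, not the paper's implementation, of how annealed importance sampling estimates log p(x) for a latent-variable model. The toy linear-Gaussian decoder, the linear annealing schedule, and the random-walk Metropolis transitions are illustrative assumptions; any decoder and any MCMC kernel that leaves the intermediate distributions invariant could be substituted.

# Minimal AIS sketch (illustrative only): toy model p(z) = N(0, I),
# p(x|z) = N(W z + b, sigma^2 I). Anneal from the prior to the posterior
# along pi_beta(z) proportional to p(z) p(x|z)^beta.
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x, sigma = 2, 5, 0.5
W, b = rng.normal(size=(d_x, d_z)), rng.normal(size=d_x)   # toy "decoder" (assumed)
x = rng.normal(size=d_x)                                   # one observation

def log_prior(z):
    return -0.5 * np.sum(z**2) - 0.5 * d_z * np.log(2 * np.pi)

def log_lik(z):
    r = x - (W @ z + b)
    return -0.5 * np.sum(r**2) / sigma**2 - 0.5 * d_x * np.log(2 * np.pi * sigma**2)

def ais_log_weight(n_temps=100, n_mh=2, step=0.2):
    """One AIS chain: anneal from the prior to the posterior p(z|x) and
    return the chain's log importance weight."""
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    z = rng.normal(size=d_z)                 # exact sample from pi_0 = prior
    log_w = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        log_w += (b1 - b0) * log_lik(z)      # weight update for the new temperature
        for _ in range(n_mh):                # MH moves leaving pi_{b1} invariant
            z_prop = z + step * rng.normal(size=d_z)
            log_acc = (log_prior(z_prop) + b1 * log_lik(z_prop)
                       - log_prior(z) - b1 * log_lik(z))
            if np.log(rng.uniform()) < log_acc:
                z = z_prop
    return log_w

# Averaging the weights of several independent chains gives a stochastic
# lower bound on log p(x), analogous to the multi-sample IWAE bound.
log_ws = np.array([ais_log_weight() for _ in range(16)])
log_px_est = np.log(np.mean(np.exp(log_ws - log_ws.max()))) + log_ws.max()
print("estimated log p(x):", log_px_est)

In this sketch, taking a single temperature recovers plain importance sampling from the prior, while adding more intermediate temperatures and MCMC transitions tightens the bound, which is the sense in which AIS interpolates between importance-weighted VI objectives and MCMC.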

[1] Max Welling, et al. Markov Chain Monte Carlo and Variational Inference: Bridging the Gap, 2014, ICML.

[2] Tian Han, et al. Alternating Back-Propagation for Generator Network, 2016, AAAI.

[3] Chong Wang, et al. Stochastic variational inference, 2012, J. Mach. Learn. Res.

[4] Hoon Kim, et al. Monte Carlo Statistical Methods, 2000, Technometrics.

[5] George Tucker, et al. Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives, 2019, ICLR.

[6] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[7] Michael I. Jordan, et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[8] Ruslan Salakhutdinov, et al. Importance Weighted Autoencoders, 2015, ICLR.

[9] Ole Winther, et al. Ladder Variational Autoencoders, 2016, NIPS.

[10] Matt Hoffman. Langevin Dynamics as Nonparametric Variational Inference, 2019.

[11] Yoshua Bengio, et al. Reweighted Wake-Sleep, 2014, ICLR.

[12] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[13] Joshua B. Tenenbaum, et al. Human-level concept learning through probabilistic program induction, 2015, Science.

[14] Radford M. Neal. Probabilistic Inference Using Markov Chain Monte Carlo Methods, 2011.

[15] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[16] C. Jarzynski. Nonequilibrium Equality for Free Energy Differences, 1996, cond-mat/9610209.

[17] Tapani Raiko, et al. Techniques for Learning Binary Stochastic Feedforward Neural Networks, 2014, ICLR.

[18] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[19] Andriy Mnih, et al. Variational Inference for Monte Carlo Objectives, 2016, ICML.

[20] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.

[21] Ryan P. Adams, et al. Sandwiching the marginal likelihood using bidirectional Monte Carlo, 2015, arXiv.

[22] Matthew D. Hoffman. Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo, 2017, ICML.

[23] Ruslan Salakhutdinov, et al. On the Quantitative Analysis of Decoder-Based Generative Models, 2016, ICLR.

[24] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.

[25] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[26] Yee Whye Teh, et al. Filtering Variational Objectives, 2017, NIPS.

[27] Arnaud Doucet, et al. Hamiltonian Variational Auto-Encoder, 2018, NeurIPS.

[28] David Duvenaud, et al. Inference Suboptimality in Variational Autoencoders, 2018, ICML.

[29] Ruslan Salakhutdinov, et al. Learning Stochastic Feedforward Neural Networks, 2013, NIPS.

[30] David M. Blei, et al. Variational Inference: A Review for Statisticians, 2016, arXiv.

[31] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[32] Yee Whye Teh, et al. Tighter Variational Bounds are Not Necessarily Better, 2018, ICML.

[33] Shakir Mohamed, et al. Variational Inference with Normalizing Flows, 2015, ICML.

[34] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.

[35] David Duvenaud, et al. Reinterpreting Importance-Weighted Autoencoders, 2017, ICLR.