Variational Inference with Normalizing Flows

The choice of approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex, and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in the performance and applicability of variational inference.
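
As a rough illustration of the idea of transforming a simple initial density through a sequence of invertible maps, the sketch below pushes samples from a standard Gaussian through a few planar transformations f(z) = z + u tanh(w·z + b) and tracks the log-density with the change-of-variables formula, log q_K(z_K) = log q_0(z_0) - sum_k log|det(df_k/dz_{k-1})|. The planar form, the randomly drawn parameters, and the helper names (`make_invertible`, `planar_flow`, `flow_log_density`) are assumptions chosen for this example; the abstract itself does not fix a particular transformation, and in practice the parameters would be learned rather than sampled.

```python
import numpy as np

def make_invertible(u, w):
    """Adjust u so that w.u > -1, a sufficient condition for the
    planar map with a tanh nonlinearity to be invertible."""
    wu = w @ u
    m = -1.0 + np.log1p(np.exp(wu))      # -1 + softplus(w.u), always > -1
    return u + (m - wu) * w / (w @ w)

def planar_flow(z, u, w, b):
    """Apply f(z) = z + u * tanh(w.z + b) to a batch z of shape (n, d).
    Returns the transformed points and log|det Jacobian| per point."""
    a = np.tanh(z @ w + b)               # shape (n,)
    f = z + np.outer(a, u)               # shape (n, d)
    psi = (1.0 - a ** 2)[:, None] * w    # tanh'(w.z + b) * w, shape (n, d)
    log_det = np.log(np.abs(1.0 + psi @ u))
    return f, log_det

def flow_log_density(z0, params):
    """Push z0 ~ N(0, I) through the flow and return (z_K, log q_K(z_K))."""
    d = z0.shape[1]
    log_q = -0.5 * np.sum(z0 ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    z = z0
    for u, w, b in params:
        z, log_det = planar_flow(z, make_invertible(u, w), w, b)
        log_q = log_q - log_det          # change of variables
    return z, log_q

# Hypothetical setup: 2-D latent space, K = 4 flow steps, random parameters.
rng = np.random.default_rng(0)
d, K = 2, 4
params = [(rng.normal(size=d), rng.normal(size=d), rng.normal()) for _ in range(K)]
z0 = rng.normal(size=(5, d))
zK, log_qK = flow_log_density(z0, params)
```

Each step costs O(d) for the log-determinant because the Jacobian of a planar map is a rank-one update of the identity; that cheap determinant is what makes this particular family convenient for a sketch.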
