Stochastic Back-propagation and Variational Inference in Deep Latent Gaussian Models

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent approximate posterior distributions, and that acts as a stochastic encoder of the data. We develop stochastic backpropagation – rules for back-propagation through stochastic variables – and use this to develop an algorithm that allows for joint optimisation of the parameters of both the generative and recognition model. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data and is a useful tool for high-dimensional data visualisation.

[1]  Robert Price,et al.  A useful theorem for nonlinear devices having Gaussian inputs , 1958, IRE Trans. Inf. Theory.

[2]  G. Bonnet Transformations des signaux aléatoires a travers les systèmes non linéaires sans mémoire , 1964 .

[3]  Roderick J. A. Little,et al.  Statistical Analysis with Missing Data , 1988 .

[4]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[5]  R. Zemel A minimum description length framework for unsupervised learning , 1994 .

[6]  Geoffrey E. Hinton,et al.  The Helmholtz Machine , 1995, Neural Computation.

[7]  Michael I. Jordan,et al.  Mean Field Theory for Sigmoid Belief Networks , 1996, J. Artif. Intell. Res..

[8]  Brendan J. Frey,et al.  Variational Learning in Nonlinear Gaussian Belief Networks , 1999, Neural Computation.

[9]  P. Dayan Helmholtz Machines and Wake-Sleep Learning , 2000 .

[10]  Tom Minka,et al.  A family of algorithms for approximate Bayesian inference , 2001 .

[11]  Stan Lipovetsky,et al.  Latent Variable Models and Factor Analysis , 2001, Technometrics.

[12]  Matthew J. Beal Variational algorithms for approximate Bayesian inference , 2003 .

[13]  Neil D. Lawrence,et al.  Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models , 2005, J. Mach. Learn. Res..

[14]  H. Rue,et al.  Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations , 2009 .

[15]  Manfred Opper,et al.  The Variational Gaussian Approximation Revisited , 2009, Neural Computation.

[16]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[17]  Malik Magdon-Ismail,et al.  Approximating the Covariance Matrix of GMMs with Low-Rank Perturbations , 2010, IDEAL.

[18]  Radford M. Neal Probabilistic Inference Using Markov Chain Monte Carlo Methods , 2011 .

[19]  Hugo Larochelle,et al.  The Neural Autoregressive Distribution Estimator , 2011, AISTATS.

[20]  Alex Graves,et al.  Practical Variational Inference for Neural Networks , 2011, NIPS.

[21]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[22]  Ahn Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring , 2012 .

[23]  Chong Wang,et al.  Stochastic variational inference , 2012, J. Mach. Learn. Res..

[24]  Pascal Vincent,et al.  Generalized Denoising Auto-Encoders as Generative Models , 2013, NIPS.

[25]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[26]  Daan Wierstra,et al.  Deep AutoRegressive Networks , 2013, ICML.

[27]  Hugo Larochelle,et al.  A Deep and Tractable Density Estimator , 2013, ICML.