Stochastic Backpropagation and Approximate Inference in Deep Generative Models

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model that represents approximate posterior distributions and acts as a stochastic encoder of the data. We develop stochastic back-propagation -- rules for back-propagation through stochastic variables -- and use it to derive an algorithm that jointly optimises the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data, and is a useful tool for high-dimensional data visualisation.
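To make "back-propagation through stochastic variables" concrete, the sketch below shows one common way to realise the idea for a single Gaussian variable: write the sample as z = mu + sigma * eps with eps drawn from a standard normal, then average ordinary derivatives of f over the samples. This is an illustration only, not the full algorithm of the paper. The toy function `f(z) = z^2`, the helper name `reparam_grads`, and the sample count are all assumptions chosen so that the exact answers (2 * mu and 2 * sigma) are known.

```python
import random


def df(z):
    """Derivative of the toy function f(z) = z**2 (an assumed example)."""
    return 2.0 * z


def reparam_grads(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the gradients of E[f(z)], z ~ N(mu, sigma^2).

    Hypothetical helper: the sample is written as z = mu + sigma * eps,
    so the chain rule gives dz/dmu = 1 and dz/dsigma = eps.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps        # deterministic transform of the noise
        g = df(z)                   # ordinary derivative at the sample
        g_mu += g                   # df/dz * dz/dmu
        g_sigma += g * eps          # df/dz * dz/dsigma
    return g_mu / n_samples, g_sigma / n_samples


# For f(z) = z**2, E[f] = mu**2 + sigma**2, so the exact gradients
# are 2 * mu and 2 * sigma.
g_mu, g_sigma = reparam_grads(mu=1.5, sigma=0.8)
print(g_mu, g_sigma)
```

With these settings the two estimates should land close to 3.0 and 1.6. In a deep model the same pattern would be applied to each stochastic layer, with `df` supplied by standard back-propagation rather than written by hand.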
