Fast Second Order Stochastic Backpropagation for Variational Inference

We propose a second-order (Hessian or Hessian-free) optimization method for variational inference, inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed in the same framework. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower computational complexity. As illustrative examples, we apply this approach to Bayesian logistic regression and the variational auto-encoder (VAE). Additionally, we derive bounds on the variance of the estimator of intractable expectations for the family of Lipschitz continuous functions. Our method is practical, scalable, and model-free. We demonstrate it on several real-world datasets and compare it with other stochastic gradient methods, showing a substantial improvement in convergence rates.
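The Gaussian backpropagation the abstract refers to rests on the classical identities of Bonnet and Price, which turn derivatives of a Gaussian expectation with respect to the mean and covariance into expectations of the integrand's gradient and Hessian. A sketch of the two relations, in standard notation (mu and Sigma the Gaussian parameters, f a twice-differentiable integrand):

\nabla_{\mu}\,\mathbb{E}_{\mathcal{N}(z;\mu,\Sigma)}\left[ f(z) \right] = \mathbb{E}_{\mathcal{N}(z;\mu,\Sigma)}\left[ \nabla_z f(z) \right]

\nabla_{\Sigma}\,\mathbb{E}_{\mathcal{N}(z;\mu,\Sigma)}\left[ f(z) \right] = \frac{1}{2}\,\mathbb{E}_{\mathcal{N}(z;\mu,\Sigma)}\left[ \nabla_z^2 f(z) \right]

The second identity is what brings the Hessian of f into the variational objective; Hessian-free variants avoid forming that Hessian explicitly by working with Hessian-vector products instead.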

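Below is a minimal, hypothetical sketch (in JAX; not the authors' code) of the two ingredients the abstract names: a reparametrized Monte Carlo objective whose gradient flows through the sampling step, and a Hessian-vector product computed by forward-over-reverse automatic differentiation at roughly the cost of two gradient evaluations, as a Hessian-free method requires. The integrand f, the diagonal-Gaussian parametrization, and the sample count are illustrative assumptions.

# Sketch of second-order stochastic backpropagation with the
# reparametrization trick. The integrand f is a stand-in for a
# model's log-joint; everything here is assumed, not the paper's code.
import jax
import jax.numpy as jnp

def f(z):
    # Placeholder integrand; swap in the log-joint of interest.
    return -0.5 * jnp.sum(z ** 2) + jnp.sum(jnp.cos(z))

def reparam_objective(params, eps):
    # z = mu + sigma * eps maps N(0, I) noise to N(mu, diag(sigma^2)),
    # so derivatives flow through the sampling step.
    mu, log_sigma = params
    z = mu + jnp.exp(log_sigma) * eps
    return f(z)

def mc_objective(params, key, num_samples=16):
    # Monte Carlo estimate of E[f(z)] under the variational Gaussian.
    eps = jax.random.normal(key, (num_samples,) + params[0].shape)
    return jnp.mean(jax.vmap(lambda e: reparam_objective(params, e))(eps))

# Stochastic gradient w.r.t. (mu, log_sigma).
grad_fn = jax.grad(mc_objective)

def hvp(params, key, v):
    # Hessian-vector product via forward-over-reverse autodiff
    # (Pearlmutter's trick): no explicit Hessian is ever formed.
    return jax.jvp(lambda p: grad_fn(p, key), (params,), (v,))[1]

key = jax.random.PRNGKey(0)
params = (jnp.zeros(5), jnp.zeros(5))   # (mu, log_sigma)
g = grad_fn(params, key)
Hv = hvp(params, key, g)                # e.g. one step of a CG solve

A Newton-type or quasi-Newton update would then use such Hessian-vector products inside, for example, a conjugate-gradient solve, rather than materializing the full Hessian.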