Assumed Density Filtering Methods for Learning Bayesian Neural Networks

The success of deep multilayer neural networks has renewed interest in scalable learning of Bayesian neural networks. Here, we study algorithms that utilize recent advances in Bayesian inference to efficiently learn distributions over network weights. In particular, we focus on two recently proposed methods based on assumed density filtering: Expectation Backpropagation (EBP) and Probabilistic Backpropagation (PBP). Apart from scaling to large datasets, these techniques seamlessly handle non-differentiable activation functions and require no tuning of learning parameters such as learning rate or momentum. In this paper, we first rigorously compare the two algorithms and in the process develop several extensions, including a version of EBP for continuous regression problems and a PBP variant for binary classification. Next, we extend both algorithms to multiclass classification and count regression problems. On a variety of real-world benchmarks, we find our extensions to be effective, achieving results competitive with the state of the art.
