Dropout Inference in Bayesian Neural Networks with Alpha-divergences

To obtain uncertainty estimates from real-world Bayesian deep learning models, practical inference approximations are needed. Dropout variational inference (VI), for example, has been used in machine vision and medical applications, but VI can severely underestimate model uncertainty. Alpha-divergences offer an alternative to VI's KL objective and can avoid this uncertainty underestimation. They are, however, hard to use in practice: existing techniques support only Gaussian approximating distributions and require radical changes to existing models, so they are of limited use to practitioners. We propose a reparametrisation of the alpha-divergence objectives, deriving a simple inference technique that, together with dropout, can be applied to existing models by simply changing the model's loss. We demonstrate improved uncertainty estimates and accuracy compared to VI in dropout networks. We study our model's epistemic uncertainty far away from the data using adversarial images, and show that these can be distinguished from non-adversarial images by examining the model's uncertainty.
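The claim that inference reduces to "simply changing the loss" can be illustrated with a minimal sketch (our own illustration, not code from the paper; the function name and shapes are assumptions). A reparametrised alpha-divergence objective of this kind combines the per-point losses of K stochastic (dropout) forward passes through a log-sum-exp, and recovers the ordinary average loss in the limit alpha → 0:

```python
import numpy as np

def alpha_dropout_loss(nll, alpha=0.5):
    """Sketch of a reparametrised alpha-divergence objective.

    nll   : array of shape (K, N) holding per-data-point negative
            log-likelihoods, one row per stochastic dropout sample
            of the network weights.
    alpha : divergence parameter; alpha -> 0 recovers the plain
            average of the K losses (the VI-style objective).

    Returns the mean over the N data points of
        -(1/alpha) * log( (1/K) * sum_k exp(-alpha * nll[k, n]) ),
    computed via a shifted log-sum-exp for numerical stability.
    """
    K = nll.shape[0]
    shifted = -alpha * nll                      # (K, N)
    m = shifted.max(axis=0)                     # stabiliser per data point
    log_mean_exp = m + np.log(np.exp(shifted - m).sum(axis=0) / K)
    return float(np.mean(-log_mean_exp / alpha))
```

For alpha > 0 this objective lower-bounds the plain mean of the losses (Jensen's inequality), which is one way the alpha-divergence objectives soften VI's tendency to underestimate uncertainty; an L2 weight penalty would be added on top, as with standard dropout training.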
