Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Deep learning tools have gained tremendous attention in applied machine learning. However, such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs, extracting information from existing models that has so far been thrown away. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
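
The practical recipe implied by the abstract, commonly referred to as MC dropout, is to keep dropout active at test time and average several stochastic forward passes: the sample mean approximates the predictive mean, and the spread of the samples reflects model uncertainty. Below is a minimal sketch of that recipe in PyTorch on a toy regression problem; the architecture, dropout rate, weight decay, and number of samples T are illustrative choices, not the paper's settings.

    # Minimal MC-dropout sketch (assumptions: PyTorch, a small regression MLP,
    # synthetic 1-D data; all hyper-parameters are illustrative).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy regression data: y = sin(x) + noise
    x = torch.linspace(-3, 3, 200).unsqueeze(1)
    y = torch.sin(x) + 0.1 * torch.randn_like(x)

    # MLP with dropout after each hidden layer (the paper's derivation places
    # dropout before every weight layer; this common variant is enough here).
    model = nn.Sequential(
        nn.Linear(1, 50), nn.ReLU(), nn.Dropout(p=0.1),
        nn.Linear(50, 50), nn.ReLU(), nn.Dropout(p=0.1),
        nn.Linear(50, 1),
    )

    # Standard dropout training with weight decay (L2 regularisation).
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-4)
    for _ in range(500):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        opt.step()

    # Test time: keep dropout ON and average T stochastic forward passes.
    model.train()                 # leaves dropout layers active
    T = 100
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(T)])  # shape [T, N, 1]
    mean = samples.mean(0)        # approximate predictive mean
    var = samples.var(0)          # sample variance across stochastic passes
    print(mean[:3].squeeze(), var[:3].squeeze())

In the paper's formulation the predictive variance additionally includes an observation-noise term involving the model precision tau; the sketch above reports only the sample variance of the stochastic forward passes.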
