Improving predictive uncertainty estimation using Dropout–Hamiltonian Monte Carlo

Estimating predictive uncertainty is crucial for many computer vision tasks, from image classification to autonomous driving systems. Hamiltonian Monte Carlo (HMC) is a sampling method for performing Bayesian inference. Dropout regularization, in turn, has been proposed as an approximate model-averaging technique that tends to improve generalization in large-scale models such as deep neural networks. Although HMC provides convergence guarantees for most standard Bayesian models, it does not handle the discrete parameters that arise from Dropout regularization. In this paper, we present a robust methodology for improving predictive uncertainty in classification problems, based on Dropout and HMC. Even though Dropout induces a non-smooth energy function for which no such convergence guarantees hold, the resulting discretization of the Hamiltonian dynamics proves empirically successful. The proposed method makes it possible to estimate the predictive accuracy effectively and to provide better generalization for difficult test examples.
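The abstract does not spell out the algorithm, but the core idea can be sketched as an ordinary HMC transition whose energy function depends on a freshly sampled dropout mask. The NumPy sketch below is a minimal illustration under stated assumptions, not the paper's actual implementation: `potential_energy(w, mask)` (the negative log-posterior of the weights under a fixed mask), `grad_potential`, the dropout rate `p`, and the step-size settings are all hypothetical placeholders.

```python
import numpy as np

def dropout_hmc_step(w, potential_energy, grad_potential,
                     p=0.5, step_size=1e-2, n_leapfrog=20, rng=np.random):
    """One illustrative Dropout-HMC transition over weight vector w."""
    # Resample a Bernoulli dropout mask; it is this mask that makes the
    # energy non-smooth as a function of the weights.
    mask = (rng.rand(*w.shape) > p).astype(w.dtype)

    # Sample auxiliary momentum and record the initial Hamiltonian.
    r = rng.randn(*w.shape)
    current_H = potential_energy(w, mask) + 0.5 * np.sum(r ** 2)

    # Leapfrog integration of the Hamiltonian dynamics under the masked energy.
    w_new, r_new = w.copy(), r.copy()
    r_new -= 0.5 * step_size * grad_potential(w_new, mask)
    for _ in range(n_leapfrog - 1):
        w_new += step_size * r_new
        r_new -= step_size * grad_potential(w_new, mask)
    w_new += step_size * r_new
    r_new -= 0.5 * step_size * grad_potential(w_new, mask)

    # Metropolis correction: accept or reject the proposed weights.
    proposed_H = potential_energy(w_new, mask) + 0.5 * np.sum(r_new ** 2)
    if rng.rand() < np.exp(current_H - proposed_H):
        return w_new  # accepted proposal
    return w          # rejected: keep the current weights
```

Under this sketch, predictive uncertainty would then be estimated by averaging the network's class probabilities over the retained weight samples, rather than from a single point estimate.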
