Novel Uncertainty Framework for Deep Learning Ensembles

Deep neural networks have become the default choice for many machine learning tasks, such as classification and regression. Dropout, a method commonly used to improve the convergence of deep neural networks, generates an ensemble of thinned networks with extensive weight sharing. Recent studies have shown that dropout can be viewed as approximate variational inference in Gaussian processes and used as a practical tool to obtain uncertainty estimates from the network. We propose a novel statistical-mechanics-based framework for dropout and use it to derive a new, generic algorithm that estimates the variance of the loss as measured by the ensemble of thinned networks. Our approach can be applied to a wide range of deep neural network architectures and machine learning tasks. In classification, the algorithm allows the classifier to output a don't-know answer, which can increase its reliability. Empirically, we demonstrate state-of-the-art AUC results on publicly available benchmarks.
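
To make the variance-based don't-know mechanism concrete, here is a minimal PyTorch sketch: it runs T stochastic forward passes with dropout kept active (so each pass samples a different thinned network from the weight-sharing ensemble) and abstains when the per-sample loss variance across passes exceeds a threshold. The function name predict_with_dont_know, the number of passes T, the threshold tau, and the choice of per-pass cross-entropy against the ensemble's predicted label as the loss proxy are illustrative assumptions, not details specified by the paper.

import torch
import torch.nn.functional as F

def predict_with_dont_know(model, x, T=50, tau=0.1):
    # Put the model in eval mode, then re-enable only the dropout layers,
    # so that each forward pass samples a different thinned network while
    # other layers (e.g. batch norm) stay in inference mode.
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        logits = torch.stack([model(x) for _ in range(T)])  # (T, batch, classes)
    # Ensemble-averaged prediction over the thinned networks.
    mean_probs = logits.softmax(dim=-1).mean(dim=0)
    preds = mean_probs.argmax(dim=-1)
    # Per-pass cross-entropy against the ensemble's predicted label; its
    # variance across the thinned networks serves as the uncertainty signal.
    losses = torch.stack(
        [F.cross_entropy(l, preds, reduction="none") for l in logits]
    )
    dont_know = losses.var(dim=0) > tau  # abstain where the ensemble disagrees
    return preds, dont_know

In this sketch, tau would be tuned on held-out data to trade coverage against reliability; samples flagged by dont_know are the ones the classifier declines to answer.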
