Quantifying Aleatoric and Epistemic Uncertainty Using Density Estimation in Latent Space

The distribution of a neural network's latent representations has been successfully used to detect Out-of-Distribution (OOD) data. Since OOD detection is a popular benchmark for epistemic uncertainty estimates, this raises the question of a deeper connection. This work investigates whether the distribution of latent representations indeed contains information about the uncertainty associated with a neural network's predictions. Prior work identifies epistemic uncertainty with the surprise, i.e., the negative log-likelihood, of observing a particular latent representation, which we verify empirically. Moreover, we demonstrate that the output-conditional distribution of hidden representations allows quantifying aleatoric uncertainty via the entropy of the predictive distribution. We analyze epistemic and aleatoric uncertainty inferred from the representations of different layers and conclude with the exciting finding that the hidden representations of a deterministic neural network indeed contain information about its uncertainty. We verify our findings on both classification and regression models.
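
A minimal sketch of the two uncertainty measures described above, assuming a scikit-learn Gaussian mixture as a stand-in latent density estimator and hidden-layer features from a classifier; the paper's actual density model and layer choice may differ:

from scipy.stats import entropy
from sklearn.mixture import GaussianMixture

def fit_latent_density(latents_train, n_components=10, seed=0):
    # Fit a density model p(z) on latent representations of the training set.
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(latents_train)
    return gmm

def epistemic_uncertainty(gmm, latents_test):
    # Epistemic uncertainty as the surprise -log p(z) of each latent.
    return -gmm.score_samples(latents_test)   # shape: (n_samples,)

def aleatoric_uncertainty(softmax_probs):
    # Aleatoric uncertainty as the entropy of the predictive distribution.
    return entropy(softmax_probs, axis=1)      # shape: (n_samples,)

# Hypothetical usage: latents_train / latents_test are hidden-layer features,
# softmax_probs are the classifier's predicted class probabilities.
# gmm  = fit_latent_density(latents_train)
# u_ep = epistemic_uncertainty(gmm, latents_test)
# u_al = aleatoric_uncertainty(softmax_probs)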
