Second-Moment Loss: A Novel Regression Objective for Improved Uncertainties

Uncertainty quantification is one of the most promising approaches to establishing safe machine learning. Despite its importance, it is far from being generally solved, especially for neural networks. One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice. However, it can underestimate uncertainty. We propose a new objective, referred to as second-moment loss (SML), to address this issue. While the full network is encouraged to model the mean, the dropout networks are explicitly used to optimize the model variance. We analyze the performance of the new objective on various toy and UCI regression datasets. Compared to the state-of-the-art method of deep ensembles, SML yields comparable prediction accuracies and uncertainty estimates while requiring only a single model. Under distribution shift, we observe moderate improvements. From a safety perspective, the study of worst-case uncertainties is also crucial; in this regard, SML improves considerably. Finally, we show that SML can be successfully applied to SqueezeDet, a modern object detection network. We improve its uncertainty-related scores without degrading regression quality. As a side result, we introduce an intuitive Wasserstein distance-based uncertainty measure that is non-saturating and thus makes it possible to resolve quality differences between any two uncertainty estimates.
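The abstract does not spell out the training objective, but its description (full network trained toward the mean, dropout forward passes trained to match the model variance) admits a minimal sketch. The PyTorch snippet below is one plausible reading, not the paper's definitive formulation: the function name `second_moment_style_loss`, the number of dropout samples `k`, the use of the absolute residual as the spread target, and the `detach()` choices are all illustrative assumptions.

```python
# Hedged sketch of a second-moment-style loss, reconstructed from the
# abstract alone: the full network (dropout disabled) is fit to the target,
# while the spread of dropout samples around the mean prediction is pushed
# toward the size of the mean prediction's residual. All specifics below
# (k, detaching, pairing of terms) are assumptions for illustration.
import torch
import torch.nn as nn


def second_moment_style_loss(model: nn.Module,
                             x: torch.Tensor,
                             y: torch.Tensor,
                             k: int = 5) -> torch.Tensor:
    # Mean term: dropout off (eval mode keeps autograd active),
    # so the full network is encouraged to model the mean.
    model.eval()
    mu = model(x)
    mean_term = (mu - y).pow(2).mean()

    # Spread target: absolute residual of the mean prediction,
    # detached so the spread term only shapes the dropout samples.
    residual = (mu - y).abs().detach()

    # Spread term: distance of each stochastic forward pass from the
    # (detached) mean prediction should match the residual magnitude.
    model.train()  # re-enable dropout for sampling
    spread_term = x.new_zeros(())
    for _ in range(k):
        y_sample = model(x)
        spread_term = spread_term + (
            (y_sample - mu.detach()).abs() - residual
        ).pow(2).mean()
    spread_term = spread_term / k

    return mean_term + spread_term


# Usage with a small dropout MLP (hypothetical setup):
net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                    nn.Dropout(p=0.2), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(32, 1), torch.randn(32, 1)

opt.zero_grad()
loss = second_moment_style_loss(net, x, y)
loss.backward()
opt.step()
```

At test time, under this reading, the mean comes from a single dropout-free forward pass and the uncertainty from the empirical spread of several dropout passes, so only one model needs to be trained and stored, consistent with the abstract's single-model claim.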
