Uncertainty estimation for classification and risk prediction in medical settings

In a data-scarce field such as healthcare, where models often make predictions for patients with rare conditions, the ability to measure the uncertainty of a model's prediction could improve the effectiveness of decision support tools and increase user trust. This work advances the understanding of uncertainty estimation for classification and risk prediction on medical tabular data in three ways. First, we analyze two families of promising methods and discuss which approach is preferable for uncertainty estimation in classification and risk prediction. Second, we complement this analysis by examining how uncertainty estimation interacts with class imbalance, post-modeling calibration and other modeling procedures. Finally, we expand and refine the set of heuristics for selecting an uncertainty estimation technique, introducing tests for clinically relevant scenarios such as generalization to uncommon pathologies, changes in clinical protocol and simulations of corrupted data. These findings are supported by an array of experiments on toy and real-world data.
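The abstract does not name the two families of methods it analyzes. The sketch below is a hypothetical illustration of what "measuring the uncertainty of a model's prediction" can look like in practice, not the method used in this work. It assumes an ensemble of classifiers (for example, models trained on bootstrap resamples of the training set). It decomposes the predictive entropy for a single patient into an expected-entropy term, often read as aleatoric uncertainty, and a mutual-information term, often read as epistemic uncertainty. The function names `entropy` and `ensemble_uncertainty` are illustrative only.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    # Terms with zero probability contribute nothing (0 * log 0 := 0).
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def ensemble_uncertainty(member_probs: Sequence[Sequence[float]]) -> dict:
    """Decompose predictive uncertainty for one input (hypothetical helper).

    member_probs[m][k] is ensemble member m's predicted probability of class k.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the members' predictive distributions.
    mean = [sum(p[k] for p in member_probs) / n_members for k in range(n_classes)]
    # Total uncertainty: entropy of the averaged prediction.
    total = entropy(mean)
    # Expected entropy of the individual members (aleatoric part).
    aleatoric = sum(entropy(p) for p in member_probs) / n_members
    # Mutual information (epistemic part): non-negative in exact arithmetic,
    # so clamp tiny negative values caused by floating-point rounding.
    epistemic = max(0.0, total - aleatoric)
    return {"mean": mean, "total": total,
            "aleatoric": aleatoric, "epistemic": epistemic}
```

For instance, two members that both predict `[0.5, 0.5]` yield maximal total entropy with zero epistemic uncertainty. Two confident members that disagree (`[0.99, 0.01]` and `[0.01, 0.99]`) yield the same averaged prediction, but most of the entropy is then attributed to the epistemic term. This is the kind of disagreement one might hope to see for patients with uncommon pathologies.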
