Analyzing the role of model uncertainty for electronic health records

In medicine, both the ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients has clear clinical impact, for example in determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty than ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.
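The contrast between population-level metrics and patient-specific variability can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' code: the function name `per_patient_disagreement`, the 0.5 decision threshold, and the toy probabilities are all hypothetical, and the "ensemble" is simply a matrix of predicted probabilities with one row per model (for example, one per random seed).

```python
import numpy as np


def per_patient_disagreement(probs, threshold=0.5):
    """Summarize how much ensemble members disagree for each patient.

    probs: array-like of shape (n_models, n_patients) holding each
           model's predicted probability for each patient.
    threshold: hypothetical decision threshold (an assumption here).
    """
    probs = np.asarray(probs, dtype=float)
    mean = probs.mean(axis=0)          # ensemble-average prediction per patient
    std = probs.std(axis=0)            # spread across models per patient
    decisions = probs >= threshold     # each model's thresholded decision
    frac_positive = decisions.mean(axis=0)
    # The decision is unstable when members do not all agree.
    unstable = (frac_positive > 0) & (frac_positive < 1)
    return mean, std, unstable


# Toy data (made up): 3 models, 3 patients.
toy_probs = [
    [0.10, 0.48, 0.90],
    [0.12, 0.55, 0.88],
    [0.08, 0.51, 0.92],
]
mean, std, unstable = per_patient_disagreement(toy_probs)
for i in range(len(mean)):
    print(f"patient {i}: mean={mean[i]:.3f} std={std[i]:.3f} unstable={bool(unstable[i])}")
```

In this toy case every model assigns similar probabilities overall, yet the thresholded decision for the middle patient changes depending on which model is used. That is the kind of patient-level brittleness an aggregate metric would not reveal.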
