Deep Bayesian Gaussian processes for uncertainty estimation in electronic health records

One major impediment to the wider adoption of deep learning for clinical decision making is the difficulty of assigning a level of confidence to model predictions. Currently, deep Bayesian neural networks and sparse Gaussian processes are the two main scalable uncertainty estimation methods. However, deep Bayesian neural networks lack expressiveness, while more expressive models such as deep kernel learning, an extension of the sparse Gaussian process, capture only the uncertainty arising in the higher-level latent space; as a result, the underlying deep network lacks interpretability and uncertainty in the raw data is ignored. In this paper, we merge features of the deep Bayesian learning framework with deep kernel learning to leverage the strengths of both methods for more comprehensive uncertainty estimation. Through a series of experiments on predicting the first incidence of heart failure, diabetes and depression from large-scale electronic health records, we demonstrate that our method captures uncertainty better than both Gaussian processes and deep Bayesian neural networks, both in indicating data insufficiency and in distinguishing true positive from false positive predictions, while maintaining comparable generalisation performance. Furthermore, by assessing accuracy and area under the receiver operating characteristic curve as functions of the predictive probability, we show that our method is less susceptible to making overconfident predictions, especially for the minority class in imbalanced datasets. Finally, we demonstrate how the uncertainty information derived by the model can inform risk factor analysis and improve model interpretability.
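The "indicating data insufficiency" behaviour the abstract refers to is the standard property that a Gaussian process's posterior predictive variance grows for inputs far from the training data. The following is a minimal NumPy sketch of exact GP regression illustrating that property only; it is not the paper's sparse or deep-kernel model, and the kernel hyperparameters are arbitrary illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(x, x') = s^2 * exp(-(x - x')^2 / (2 l^2))
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    # Exact GP regression posterior via the Cholesky factorisation
    # (see Rasmussen & Williams, "Gaussian Processes for Machine Learning", ch. 2).
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mean, var

x_train = np.array([0.0, 0.5, 1.0])
y_train = np.sin(x_train)
# Query one point inside the data region and one far outside it.
mean, var = gp_posterior(x_train, y_train, np.array([0.5, 5.0]))
# Predictive variance is small near the training data (x = 0.5) and close to
# the prior variance far away (x = 5.0), signalling data insufficiency.
```

In the clinical setting described above, the same mechanism lets high predictive variance flag patients who are poorly represented in the training cohort, rather than forcing a confident label.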
