UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data

Successful health risk prediction demands both accuracy and reliability from the model. Existing predictive models mainly depend on mining electronic health records (EHR) with advanced deep learning techniques to improve model accuracy. However, they ignore publicly available online health data, especially socioeconomic status, environmental factors, and detailed demographic information for each location, which are all strong predictive signals and can augment precision medicine. To be reliable, a model needs to provide both an accurate prediction and an uncertainty score for that prediction. However, existing uncertainty estimation approaches often fail to handle the high-dimensional data present in multi-sourced data. To fill the gap, we propose the UNcertaInTy-based hEalth risk prediction (UNITE) model. Building upon an adaptive multimodal deep kernel and a stochastic variational inference module, UNITE provides accurate disease risk prediction and uncertainty estimation by leveraging multi-sourced health data, including EHR data, patient demographics, and public health data collected from the web. We evaluate UNITE on two real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer’s disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best state-of-the-art baseline by up to 19%. We also show that UNITE can model meaningful uncertainties and can provide evidence-based clinical support by clustering similar patients.
