Nonstationary multivariate Gaussian processes for electronic health records

Advances in the modeling and analysis of electronic health records (EHR) have the potential to improve patient risk stratification, leading to better patient outcomes. However, the modeling of complex temporal relations across the multiple clinical variables inherent in EHR data remains largely unexplored. Existing approaches to modeling EHR data often lack the flexibility to handle time-varying correlations across multiple clinical variables, or they are too complex for clinical interpretation. To address these drawbacks, we propose a novel nonstationary multivariate Gaussian process model for EHR data. The proposed model captures time-varying scale, correlation and smoothness across multiple clinical variables. We also provide details on two inference approaches: maximum a posteriori (MAP) estimation and Hamiltonian Monte Carlo (HMC). We validate the model on synthetic data and then demonstrate its effectiveness on EHR data from the Kaiser Permanente Division of Research (KPDOR). Finally, we use the KPDOR EHR data to investigate the relationships between a clinical patient risk metric and the latent processes of the proposed model, and we demonstrate statistically significant correlations between them.
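To make "time-varying scale and smoothness" concrete, the following is a minimal univariate sketch, not the model proposed in the paper. It assumes a Gibbs-type nonstationary squared-exponential kernel in which the signal scale and the length-scale are both functions of time. The names gibbs_kernel, scale_fn and length_fn, and the particular functions chosen for them, are hypothetical and serve only as an illustration.

```python
import numpy as np


def gibbs_kernel(t, scale_fn, length_fn):
    """Nonstationary covariance matrix on time points t (Gibbs-type kernel).

    scale_fn(t)  -> positive, time-varying signal standard deviation
    length_fn(t) -> positive, time-varying length-scale (smoothness)
    Both callables are illustrative assumptions, not taken from the paper.
    """
    t = np.asarray(t, dtype=float)
    s = scale_fn(t)
    ell = length_fn(t)
    # Pairwise sum of squared length-scales: ell(t)^2 + ell(t')^2
    ell2_sum = ell[:, None] ** 2 + ell[None, :] ** 2
    # Normalising prefactor that keeps the kernel positive semidefinite
    prefactor = np.sqrt(2.0 * ell[:, None] * ell[None, :] / ell2_sum)
    sq_dist = (t[:, None] - t[None, :]) ** 2
    return s[:, None] * s[None, :] * prefactor * np.exp(-sq_dist / ell2_sum)


# Hypothetical time-varying hyperparameter functions (both strictly positive)
def scale_fn(t):
    return 1.0 + 0.5 * np.sin(t)


def length_fn(t):
    return 0.5 + 0.2 * t


t = np.linspace(0.0, 10.0, 50)
K = gibbs_kernel(t, scale_fn, length_fn)

# Draw one sample path; a small jitter stabilises the Cholesky factorisation
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(t)))
sample = L @ rng.standard_normal(len(t))
```

In this sketch the marginal variance at time t equals scale_fn(t) squared, and sample paths become smoother wherever length_fn is larger. A multivariate version would additionally need time-varying cross-correlations between the clinical variables, which this example does not attempt.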
