Sparse multi-output Gaussian processes for online medical time series prediction

Background: For real-time monitoring of hospital patients, high-quality inference of patients’ health status using all information available from clinical covariates and lab test results is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from large-scale observational electronic health records (EHRs) and make accurate real-time predictions is a critical step. In this work, we develop and explore a Bayesian nonparametric model based on multi-output Gaussian process (GP) regression for hospital patient monitoring.

Methods: We propose MedGP, a statistical framework that incorporates 24 clinical covariates and supports a rich reference data set from which relationships between observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel that enables tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, correlations among patients, and periodicity in patient observations. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence regions in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) inferring interpretable relationships among clinical covariates.

Results: We evaluate MedGP on the task of online prediction for three patient subgroups from two medical data sets, covering 8,043 patients in total. We find that MedGP improves online prediction over baseline and state-of-the-art methods for nearly all covariates across different disease subgroups and hospitals.

Conclusions: The MedGP framework is robust and efficient in estimating temporal dependencies from sparse and irregularly sampled medical time series data for online prediction. The code is publicly available at https://github.com/bee-hive/MedGP.
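To make the Methods description concrete, the following is a minimal sketch of multi-output GP regression on irregularly sampled covariates. It is not the MedGP kernel or the code in the repository above: it assumes a plain intrinsic coregionalization structure (a covariate covariance matrix `B` multiplied by a shared temporal kernel) and a two-component spectral mixture kernel for the temporal part. All function names, hyperparameter values, and the toy observations are hypothetical and chosen only for illustration. The sketch shows two of the listed properties: each covariate keeps its own observation times (no alignment), and the prediction comes with a variance.

```python
import numpy as np


def spectral_mixture(tau, weights, means, variances):
    """Spectral mixture kernel evaluated at time lags tau (any shape)."""
    tau = np.asarray(tau, dtype=float)[..., None]  # trailing axis for components
    envelope = np.exp(-2.0 * np.pi**2 * tau**2 * variances)
    periodic = np.cos(2.0 * np.pi * tau * means)
    return np.sum(weights * envelope * periodic, axis=-1)


def multi_output_kernel(t1, c1, t2, c2, B, sm_params):
    """Covariance between (time, covariate) pairs: B[c, c'] * k(t - t')."""
    tau = t1[:, None] - t2[None, :]
    return B[np.ix_(c1, c2)] * spectral_mixture(tau, *sm_params)


def gp_predict(t_obs, c_obs, y_obs, t_new, c_new, B, sm_params, noise_var):
    """Posterior mean and variance of the latent function at new points."""
    K = multi_output_kernel(t_obs, c_obs, t_obs, c_obs, B, sm_params)
    K[np.diag_indices_from(K)] += noise_var[c_obs]  # per-covariate noise
    K_s = multi_output_kernel(t_new, c_new, t_obs, c_obs, B, sm_params)
    K_ss = multi_output_kernel(t_new, c_new, t_new, c_new, B, sm_params)

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s.T)

    mean = K_s @ alpha
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var


# Hypothetical toy data: two covariates observed at different, unaligned times
# (hours since admission). Values are standardized and purely synthetic.
t_obs = np.array([0.0, 1.5, 2.0, 4.0, 0.5, 3.0])
c_obs = np.array([0, 0, 0, 0, 1, 1])  # covariate index of each observation
y_obs = np.array([0.2, 0.5, 0.6, 0.1, 0.3, 0.4])

# Assumed hyperparameters; in practice these would be learned from data.
W = np.array([[1.0], [0.8]])
B = W @ W.T + np.diag([0.1, 0.1])  # covariance between the two covariates
sm_params = (
    np.array([1.0, 0.5]),          # component weights
    np.array([0.0, 1.0 / 24.0]),   # frequencies: a trend and a 24-hour cycle
    np.array([0.01, 0.001]),       # spectral variances
)
noise_var = np.array([0.05, 0.05])

# Online-style query: predict both covariates one hour past the last observation.
t_new = np.array([5.0, 5.0])
c_new = np.array([0, 1])
mean, var = gp_predict(t_obs, c_obs, y_obs, t_new, c_new, B, sm_params, noise_var)
print("posterior mean:", mean)
print("posterior std: ", np.sqrt(var))
```

Because the off-diagonal entry of `B` is nonzero, observations of covariate 0 inform the prediction for covariate 1 even though the two were never measured at the same time. This dense version costs cubic time in the total number of observations, which is why a structured sparse kernel of the kind the abstract describes is needed at the scale of tens of thousands of time points.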
