Identifying Sepsis Subphenotypes via Time-Aware Multi-Modal Auto-Encoder

Sepsis is a heterogeneous clinical syndrome that is the leading cause of mortality in hospital intensive care units (ICUs). Identifying sepsis subphenotypes may enable more precise treatments and more targeted clinical interventions. Recently, sepsis subtyping on electronic health records (EHRs) has attracted interest from healthcare researchers. However, most sepsis subtyping studies ignore the temporality of EHR data and suffer from missing values. In this paper, we propose a new sepsis subtyping framework to address these two issues. Our subtyping framework consists of three components: a novel Time-Aware Multi-modal auto-Encoder (TAME) model, which introduces a time-aware attention mechanism and incorporates multi-modal inputs (e.g., demographics, diagnoses, medications, lab tests and vital signs) to impute missing values; a dynamic time warping (DTW) method to measure patients' temporal similarity based on the imputed EHR data; and a weighted k-means algorithm to cluster patients. Comprehensive experiments on real-world datasets show that TAME outperforms the baselines on imputation accuracy. After analyzing TAME-imputed EHR data, we identify four novel subphenotypes of sepsis patients, paving the way for improved personalization of sepsis management.
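To make the temporal-similarity step concrete, the following is a minimal sketch of textbook dynamic time warping between two patients' imputed time series. It is not the paper's implementation: the function name `dtw_distance`, the Euclidean local cost, and the absence of any warping-window constraint are all assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW distance between two series (hypothetical helper).

    a, b: array-likes of shape (T_a, D) and (T_b, D), or 1-D sequences.
    The local cost is the Euclidean distance between time steps, which is
    an assumption and not necessarily the cost used in the paper.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D input as a series of one-dimensional observations.
    a = a.reshape(len(a), -1)
    b = b.reshape(len(b), -1)
    n, m = len(a), len(b)

    # cost[i, j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(
                cost[i - 1, j],      # advance in a only
                cost[i, j - 1],      # advance in b only
                cost[i - 1, j - 1],  # advance in both
            )
    return float(cost[n, m])
```

Because DTW aligns series of different lengths, patients with different numbers of recorded time steps can still be compared. Computing this distance for every pair of patients would give a distance matrix that a clustering step could consume; how the paper's weighted k-means uses these distances is not specified in the abstract.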
