Sequences of Events from the Electronic Medical Record and the Onset of Infection

We present a novel time-series model that learns from electronic health record (EHR) data when infection began in the intensive care unit (ICU), translating methods from proteomics and Bayesian statistics. Using data from 48,536 patients hospitalized in an ICU, we encode each hospital course as a temporally ordered sequence drawn from an ‘alphabet’ of 23 physician actions (‘events’). We analyze these sequences as k-mers of 3–12 events and apply a Bayesian model of (cumulative) relative risk (RR). The log2-transformed RR (median = 0.248, mean = 0.226) supports the conclusion that the selected events are individually associated with increased risk of infection. Among all possible cutoffs of maximum gain (MG), the threshold MG > 0.0244 predicts administration of antibiotics with PPV 82.0%, NPV 44.4%, and AUC 0.706. Our approach holds value for retrospective analysis of other clinical syndromes, including delirium and decompensation, for which time of onset is critical to analysis but poorly marked in EHRs.
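
To make the k-mer encoding concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each hospital course is a sequence of single-character event codes, extracts every contiguous k-mer of 3–12 events, and computes a smoothed log2 relative risk of infection for each k-mer. The function names, the toy data, and the Beta(0.5, 0.5) smoothing are illustrative assumptions; the smoothing is only a simple stand-in for the Bayesian RR model described above.

```python
from collections import Counter
from math import log2


def kmers(events, k_min=3, k_max=12):
    """Yield every contiguous k-mer (as a tuple) for k in [k_min, k_max]."""
    for k in range(k_min, k_max + 1):
        for i in range(len(events) - k + 1):
            yield tuple(events[i:i + k])


def log2_relative_risk(courses, labels, k_min=3, k_max=12, prior=0.5):
    """Return {k-mer: log2 RR}, where
    RR = P(infected | k-mer present) / P(infected | k-mer absent).

    `prior` adds pseudo-counts (an assumed Beta(prior, prior) prior on each
    risk) so that k-mers seen in only one group do not give infinite RR.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    with_pos, with_neg = Counter(), Counter()

    for events, infected in zip(courses, labels):
        # Count each k-mer at most once per hospital course.
        present = set(kmers(events, k_min, k_max))
        (with_pos if infected else with_neg).update(present)

    result = {}
    for kmer in set(with_pos) | set(with_neg):
        a = with_pos[kmer]   # infected, k-mer present
        b = with_neg[kmer]   # not infected, k-mer present
        c = n_pos - a        # infected, k-mer absent
        d = n_neg - b        # not infected, k-mer absent
        risk_present = (a + prior) / (a + b + 2 * prior)
        risk_absent = (c + prior) / (c + d + 2 * prior)
        result[kmer] = log2(risk_present / risk_absent)
    return result


# Hypothetical toy data: four courses over a made-up event alphabet.
courses = ["ABCD", "ABCE", "XBCD", "XYZW"]
labels = [1, 1, 0, 0]  # 1 = infection recorded, 0 = none
scores = log2_relative_risk(courses, labels, k_min=3, k_max=3)
```

In this toy example, a k-mer seen only in infected courses receives a positive log2 RR and one seen only in uninfected courses receives a negative value, mirroring how a positive log2-transformed RR is read in the abstract.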
