Validation of a Derived International Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data

Introduction. The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international consortium of hundreds of hospitals that uses a federated computational approach to study COVID-19 with electronic health record (EHR) data.

Objective. We sought to develop and validate a standard definition of COVID-19 severity from readily accessible EHR data across the Consortium.

Methods. We developed an EHR-based severity algorithm and validated it on patient hospitalization data from 12 4CE clinical sites against the outcomes of ICU admission and/or death. At one site, we also used a machine learning approach to compare selected predictors of severity to the 4CE algorithm.

Results. The 4CE severity algorithm performed with a pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of single code categories for acuity was unacceptably inaccurate, varying by up to 0.65 across sites. A multivariate machine learning approach identified codes that yielded a mean AUC of 0.956 (95% CI: 0.952, 0.959), compared with 0.903 (95% CI: 0.886, 0.921) for expert-derived codes. Billing codes were poor proxies of ICU admission, with 49% precision and recall compared against chart review at one partner institution.

Discussion. We developed a proxy measure of severity that proved resilient to international coding variability by using a set of 6 code classes. In contrast, machine learning approaches may overfit to hospital-specific orders. Manual chart review revealed discrepancies even in the gold standard outcomes, possibly due to pandemic conditions.

Conclusion. We developed an EHR-based algorithm for COVID-19 severity and validated it at 12 international sites.
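As an illustration of how the validation metrics reported above are defined, the following minimal Python sketch computes sensitivity and specificity of a binary severity flag against the combined outcome (ICU admission and/or death), then pools across sites. The names `SiteCounts`, `confusion`, and `pool` are hypothetical, the counts are made up for illustration, and pooling by summing per-site counts is an assumption; the abstract does not state how the pooled estimates were obtained.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class SiteCounts:
    """Confusion-matrix counts for one site (hypothetical container)."""
    tp: int  # flagged severe, had ICU admission and/or death
    fp: int  # flagged severe, no such outcome
    fn: int  # not flagged, had the outcome
    tn: int  # not flagged, no outcome


def confusion(flags: Sequence[bool], outcomes: Sequence[bool]) -> SiteCounts:
    """Tally counts from per-patient severity flags and observed outcomes."""
    if len(flags) != len(outcomes):
        raise ValueError("flags and outcomes must have the same length")
    tp = sum(f and o for f, o in zip(flags, outcomes))
    fp = sum(f and not o for f, o in zip(flags, outcomes))
    fn = sum(not f and o for f, o in zip(flags, outcomes))
    tn = sum(not f and not o for f, o in zip(flags, outcomes))
    return SiteCounts(tp, fp, fn, tn)


def sensitivity(c: SiteCounts) -> float:
    """Fraction of patients with the outcome that the flag caught."""
    return c.tp / (c.tp + c.fn)


def specificity(c: SiteCounts) -> float:
    """Fraction of patients without the outcome that the flag left unflagged."""
    return c.tn / (c.tn + c.fp)


def pool(sites: Iterable[SiteCounts]) -> SiteCounts:
    """Assumed pooling strategy: sum counts across sites."""
    sites = list(sites)
    return SiteCounts(
        tp=sum(s.tp for s in sites),
        fp=sum(s.fp for s in sites),
        fn=sum(s.fn for s in sites),
        tn=sum(s.tn for s in sites),
    )


# Illustrative counts only, not data from the study.
site_a = SiteCounts(tp=8, fp=2, fn=2, tn=8)
site_b = SiteCounts(tp=6, fp=1, fn=4, tn=9)
pooled = pool([site_a, site_b])
print(sensitivity(pooled), specificity(pooled))
```

In a federated setting such as the one described, each site would share only its aggregate counts, so a function like `pool` never needs patient-level data.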
