Validation of Prediction Models for Critical Care Outcomes Using Natural Language Processing of Electronic Health Record Data

Key Points

Question: Can a prediction model for mortality in the intensive care unit be improved by using more laboratory values, vital signs, and clinical text in electronic health records?

Findings: In this cohort study of 101 196 patients in the intensive care unit, a machine learning–based model using all available measurements of vital signs and laboratory values, plus clinical text, exhibited good calibration and discrimination in predicting in-hospital mortality, yielding an area under the receiver operating characteristic curve of 0.922.

Meaning: Applying methods from machine learning and natural language processing to information already routinely collected in electronic health records, including laboratory test results, vital signs, and clinical free-text notes, significantly improves a prediction model for mortality in the intensive care unit compared with approaches that use only the most abnormal vital sign and laboratory values.
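The findings above are stated in terms of discrimination (area under the receiver operating characteristic curve) and calibration. As a rough illustration of what those two quantities measure, the sketch below computes both from predicted risks and observed outcomes. It is not the study's code: the function names `auroc` and `calibration_in_the_large` are hypothetical, and the eight-patient data set is made up purely for demonstration.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a randomly chosen death receives a higher predicted
    risk than a randomly chosen survivor (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int],
                             y_score: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate.
    Zero means no overall over- or under-prediction."""
    return sum(y_score) / len(y_score) - sum(y_true) / len(y_true)


# Toy, made-up data (NOT from the study): 1 = died in hospital, 0 = survived.
outcomes = [0, 0, 1, 0, 1, 0, 0, 1]
risks = [0.05, 0.10, 0.80, 0.20, 0.60, 0.30, 0.70, 0.90]

print("AUROC:", auroc(outcomes, risks))
print("Calibration-in-the-large:", calibration_in_the_large(outcomes, risks))
```

On this toy data, 14 of the 15 death/survivor pairs are ranked correctly, so the AUROC is 14/15. The pairwise loop is quadratic and is used here only for clarity; a rank-based implementation, or a library routine such as scikit-learn's `roc_auc_score`, would be the usual choice for a cohort of realistic size.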
