Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care

OBJECTIVES: To evaluate whether different approaches to preparing note text (known as preprocessing) affect machine learning model performance in predicting ICU mortality.
DESIGN: Clinical note text was used to build machine learning models for adults admitted to the ICU. The preprocessing strategies studied were none (raw text), text cleaning, stemming, term frequency-inverse document frequency vectorization, and creation of n-grams. Model performance was assessed by the area under the receiver operating characteristic curve (AUROC). Models were trained and internally validated on University of California San Francisco data using 10-fold cross validation, then externally validated on Beth Israel Deaconess Medical Center data.
SETTING: ICUs at University of California San Francisco and Beth Israel Deaconess Medical Center.
SUBJECTS: Ten thousand patients in the University of California San Francisco training and internal testing dataset and 27,058 patients in the Beth Israel Deaconess Medical Center external validation dataset.
INTERVENTIONS: None.
MEASUREMENTS AND MAIN RESULTS: Mortality rates at Beth Israel Deaconess Medical Center and University of California San Francisco were 10.9% and 7.4%, respectively. Results are reported as AUROC (95% CI) for internal validation at University of California San Francisco and as AUROC without a CI for external validation at Beth Israel Deaconess Medical Center. For in-hospital mortality prediction, models built and trained on University of California San Francisco data improved from the raw note text model (AUROC, 0.84; 95% CI, 0.80–0.89) to the term frequency-inverse document frequency model (AUROC, 0.89; 95% CI, 0.85–0.94).
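
The evaluation metric can be made concrete with a short standard-library sketch. AUROC equals the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, with ties counted as one half (the Mann-Whitney formulation). The function names are hypothetical, this is not the authors' code, and stratifying the folds by outcome is an assumption: the abstract states only that 10-fold cross validation was used, although stratification is a common choice when the outcome is as rare as the 7.4% mortality reported at University of California San Francisco.

```python
"""Rank-based AUROC and a stratified k-fold split, using only the standard library.

Illustrative sketch, not the study's code: function names are hypothetical, and
stratifying the folds by outcome is an assumption (the abstract says only
"10-fold cross validation").
"""
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen death (label 1) scores higher than a
    randomly chosen survivor (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for illustration, slow for large cohorts
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_kfold(
    labels: Sequence[int], k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs. Each class is shuffled and dealt
    round-robin across folds, so a rare outcome such as death appears in every
    test fold whenever there are at least k positive cases."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Toy data, made up for illustration: 1 = in-hospital death, 0 = survived.
    labels = [1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.6, 0.6]
    print(round(auroc(labels, scores), 3))  # 0.917 (5.5 of 6 pairs ranked correctly)
```

In a 10-fold design, a model would be fitted on each `train` index set and `auroc` computed on the matching `test` set. How the study combined the ten folds into a single AUROC and 95% CI is not described in the abstract, so this sketch stops at the per-fold metric.
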
When the models developed at University of California San Francisco were applied to Beth Israel Deaconess Medical Center data, performance increased similarly, from the raw note text model (AUROC, 0.72) to the term frequency-inverse document frequency model (AUROC, 0.83).
CONCLUSIONS: Differences in note text preprocessing strategies affected model discrimination. A full preprocessing pathway of cleaning, stemming, and term frequency-inverse document frequency vectorization yielded the greatest improvement in model performance. Further study is needed, with particular emphasis on how to manage the implicit bias of note authors that is present in note text, before natural language processing algorithms are implemented in the clinical setting.
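
The preprocessing pathway named in the conclusions (cleaning, stemming, and term frequency-inverse document frequency (TF-IDF) vectorization, with n-gram creation as a further option) can be illustrated with a small standard-library sketch. Everything concrete in it is an assumption rather than the study's method: the stop-word list, the cleaning regular expression, the toy suffix-stripping stemmer (a crude stand-in for a real algorithm such as Porter's, which the abstract does not name), bigrams as the largest n-gram, the order of the steps, and the smoothed IDF formula, which mirrors the default in scikit-learn's `TfidfVectorizer`. All function names are hypothetical.

```python
"""Toy versions of the note-preprocessing steps compared in the study:
cleaning, stemming, n-gram creation, and TF-IDF vectorization.

Assumptions (not taken from the paper): the stop-word list, the cleaning regex,
the suffix-stripping stemmer (a crude stand-in for a real stemmer such as Porter's),
bigrams as the largest n-gram, the step order, and the smoothed IDF formula.
"""
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "with", "is", "was"}


def clean(text: str) -> List[str]:
    """Lowercase, replace every non-letter with a space, split, and drop stop words."""
    tokens = re.sub(r"[^a-z]+", " ", text.lower()).split()
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters (toy, not Porter)."""
    for suffix in ("ations", "ation", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens: List[str], n_max: int = 2) -> List[str]:
    """All contiguous 1..n_max-grams, joined with underscores."""
    grams: List[str] = []
    for n in range(1, n_max + 1):
        grams += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1; meant to be fitted on development notes only."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def transform(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Raw term counts times IDF, L2-normalised; terms unseen during fitting are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def preprocess(note: str) -> List[str]:
    """Cleaning -> stemming -> n-grams (the order of steps is an assumption)."""
    return ngrams([crude_stem(t) for t in clean(note)])


if __name__ == "__main__":
    # Made-up notes standing in for a development site and an external site.
    dev_notes = [
        "Patient intubated and sedated; breathing on the ventilator.",
        "Patient extubated, breathing comfortably on room air.",
    ]
    external_note = "Intubated overnight, sedation weaned."
    idf = fit_idf([preprocess(n) for n in dev_notes])  # fit on development notes only
    print(transform(preprocess(external_note), idf))
```

Separating `fit_idf` from `transform` mirrors the external validation design: in this sketch the vocabulary and IDF weights are learned from the development site's notes and applied unchanged to the external site's notes, so every feature of the external note except the stem `intubat` is dropped. The toy stemmer also shows why stemming quality matters: it maps "sedated" to `sedat` but "sedation" to `sed`, so the two never match.
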