Leveraging UMLS-driven NLP to enhance identification of influenza predictors derived from electronic medical record data

Objective: Multiple clinical prediction rules have been developed, but they lack validation. This study aims to identify a set of prediction algorithms for influenza based on structured electronic health record (EHR) data and on data derived from clinical notes using Unified Medical Language System (UMLS)-driven natural language processing (NLP).

Materials and Methods: Data were extracted from an enterprise-wide data warehouse for all patients who tested positive for influenza and were seen in ambulatory care between 2009 and 2019 (N = 7,278). A text processing pipeline analyzed chart notes for UMLS terms for symptoms of interest, in order to improve data completeness. Three models of increasing complexity in dataset and predictors were tested, with parameters selected by the least absolute shrinkage and selection operator (LASSO), to identify predictors of influenza. Receiver operating characteristic (ROC) curves were used to compare test accuracy across the three models.

Results: The three models identified 7, 8, and 10 predictors, and the most complex model performed best. Adding the UMLS-driven NLP symptom data improved data quality (fewer false negatives) and increased the number of significant predictors. NLP also strengthened the models, as did the addition of two-way predictor interactions.

Discussion: The EHR is a feasible source of rapidly accessible datasets for influenza-related prediction research, and it was used here to produce a prediction model for influenza. Combining data collected in routine care with data science methods improved the prediction model and could, in the future, be used to drive diagnostics at the point of care.
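The modeling approach named in the methods (LASSO-selected predictors, compared by ROC curves) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not describe an implementation, so the sketch assumes an L1-penalized logistic regression fitted by proximal gradient descent, and a rank-based area under the ROC curve. The function names, the feature names (`symptom_a`, `symptom_b`, `symptom_c`), the penalty value, and the eight-row dataset are all hypothetical and synthetic, chosen only to show how the L1 penalty drives uninformative coefficients to exactly zero.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    lam is the L1 penalty strength; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals of the current predicted probabilities.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def auc(y_true, scores):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic toy data (hypothetical): symptom_a tracks the outcome,
# symptom_b and symptom_c are balanced across outcomes.
features = ["symptom_a", "symptom_b", "symptom_c"]
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_lasso_logistic(X, y, lam=0.1)
selected = [name for name, wj in zip(features, w) if wj != 0.0]
scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
print("selected predictors:", selected)
print("AUC on the toy data:", auc(y, scores))
```

In a real analysis the penalty strength would typically be chosen by cross-validation, and the AUC would be computed on held-out data rather than on the data used to fit the model.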
