Natural Language Processing for Rapid Response to Emergent Diseases: Case Study of Calcium Channel Blockers and Hypertension in the COVID-19 Pandemic

Background: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; novel diseases, however, have no preexisting knowledge model. In an emergent epidemic, natural language processing can enable rapid conversion of unstructured text into a novel knowledge model. Although this idea has often been suggested, there has been no opportunity to test it in real time. The current coronavirus disease 2019 (COVID-19) pandemic presents such an opportunity.

Objective: The aim of this study was to evaluate, using natural language processing (NLP), the added value of information from clinical text in the response to emergent diseases.

Methods: We explored the effects of long-term treatment with calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during inpatient hospital stays. We used two sources of information: data available only from structured electronic health records (EHRs), and data available from structured EHRs combined with text mining.

Results: In this multicenter study involving 39 hospitals, text mining increased statistical power enough to turn a statistically nonsignificant adjusted hazard ratio into a significant one. Compared with the baseline structured data, the combined sources increased the number of patients available for inclusion 2.95-fold, the amount of available information on medications 7.2-fold, and the amount of additional phenotypic information 11.9-fold.

Conclusions: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed well enough to extract useful information. When that information was used to supplement existing structured data, the sample size increased enough to reveal treatment effects that were not statistically detectable with structured data alone.
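The power argument in the Results can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch, not the authors' analysis (which fitted an adjusted survival model to real cohort data). It assumes Schoenfeld's approximation for the standard error of a log hazard ratio with a binary exposure, SE(log HR) ≈ 1/sqrt(D·p·(1−p)), where D is the number of deaths and p is the fraction of patients exposed. The hazard ratio (0.70), exposed fraction (0.30), and baseline death count (60) are entirely hypothetical; only the 2.95-fold multiplier comes from the abstract, and applying it to deaths further assumes the event rate stays the same as patients are added. The helper names `log_hr_se` and `wald_test` are illustrative.

```python
"""Illustrative sketch: how a larger cohort can turn a nonsignificant hazard
ratio into a significant one. All inputs except SCALE are HYPOTHETICAL."""
from __future__ import annotations

import math
from statistics import NormalDist


def log_hr_se(events: int, exposed_fraction: float) -> float:
    """Schoenfeld's approximation to SE(log HR) for a binary exposure:
    1 / sqrt(D * p * (1 - p))."""
    return 1.0 / math.sqrt(events * exposed_fraction * (1.0 - exposed_fraction))


def wald_test(hr: float, se: float) -> tuple[float, float, tuple[float, float]]:
    """Two-sided Wald test and 95% CI for a hazard ratio, given SE(log HR)."""
    log_hr = math.log(hr)
    z = abs(log_hr) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    half_width = NormalDist().inv_cdf(0.975) * se  # about 1.96 * SE
    ci = (math.exp(log_hr - half_width), math.exp(log_hr + half_width))
    return z, p_value, ci


HYPOTHETICAL_HR = 0.70    # assumed point estimate, identical in both analyses
EXPOSED_FRACTION = 0.30   # assumed share of patients on a calcium channel blocker
BASELINE_EVENTS = 60      # assumed in-hospital deaths with structured data only
SCALE = 2.95              # patient-count multiplier reported in the abstract

for label, events in [("structured only", BASELINE_EVENTS),
                      ("structured + text", round(BASELINE_EVENTS * SCALE))]:
    se = log_hr_se(events, EXPOSED_FRACTION)
    z, p, (lo, hi) = wald_test(HYPOTHETICAL_HR, se)
    print(f"{label:18s} events={events:4d} SE={se:.3f} z={z:.2f} "
          f"p={p:.3f} 95% CI=({lo:.2f}, {hi:.2f})")
```

With the point estimate held fixed, the standard error shrinks by roughly sqrt(2.95) ≈ 1.72, which in this hypothetical setting is enough to move the upper bound of the 95% confidence interval below 1. In a real cohort the added patients would also shift the estimate itself, and an adjusted model (such as a Cox proportional hazards model) would be fitted rather than this approximation.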
