A post-processing algorithm for building longitudinal medication dose data from extracted medication information using natural language processing from electronic health records

Objective: We developed a post-processing algorithm to convert raw natural language processing (NLP) output from electronic health records into a usable format for analysis. The algorithm was developed specifically for creating datasets for medication-based studies.

Materials and Methods: The algorithm was developed using output from two NLP systems, MedXN and medExtractR. We extracted medication information from deidentified clinical notes in Vanderbilt’s electronic health record system for two medications, tacrolimus and lamotrigine, which have widely different prescribing patterns. The algorithm consists of two parts: Part I parses the raw output and links related entities, and Part II removes redundancies and calculates dose intake and daily dose. We evaluated both parts of the algorithm against gold standards that were generated using approximately 300 records from 10 subjects for both medications and both NLP systems.

Results: Both parts of the algorithm performed well. For MedXN, the F-measures were at or above 0.94 for Part I and at or above 0.98 for Part II. For medExtractR, the F-measures were at or above 0.98 for Part I and at or above 0.91 for Part II.

Discussion: Our post-processing algorithm is useful for drug-based studies because it converts NLP output into analyzable data. It performed well, although it cannot handle highly complicated cases, which usually occurred when an NLP system incorrectly extracted dose information. Future work will focus on identifying the most likely correct dose when conflicting doses are extracted on the same day.
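As a rough illustration of the Part II steps and the evaluation metric described in the abstract, the following Python sketch removes exact duplicate mentions, computes dose intake and daily dose, and computes an F-measure. It is not the authors' implementation. The `Mention` schema, the function names, and the formulas (dose intake as strength times dose amount, daily dose as dose intake times doses per day) are assumptions made for this example; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One linked medication mention, as Part I might emit it (hypothetical schema)."""
    note_date: str       # date of the clinical note
    strength_mg: float   # strength of one unit, e.g. a 1 mg capsule
    dose_amount: float   # number of units taken at one time
    doses_per_day: int   # frequency normalized to times per day


def deduplicate(mentions):
    """Drop exact repeats while preserving order (a simple stand-in for redundancy removal)."""
    seen = set()
    unique = []
    for m in mentions:
        if m not in seen:
            seen.add(m)
            unique.append(m)
    return unique


def dose_intake(m):
    """Assumed definition: amount taken at one time, in mg."""
    return m.strength_mg * m.dose_amount


def daily_dose(m):
    """Assumed definition: amount taken per day, in mg."""
    return dose_intake(m) * m.doses_per_day


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up mentions for illustration only; not data from the study.
    raw = [
        Mention("2020-01-01", 1.0, 2.0, 2),
        Mention("2020-01-01", 1.0, 2.0, 2),  # repeated mention in the same note
        Mention("2020-02-01", 1.0, 3.0, 2),
    ]
    for m in deduplicate(raw):
        print(m.note_date, dose_intake(m), daily_dose(m))
```

The sketch treats only identical mentions as redundant. It does not attempt to resolve conflicting doses extracted on the same day, which the abstract identifies as future work.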
