Integrating existing natural language processing tools for medication extraction from discharge summaries

OBJECTIVE To develop an automated system that extracts medications and related information from discharge summaries, as part of the 2009 i2b2 natural language processing (NLP) challenge. The task required accurate recognition of medication name, dosage, mode, frequency, duration, and reason for drug administration.

DESIGN We built an integrated system from several existing NLP components developed at Vanderbilt University Medical Center: MedEx (to extract medication information), SecTag (a section identification system for clinical notes), a sentence splitter, and a spell checker for drug names. Our goal was to achieve good performance with little or no training specific to this document corpus, thereby evaluating the portability of these NLP tools beyond their home institution. The integrated system was developed using 17 notes annotated by the organizers and evaluated using 251 notes annotated by the participating teams.

MEASUREMENTS The i2b2 challenge used standard measures (precision, recall, and F-measure) to evaluate the performance of participating systems. An extracted textual finding was judged correct in one of two ways: exact matching or inexact matching. The primary metric in the challenge was the overall performance for all six types of medication-related findings across the 251 annotated notes.

RESULTS Our system achieved an overall F-measure of 0.821 for exact matching (0.839 precision; 0.803 recall) and 0.822 for inexact matching (0.866 precision; 0.782 recall). The system ranked second out of 20 participating teams on overall performance at extracting medications and related information.

CONCLUSIONS The results show that the existing MedEx system, together with other NLP components, can extract medication information with reasonable performance from clinical text produced at institutions other than the site of algorithm development.
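As an illustration of the measures named under MEASUREMENTS, the following Python sketch computes precision, recall, and F-measure under two matching rules. It is not the challenge's official scorer. The `Finding` record, the character-offset spans, and the rule that an inexact match means "same field type with any span overlap" are assumptions made for this example; the abstract does not define how inexact matching was scored.

```python
from typing import Callable, List, NamedTuple, Tuple


class Finding(NamedTuple):
    """Hypothetical record for one extracted item (not the i2b2 file format)."""
    field: str  # e.g. "medication", "dosage", "frequency"
    start: int  # start offset, inclusive
    end: int    # end offset, exclusive


def exact_match(a: Finding, b: Finding) -> bool:
    # Same field type and identical span.
    return a == b


def inexact_match(a: Finding, b: Finding) -> bool:
    # Assumption: same field type and overlapping spans count as a match.
    return a.field == b.field and a.start < b.end and b.start < a.end


def f_measure(precision: float, recall: float) -> float:
    # Balanced F-measure: harmonic mean of precision and recall.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def score(
    predicted: List[Finding],
    gold: List[Finding],
    match: Callable[[Finding, Finding], bool],
) -> Tuple[float, float, float]:
    # Precision: share of predicted findings matched by some gold finding.
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    # Recall: share of gold findings matched by some predicted finding.
    matched_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy data, invented for illustration only.
gold = [
    Finding("medication", 0, 8),
    Finding("dosage", 9, 14),
    Finding("frequency", 15, 20),
]
predicted = [
    Finding("medication", 0, 8),  # identical to a gold finding
    Finding("dosage", 9, 12),     # overlaps a gold finding, span differs
    Finding("mode", 30, 32),      # no counterpart in gold
]

print("exact:  ", score(predicted, gold, exact_match))
print("inexact:", score(predicted, gold, inexact_match))
```

Applied to the reported figures, `f_measure(0.839, 0.803)` rounds to 0.821 and `f_measure(0.866, 0.782)` rounds to 0.822, consistent with the F-measures stated under RESULTS.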
