Vaccine adverse event text mining system for extracting features from vaccine safety reports

OBJECTIVE: To develop and evaluate a text mining system for extracting key clinical features from Vaccine Adverse Event Reporting System (VAERS) narratives to aid the automated review of adverse event reports.

DESIGN: Based on clinical significance to VAERS reviewing physicians, we defined the primary features (diagnosis and cause of death) and secondary features (e.g., symptoms) for extraction. We built a novel vaccine adverse event text mining (VaeTM) system based on a semantic text mining strategy. The performance of VaeTM was evaluated on a total of 300 VAERS reports in three sequential evaluations of 100 reports each. We also evaluated the contribution of VaeTM to case classification: an information retrieval-based approach was used to identify anaphylaxis cases in a set of reports and was compared with two other methods, a dedicated text classifier and an online tool.

MEASUREMENTS: The performance of VaeTM was measured with standard text mining metrics: recall, precision, and F-measure. We also conducted a qualitative difference analysis and calculated the sensitivity and specificity of anaphylaxis case classification for each of the three approaches.

RESULTS: VaeTM performed best in extracting the diagnosis, second level diagnosis, drug, vaccine, and lot number features (lenient F-measure in the third evaluation: 0.897, 0.817, 0.858, 0.874, and 0.914, respectively). In case classification, the VaeTM-based approach achieved high sensitivity (83.1%), which equaled that of the text classifier (83.1%) and exceeded that of the online tool (40.7%).

CONCLUSION: Our VaeTM implementation of a semantic text mining strategy shows promise in providing accurate and efficient extraction of key features from VAERS narratives.
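
The metrics named above can be illustrated with a minimal Python sketch. It is not the VaeTM evaluation code. It assumes that "lenient" matching counts any character overlap between an extracted span and a reference span as correct, while strict matching requires identical boundaries; the abstract does not define the term. The function names, the span representation, and all numbers in the usage example are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def spans_match(gold: Span, pred: Span, lenient: bool) -> bool:
    """Strict: identical boundaries. Lenient (assumed): any overlap."""
    if lenient:
        return pred[0] < gold[1] and gold[0] < pred[1]
    return gold == pred


def precision_recall_f(gold: List[Span], pred: List[Span],
                       lenient: bool = True) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure for one feature type."""
    # Extracted spans that match at least one reference span.
    matched_pred = sum(any(spans_match(g, p, lenient) for g in gold) for p in pred)
    # Reference spans that were found by at least one extracted span.
    matched_gold = sum(any(spans_match(g, p, lenient) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Case-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical spans, not taken from any VAERS report.
gold_spans = [(0, 5), (10, 20)]
pred_spans = [(0, 5), (12, 18), (30, 35)]
lenient_scores = precision_recall_f(gold_spans, pred_spans, lenient=True)
strict_scores = precision_recall_f(gold_spans, pred_spans, lenient=False)

# Hypothetical confusion-matrix counts for a case classifier.
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=85, fp=5)
```

In this toy input the partially overlapping span `(12, 18)` counts as correct only under the lenient setting, so lenient scores are at least as high as strict ones.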
