Knowledge-based extraction of adverse drug events from biomedical text

Background: Many biomedical relation extraction systems are machine-learning based and have to be trained on large annotated corpora that are expensive and cumbersome to construct. We developed a knowledge-based relation extraction system that requires minimal training data, and applied it to the extraction of adverse drug events from biomedical text. The system consists of a concept recognition module that identifies drugs and adverse effects in sentences, and a knowledge-base module that establishes whether a relation exists between the recognized concepts. The knowledge base was filled with information from the Unified Medical Language System. The performance of the system was evaluated on the ADE corpus, which consists of 1644 abstracts with manually annotated adverse drug events. Fifty abstracts were used for training; the remaining abstracts were used for testing.

Results: The knowledge-based system obtained an F-score of 50.5%, which was 34.4 percentage points better than the co-occurrence baseline. Increasing the training set to 400 abstracts improved the F-score to 54.3%. When the system was compared with a machine-learning system, jSRE, on a subset of the sentences in the ADE corpus, our knowledge-based system achieved an F-score that was 7 percentage points higher than that of jSRE trained on 50 abstracts, and still 2 percentage points higher than that of jSRE trained on 90% of the corpus.

Conclusion: A knowledge-based approach can be successfully used to extract adverse drug events from biomedical text without the need for a large training set. Whether use of a knowledge base is equally advantageous for other biomedical relation-extraction tasks remains to be investigated.
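
To make the comparison between a co-occurrence baseline and a knowledge-base check concrete, the following is a minimal sketch in Python. It is not the system described above: the knowledge base here is a hypothetical hand-written set of drug-effect pairs rather than information derived from the Unified Medical Language System, the concept recognition step is assumed to have already produced the drug and effect lists, and all names (`KNOWN_RELATIONS`, `cooccurrence_baseline`, `knowledge_based`, `f_score`) are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy knowledge base: (drug, effect) pairs assumed to be related.
# The real system fills its knowledge base from the UMLS; this set is a stand-in.
KNOWN_RELATIONS = {("lithium", "tremor"), ("warfarin", "bleeding")}


def cooccurrence_baseline(drugs, effects):
    """Predict a relation for every drug-effect pair found in one sentence."""
    return set(product(drugs, effects))


def knowledge_based(drugs, effects, kb=KNOWN_RELATIONS):
    """Keep only the co-occurring pairs that the knowledge base supports."""
    return {pair for pair in product(drugs, effects) if pair in kb}


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of predicted pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sentence: concepts assumed to be already recognized (hypothetical data).
drugs = ["lithium", "aspirin"]
effects = ["tremor", "nausea"]
gold = {("lithium", "tremor")}

baseline_f = f_score(cooccurrence_baseline(drugs, effects), gold)
kb_f = f_score(knowledge_based(drugs, effects), gold)
```

On this toy sentence the baseline proposes all four drug-effect pairs and is penalized on precision, while the knowledge-base filter keeps only the supported pair. The numbers it produces are illustrative only and say nothing about the F-scores reported above.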
