Towards Pathway Curation Through Literature Mining - A Case Study Using PharmGKB

The creation of biological pathway knowledge bases is largely driven by manual curation based on evidence from the scientific literature, and it is highly challenging for curators to keep up with that literature. Text mining applications have been developed over the last decade to help human curators speed up curation, but the majority of them aim to identify the most relevant papers for curation, with little attempt to directly extract pathway information from text. In this paper, we describe a rule-based literature mining system that extracts pathway information from text. We evaluated the system using curated pharmacokinetic (PK) and pharmacodynamic (PD) pathways in PharmGKB. The system achieved F-measures of 63.11% and 34.99% for entity extraction and event extraction, respectively, against all PubMed abstracts cited in PharmGKB. It may be possible to improve system performance by incorporating statistical machine learning approaches. This study also helped us gain insight into the barriers to automated event extraction from text for pathway curation.
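For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure for entity extraction could be computed against a curated gold standard. It assumes the balanced F1 (harmonic mean of precision and recall) with exact-match, set-based comparison; the abstract does not state which variant or matching criterion was used, and the function name and the entity names in the example are purely illustrative.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and balanced F-measure (F1).

    Assumption: an extracted item counts as correct only if it exactly
    matches an item in the curated gold standard.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: entities curated for one pathway vs. entities
# extracted by a text mining system (illustrative names only).
gold_entities = {"CYP2D6", "CYP3A4", "ABCB1", "codeine"}
extracted_entities = {"CYP2D6", "CYP3A4", "morphine"}

p, r, f1 = precision_recall_f1(extracted_entities, gold_entities)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

The same function could be applied to events if each event is represented as a hashable tuple, for example (event type, participant, participant), though how events are matched in the actual evaluation is not specified here.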
