Coreference resolution improves extraction of Biological Expression Language statements from texts

We describe a system that automatically extracts biological events from biomedical journal articles and translates those events into Biological Expression Language (BEL) statements. The system combines existing text mining components for coreference resolution and biological event extraction with a strategy for BEL statement generation that had not previously been formally evaluated. While addressing the BEL track (Track 4) at BioCreative V (2015), we also investigate how incorporating coreference resolution affects event extraction in the biomedical domain. We report that our system achieved the best performance at the full BEL statement level, with F-scores of 20.2 in stage 1 and 35.2 in stage 2 using provided gold standard entities. We also report that results on the training dataset show a benefit from integrating coreference resolution with event extraction.
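To make the pipeline concrete, the following is a minimal sketch of how an extracted event might be turned into a BEL statement once coreference links are available. It is an illustration under stated assumptions, not the system's actual implementation: the `Event` class, the `EVENT_TO_BEL_RELATION` table, the `resolve` and `to_bel` helpers, and the use of the HGNC namespace are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical mapping from event types to BEL relations (an assumption
# for illustration; the real system's mapping is not given in the abstract).
EVENT_TO_BEL_RELATION = {
    "Positive_regulation": "increases",
    "Negative_regulation": "decreases",
}


@dataclass
class Event:
    """A simplified regulation event with one cause and one theme mention."""
    event_type: str
    cause: str
    theme: str


def resolve(mention: str, coref_links: Dict[str, str]) -> str:
    """Follow coreference links from an anaphor to its antecedent.

    Stops when the mention has no further link or a cycle is detected.
    """
    seen = set()
    while mention in coref_links and mention not in seen:
        seen.add(mention)
        mention = coref_links[mention]
    return mention


def to_bel(event: Event, coref_links: Dict[str, str],
           namespace: str = "HGNC") -> Optional[str]:
    """Translate an event into a BEL statement, or None if unsupported."""
    relation = EVENT_TO_BEL_RELATION.get(event.event_type)
    if relation is None:
        return None
    # Replace anaphoric mentions with their antecedents before translation.
    cause = resolve(event.cause, coref_links)
    theme = resolve(event.theme, coref_links)
    return f"p({namespace}:{cause}) {relation} p({namespace}:{theme})"


if __name__ == "__main__":
    links = {"this cytokine": "TNF"}
    event = Event("Positive_regulation", "this cytokine", "IL6")
    print(to_bel(event, links))
```

Without the coreference link, the cause argument would remain the unresolved phrase "this cytokine" and no valid BEL term could be produced for it, which is the kind of loss that integrating coreference resolution is intended to avoid.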
