Syntactic Parsing for Bio-molecular Event Detection from Scientific Literature

Rapid advances in science and in laboratory and computing methods are generating vast amounts of data and scientific literature. To keep up to date with the expanding knowledge in their field of study, researchers increasingly need tools that help them manage this information. In genomics, various databases have been created to store information in a formalized and easily accessible form. However, human curators cannot update these databases at the rate at which new studies are published. Advanced and robust text mining tools that automatically extract newly published information from scientific articles are therefore required. This paper presents a methodology, based on syntactic parsing, for identifying gene events in the scientific literature. Evaluation of the proposed approach on the BioNLP shared task on event extraction produced an average F-score of 47.1 across six event types.
