MOLECULAR EVENT EXTRACTION FROM LINK GRAMMAR PARSE TREES IN THE BIONLP’09 SHARED TASK

The BioNLP’09 Shared Task deals with extracting information on molecular events, such as gene expression and protein localization, from natural language text. Information in this benchmark is given as tuples that include protein names, a trigger term for each event, and possibly other participants such as binding sites. We address all three tasks of BioNLP’09: event detection, event enrichment, and recognition of negation and speculation. Our method for the first two tasks is based on a deep parser; we store the parse tree of each sentence in a relational database schema. From the training data, we collect the dependencies connecting any two relevant terms of a known tuple, that is, the shortest path linking these two constituents. We encode all such linkages in a query language to retrieve similar linkages from unseen text. For the third task, we rely on a hierarchy of hand‐crafted regular expressions to recognize speculated and negated events. In this paper, we add a post‐processing step that handles ambiguous event trigger terms, and we extend the query language to relax linkage constraints. On the BioNLP Shared Task test data, we achieve overall F1‐measures of 32%, 29%, and 30% for Tasks 1, 2, and 3, respectively.
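The idea of collecting the shortest path between two relevant terms can be illustrated with a minimal sketch. It assumes a parse is available as a list of labeled links between words; the function name `shortest_link_path`, the example sentence, and the link labels are hypothetical and chosen for illustration only. It does not reproduce the relational storage or the query language described above.

```python
from collections import deque


def shortest_link_path(edges, source, target):
    """Return the shortest path between two words as a list that alternates
    words and link labels, or None if the words are not connected."""
    # Build an undirected adjacency map: word -> [(neighbor, label), ...]
    graph = {}
    for left, label, right in edges:
        graph.setdefault(left, []).append((right, label))
        graph.setdefault(right, []).append((left, label))

    # Breadth-first search, remembering how each word was first reached.
    parent = {source: None}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            break
        for neighbor, label in graph.get(word, []):
            if neighbor not in parent:
                parent[neighbor] = (word, label)
                queue.append(neighbor)

    if target not in parent:
        return None

    # Walk back from the target to the source, then reverse.
    path = [target]
    node = target
    while parent[node] is not None:
        previous, label = parent[node]
        path.extend([label, previous])
        node = previous
    path.reverse()
    return path


# Hypothetical linkage for "IL-2 induces expression of FOXP3"
# (labels are illustrative, not actual parser output).
edges = [
    ("IL-2", "S", "induces"),
    ("induces", "O", "expression"),
    ("expression", "M", "of"),
    ("of", "J", "FOXP3"),
]

# Path between a candidate trigger term and a protein name.
path = shortest_link_path(edges, "expression", "FOXP3")
print(path)
```

A path obtained this way from a training sentence could serve as a pattern to match against the parses of unseen sentences; how patterns are encoded and relaxed is specific to the query language and is not shown here.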
