Static Relations: a Piece in the Biomedical Information Extraction Puzzle

We propose a static relation extraction task to complement biomedical information extraction approaches. We argue that static relations such as part-whole are implicitly involved in many common extraction settings, define a task setting making them explicit, and discuss their integration into previously proposed tasks and extraction methods. We further identify a specific static relation extraction task motivated by the BioNLP'09 shared task on event extraction, introduce an annotated corpus for the task, and demonstrate the feasibility of the task by experiments showing that the defined relations can be reliably extracted. The task setting and corpus can serve to support several forms of domain information extraction.

[1]  Sampo Pyysalo,et al.  Overview of BioNLP’09 Shared Task on Event Extraction , 2009, BioNLP@HLT-NAACL.

[2]  Jin-Dong Kim,et al.  The GENIA corpus: an annotated research abstract corpus in molecular biology domain , 2002 .

[3]  Tapio Salakoski,et al.  Complex-to-Pairwise Mapping of Biological Relationships using a Semantic Network Representation , 2008 .

[4]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[5]  Jari Björne,et al.  Comparative analysis of five protein-protein interaction corpora , 2008, BMC Bioinformatics.

[6]  Preslav Nakov,et al.  Scaling Up BioNLP : Application of a Text Annotation Architecture to Noun Compound Bracketing , 2005 .

[7]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[8]  Beth M. Sundheim,et al.  Overview of Results of the MUC-6 Evaluation , 1995, MUC.

[9]  Claire Nédellec,et al.  Learning Language in Logic - Genic Interaction Extraction Challenge , 2005 .

[10]  Ralf Zimmer,et al.  RelEx - Relation extraction using dependency parse trees , 2007, Bioinform..

[11]  Jun'ichi Tsujii,et al.  Corpus annotation for mining biomedical events from literature , 2008, BMC Bioinformatics.

[12]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[13]  Jun'ichi Tsujii,et al.  Syntax Annotation for the GENIA Corpus , 2005, IJCNLP.

[14]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.

[15]  Claire Grover,et al.  The ITI TXM Corpora: Tissue Expressions and Protein-Protein Interactions , 2008 .

[16]  Graciela Gonzalez,et al.  BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition , 2007, Pacific Symposium on Biocomputing.

[17]  K. Bretonnel Cohen,et al.  Frontiers of biomedical text mining: current progress , 2007, Briefings Bioinform..

[18]  Douglas Herrmann,et al.  A Taxonomy of Part-Whole Relations , 1987, Cogn. Sci..

[19]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[20]  Mark A. Przybocki,et al.  The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation , 2004, LREC.

[21]  Barbara Rosario,et al.  Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy , 2001, EMNLP.

[22]  Preslav Nakov,et al.  SemEval-2007 Task 04: Classification of Semantic Relations between Nominals , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).