Overview of BioNLP’09 Shared Task on Event Extraction

This paper presents the design and implementation of the BioNLP’09 Shared Task, and reports the final results with analysis. The shared task consists of three sub-tasks, each of which addresses bio-molecular event extraction at a different level of specificity. The data was developed based on the GENIA event corpus. The shared task was run over 12 weeks, drawing initial interest from 42 teams. Of these teams, 24 submitted final results. The evaluation results are encouraging, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.
