Overview of the Cancer Genetics (CG) task of BioNLP Shared Task 2013

We present the design, preparation, results and analysis of the Cancer Genetics (CG) event extraction task, a main task of the BioNLP Shared Task (ST) 2013. The CG task is an information extraction task targeting the recognition of events in text, represented as structured n-ary associations of given physical entities. In addition to addressing the cancer domain, the CG task differs from previous event extraction tasks in the BioNLP ST series in addressing a wide range of pathological processes and multiple levels of biological organization, ranging from the molecular through the cellular and organ levels up to whole organisms. Final test set submissions were accepted from six teams. The highest-performing system achieved an F-score of 55.4%. This level of performance is broadly comparable with the state of the art for established molecular-level extraction tasks, demonstrating that event extraction resources and methods generalize well to higher levels of biological organization and are applicable to the analysis of scientific texts on cancer. The CG task continues as an open challenge to all interested parties, with tools and resources available from http://2013.bionlp-st.org/.
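
To make the notion of an event as a structured n-ary association more concrete, the following minimal Python sketch models entities and events and computes an F-score as the harmonic mean of precision and recall. The class names, field names, identifiers, type labels, role labels and the example sentence fragment are illustrative assumptions, not the task's official data format or evaluation code.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A given physical entity mention (illustrative structure)."""
    id: str
    type: str
    text: str


@dataclass(frozen=True)
class Event:
    """An n-ary association: a typed trigger plus role-labelled arguments.

    Arguments may be entities or other events, so events can nest.
    """
    id: str
    type: str
    trigger: str
    args: Tuple[Tuple[str, Union[Entity, "Event"]], ...]


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical fragment: "p53 suppresses tumor growth"
# (type and role labels below are assumed for illustration only)
p53 = Entity("T1", "Gene_or_gene_product", "p53")
tumor = Entity("T2", "Cancer", "tumor")
growth = Event("E1", "Growth", "growth", (("Theme", tumor),))
regulation = Event(
    "E2",
    "Negative_regulation",
    "suppresses",
    (("Cause", p53), ("Theme", growth)),  # an event used as an argument
)
```

In this sketch, the second event takes the first as one of its arguments, which illustrates how a single structure can link a molecular-level entity to an organ-level or tissue-level process.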
