Cross-Domain Evaluation of Edge Detection for Biomedical Event Extraction

Biomedical event extraction is crucial for automatically extracting information from the rapidly growing body of biomedical literature. Despite recent methodological advances, most event extraction systems are still evaluated only in-domain and only on complete event structures. This makes it hard to determine how intermediate stages of the task, such as edge detection, perform across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied to out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985.
