Extracting Biomedical Events and Modifications Using Subgraph Matching with Noisy Training Data

The Genia Event (GE) extraction task of the BioNLP Shared Task addresses the extraction of biomedical events from the natural language text of the published literature. In our submission, we modified an existing system that learns event patterns from dependency parse subgraphs to utilise a more accurate parser and significantly more, but noisier, training data. We explore the impact of these two aspects of the system and conclude that the change in parser limits recall to an extent that cannot be offset by the large quantities of training data. However, our extensions of the system to extract modification events show promise.
