Optimizing graph-based patterns to extract biomedical events from the literature

In BioNLP-ST 2013We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task focusing on 13 molecular biology related event types and the Cancer Genetics (CG) task targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we also attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and evaluated the event extraction impact of using paths of all possible lengths as key context dependencies beyond using only the shortest paths in our system. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th).After BioNLP-ST 2013We explored three ways to further extend our event extraction system in our previously published work: (1) We allow non-essential nodes to be skipped, and incorporated a node skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a unified subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions help to improve the extraction precision from 62% to 65%. However, the overall F-score stays equivalent to the previous performance due to a 1% drop in recall.

[1]  Tapio Salakoski,et al.  EVEX in ST’13: Application of a large-scale text mining resource to event extraction and network construction , 2013, BioNLP@ACL.

[2]  F Rinaldi,et al.  OntoGene in BioCreative II.5 , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[3]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[4]  Sampo Pyysalo,et al.  Overview of the Cancer Genetics (CG) task of BioNLP Shared Task 2013 , 2013, BioNLP@ACL.

[5]  Sampo Pyysalo,et al.  Overview of the Epigenetics and Post-translational Modifications (EPI) task of BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[6]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[7]  Hao Yu,et al.  Discovering patterns to extract protein-protein interactions from the literature: Part II , 2005, Bioinform..

[8]  Geoffrey Zweig,et al.  Polarity Inducing Latent Semantic Analysis , 2012, EMNLP.

[9]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[10]  Sampo Pyysalo,et al.  Overview of the Cancer Genetics and Pathway Curation tasks of BioNLP Shared Task 2013 , 2015, BMC Bioinformatics.

[11]  Yijia Zhang,et al.  A Single Kernel-Based Approach to Extract Drug-Drug Interactions from Biomedical Literature , 2012, PloS one.

[12]  Razvan C. Bunescu,et al.  A Shortest Path Dependency Kernel for Relation Extraction , 2005, HLT.

[13]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[14]  Eugene Charniak,et al.  Self-Training for Biomedical Parsing , 2008, ACL.

[15]  Sophia Ananiadou,et al.  Automatic extraction of angiogenesis bioprocess from text , 2011, Bioinform..

[16]  Ulf Leser,et al.  Relation Extraction for Drug-Drug Interactions using Ensemble Learning , 2011 .

[17]  Haibin Liu,et al.  EXPLORING A SUBGRAPH MATCHING APPROACH FOR EXTRACTING BIOLOGICAL EVENTS FROM LITERATURE , 2014, Comput. Intell..

[18]  Jari Björne,et al.  University of Turku in the BioNLP'11 Shared Task , 2012, BMC Bioinformatics.

[19]  Karin M. Verspoor,et al.  BioLemmatizer: a lemmatization tool for morphological processing of biomedical text , 2012, J. Biomed. Semant..

[20]  Ulf Leser,et al.  Not all links are equal: Exploiting Dependency Types for the Extraction of Protein-Protein Interactions from Text , 2011, BioNLP@ACL.

[21]  Karin M. Verspoor,et al.  From Graphs to Events: A Subgraph Matching Approach for Information Extraction from Biomedical Text , 2011, BioNLP@ACL.

[22]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[23]  Udo Hahn,et al.  Event Extraction from Trimmed Dependency Graphs , 2009, BioNLP@HLT-NAACL.

[24]  Karin M. Verspoor,et al.  BioC: a minimalist approach to interoperability for biomedical text processing , 2013, AMIA.

[25]  Karin M. Verspoor,et al.  Generalizing an Approximate Subgraph Matching-based System to Extract Events in Molecular Biology and Cancer Genetics , 2013, BioNLP@ACL.

[26]  Isabel Segura-Bedmar,et al.  The 1st DDIExtraction-2011 challenge task: Extraction of Drug-Drug Interactions from biomedical texts , 2011 .

[27]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[28]  Sampo Pyysalo,et al.  Overview of BioNLP’09 Shared Task on Event Extraction , 2009, BioNLP@HLT-NAACL.

[29]  Jari Björne,et al.  TEES 2.1: Automated Annotation Scheme Learning in the BioNLP 2013 Shared Task , 2013, BioNLP@ACL.

[30]  Sampo Pyysalo,et al.  Event extraction across multiple levels of biological organization , 2012, Bioinform..

[31]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[32]  Xu Han,et al.  Extending the evaluation of Genia Event task toward knowledge base construction and comparison to Gene Regulation Ontology task , 2015, BMC Bioinformatics.

[33]  Junichi Tsujii,et al.  Event extraction for systems biology by text mining the literature. , 2010, Trends in biotechnology.

[34]  Patrick Pantel,et al.  Discovering word senses from text , 2002, KDD.

[35]  M. Kubát An Introduction to Machine Learning , 2017, Springer International Publishing.

[36]  Karin M. Verspoor,et al.  Approximate Subgraph Matching-Based Literature Mining for Biomedical Events and Relations , 2013, PloS one.

[37]  Akinori Yonezawa,et al.  Overview of Genia Event Task in BioNLP Shared Task 2011 , 2011, BioNLP@ACL.

[38]  K. Bretonnel Cohen,et al.  A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools , 2012, BMC Bioinformatics.