Dependency-directed Tree Kernel-based Protein-Protein Interaction Extraction from Biomedical Literature

There is a surge of research interest in protein-protein interaction (PPI) extraction from biomedical literature. While most of the state-of-the-art PPI extraction systems focus on dependency-based structured information, the rich structured information inherent in constituent parse trees has not been extensively explored for PPI extraction. In this paper, we propose a novel approach to tree kernel-based PPI extraction, where the tree representation generated from a constituent syntactic parser is further refined using the shortest dependency path between two proteins derived from a dependency parser. Specifically, all the constituent tree nodes associated with the nodes on the shortest dependency path are kept intact, while other nodes are removed safely to make the constituent tree concise and precise for PPI extraction. Compared with previously used constituent tree setups, our dependency-motivated constituent tree setup achieves the best results across five commonly used PPI corpora. Moreover, our tree kernel-based method outperforms other single kernel-based ones and performs comparably with some multiple kernel ones on the most commonly tested AIMed corpus.

[1]  Daniel Berleant,et al.  Mining MEDLINE: Abstracts, Sentences, or Phrases? , 2001, Pacific Symposium on Biocomputing.

[2]  Jun'ichi Tsujii,et al.  Evaluating contributions of natural language parsers to protein–protein interaction extraction , 2008, Bioinform..

[3]  Jian Su,et al.  A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features , 2006, ACL.

[4]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[5]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[6]  Peter M. A. Sloot,et al.  A hybrid approach to extract protein-protein interactions , 2011, Bioinform..

[7]  Sampo Pyysalo,et al.  Evaluating Dependency Representations for Event Extraction , 2010, COLING.

[8]  Hsinchun Chen,et al.  Kernel-based learning for biomedical relation extraction , 2008, J. Assoc. Inf. Sci. Technol..

[9]  Claire Nédellec,et al.  Learning Language in Logic - Genic Interaction Extraction Challenge , 2005 .

[10]  Jun'ichi Tsujii,et al.  Task-oriented Evaluation of Syntactic Parsers and Their Representations , 2008, ACL.

[11]  Guodong Zhou,et al.  Dependency-Driven Feature-based Learning for Extracting Protein-Protein Interactions from Biomedical Text , 2010, COLING.

[12]  Roberto Basili,et al.  Tree Kernels for Semantic Role Labeling , 2008, CL.

[13]  Guodong Zhou,et al.  Semantic Role Labeling Using a Grammar-Driven Convolution Tree Kernel , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[15]  Guodong Zhou,et al.  Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information , 2007, EMNLP.

[16]  Jun'ichi Tsujii,et al.  A Rich Feature Vector for Protein-Protein Interaction Extraction from Multiple Corpora , 2009, EMNLP.

[17]  Jun'ichi Tsujii,et al.  Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles , 2007, EMNLP.

[18]  Jun'ichi Tsujii,et al.  Protein-protein interaction extraction by leveraging multiple kernels and parsers , 2009, Int. J. Medical Informatics.

[19]  Dmitry Zelenko,et al.  Kernel methods for relation extraction , 2003 .

[20]  ChenHsinchun,et al.  Kernel-based learning for biomedical relation extraction , 2008 .

[21]  Fernando Pereira,et al.  Non-Projective Dependency Parsing using Spanning Tree Algorithms , 2005, HLT.

[22]  Ulf Leser,et al.  A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature , 2010, PLoS Comput. Biol..

[23]  Ralf Zimmer,et al.  RelEx - Relation extraction using dependency parse trees , 2007, Bioinform..

[24]  Alessandro Moschitti,et al.  A Study on Convolution Kernels for Shallow Statistic Parsing , 2004, ACL.

[25]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[26]  Jihoon Yang,et al.  Walk-weighted subsequence kernels for protein-protein interaction extraction , 2010, BMC Bioinformatics.

[27]  Alessandro Moschitti,et al.  Convolution Kernels on Constituent, Dependency and Sequential Structures for Relation Extraction , 2009, EMNLP.

[28]  Dragomir R. Radev,et al.  Semi-Supervised Classification for Extracting Protein Interaction Sentences using Dependency Parsing , 2007, EMNLP.

[29]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[30]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[31]  Dan Klein,et al.  Improved Inference for Unlexicalized Parsing , 2007, NAACL.

[32]  Jari Björne,et al.  Comparative analysis of five protein-protein interaction corpora , 2008, BMC Bioinformatics.

[33]  Razvan C. Bunescu,et al.  Subsequence Kernels for Relation Extraction , 2005, NIPS.

[34]  Igor Jurisica,et al.  Evaluation of linguistic features useful in extraction of interactions from PubMed; Application to annotating known, high-throughput and predicted interactions in I2D , 2009, Bioinform..

[35]  Jian Su,et al.  Exploring Various Knowledge in Relation Extraction , 2005, ACL.

[36]  Udo Hahn,et al.  Evaluating the Impact of Alternative Dependency Graph Encodings on Solving Event Extraction Tasks , 2010, EMNLP.

[37]  Razvan C. Bunescu,et al.  A Shortest Path Dependency Kernel for Relation Extraction , 2005, HLT.

[38]  Guodong Zhou,et al.  Extracting relation information from text documents by exploring various types of knowledge , 2007, Inf. Process. Manag..

[39]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[40]  Claudio Giuliano,et al.  Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature , 2006, EACL.

[41]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[42]  Richard Johansson,et al.  The CoNLL 2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies , 2008, CoNLL.

[43]  Yoshinobu Kano,et al.  Extracting Protein Interactions from Text with the Unified AkaneRE Event Extraction System , 2010, TCBB.

[44]  Ralph Grishman,et al.  Extracting Relations with Integrated Information Using Kernel Methods , 2005, ACL.

[45]  Fang Kong,et al.  Exploiting Constituent Dependencies for Tree Kernel-Based Semantic Relation Extraction , 2008, COLING.

[46]  Jiexun Li,et al.  Kernel-based learning for biomedical relation extraction , 2008 .

[47]  Jun'ichi Tsujii,et al.  Syntactic Features for Protein-Protein Interaction Extraction , 2007, LBM.

[48]  Masaki Murata,et al.  Extracting Protein-Protein Interaction Information from Biomedical Text with SVM , 2006, IEICE Trans. Inf. Syst..

[49]  Alessandro Moschitti,et al.  A Study on Dependency Tree Kernels for Automatic Extraction of Protein-Protein Interaction , 2011, BioNLP@ACL.

[50]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[51]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.