A Tree Kernel-Based Method for Protein-Protein Interaction Mining from Biomedical Literature

As genomic research advances, the knowledge discovery from a large collection of scientific papers becomes more important for efficient biological and biomedical research. Even though current databases continue to update new protein-protein interactions, valuable information still remains in biomedical literature. Thus data mining techniques are required to extract the information. In this paper, we present a tree kernel-based method to mine protein-protein interactions from biomedical literature. The tree kernel is designed to consider grammatical structures for given sentences. A support vector machine classifier is combined with the tree kernel and trained on predefined interaction corpus and set of interaction patterns. Experimental results show that the proposed method gives promising results by utilizing the structure patterns.

[1]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[2]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[3]  Mark R. Gilder,et al.  Extraction of protein interaction information from unstructured text using a context-free grammar , 2003, Bioinform..

[4]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[5]  Jean-Michel Renders,et al.  Word-Sequence Kernels , 2003, J. Mach. Learn. Res..

[6]  Minghua Deng,et al.  Inferring Domain–Domain Interactions From Protein–Protein Interactions , 2002 .

[7]  Hao Yu,et al.  Discovering patterns to extract protein-protein interactions from full texts , 2004, Bioinform..

[8]  Hsinchun Chen,et al.  Filling Preposition-Based Templates to Capture Information from Medical Abstracts , 2001, Pacific Symposium on Biocomputing.

[9]  James Pustejovsky,et al.  Robust Relational Parsing Over Biomedical Literature: Extracting Inhibit Relations , 2001, Pacific Symposium on Biocomputing.

[10]  Hao Yu,et al.  Discovering patterns to extract protein-protein interactions from the literature: Part II , 2005, Bioinform..

[11]  Eric Brill,et al.  Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging , 1995, CL.

[12]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[13]  Jong C. Park,et al.  Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar , 2000, Pacific Symposium on Biocomputing.

[14]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[15]  Jun'ichi Tsujii,et al.  Event Extraction from Biomedical Papers Using a Full Parser , 2000, Pacific Symposium on Biocomputing.

[16]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[17]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.