Kernel-based learning for biomedical relation extraction

Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction. © 2008 Wiley Periodicals, Inc.

[1]  Nello Cristianini,et al.  Composite Kernels for Hypertext Categorisation , 2001, ICML.

[2]  Hagit Shatkay,et al.  Mining the Biomedical Literature in the Genomic Era: An Overview , 2003, J. Comput. Biol..

[3]  Masaki Murata,et al.  Extracting Protein-Protein Interaction Information from Biomedical Text with SVM , 2006, IEICE Trans. Inf. Syst..

[4]  Alfonso Valencia,et al.  Overview of BioCreAtIvE: critical assessment of information extraction for biology , 2005, BMC Bioinformatics.

[5]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[6]  Nello Cristianini,et al.  A statistical framework for genomic data fusion , 2004, Bioinform..

[7]  B J Stapley,et al.  Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[8]  Sanda M. Harabagiu,et al.  Shallow Semantics for Relation Extraction , 2005, IJCAI.

[9]  Ralf Zimmer,et al.  RelEx - Relation extraction using dependency parse trees , 2007, Bioinform..

[10]  Peter Willett,et al.  Protein Structures and Information Extraction from Biological Texts: The PASTA System , 2003, Bioinform..

[11]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.

[12]  James Pustejovsky,et al.  Robust Relational Parsing Over Biomedical Literature: Extracting Inhibit Relations , 2001, Pacific Symposium on Biocomputing.

[13]  Barbara Rosario,et al.  Multi-way Relation Classification: Application to Protein-Protein Interactions , 2005, HLT.

[14]  Jun'ichi Tsujii,et al.  Event Extraction from Biomedical Papers Using a Full Parser , 2000, Pacific Symposium on Biocomputing.

[15]  Tong Zhang,et al.  An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods , 2001, AI Mag..

[16]  Michael Krauthammer,et al.  GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles , 2001, ISMB.

[17]  Thomas C. Rindflesch,et al.  EDGAR: extraction of drugs, genes and relations from the biomedical literature. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[18]  Jiawei Han,et al.  PEBL: Web page classification without negative examples , 2004, IEEE Transactions on Knowledge and Data Engineering.

[19]  Razvan C. Bunescu,et al.  Subsequence Kernels for Relation Extraction , 2005, NIPS.

[20]  Joel D. Martin,et al.  PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine , 2003, BMC Bioinformatics.

[21]  T. Jenssen,et al.  A literature network of human genes for high-throughput analysis of gene expression , 2001, Nature Genetics.

[22]  Ralph Grishman,et al.  Extracting Relations with Integrated Information Using Kernel Methods , 2005, ACL.

[23]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[24]  Burr Settles,et al.  ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text , 2005 .

[25]  Ke Wang,et al.  Localization site prediction for membrane proteins by integrating rule and SVM classification , 2005, IEEE Transactions on Knowledge and Data Engineering.

[26]  Matthew Lease,et al.  Parsing Biomedical Literature , 2005, IJCNLP.

[27]  Razvan C. Bunescu,et al.  A Shortest Path Dependency Kernel for Relation Extraction , 2005, HLT.

[28]  Sergei Egorov,et al.  MedScan, a natural language processing engine for MEDLINE abstracts , 2003, Bioinform..

[29]  Jong C. Park,et al.  Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar , 2000, Pacific Symposium on Biocomputing.

[30]  Haitao Zhao,et al.  A novel incremental principal component analysis and its application for face recognition , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[31]  Jaideep Srivastava,et al.  Blocking reduction strategies in hierarchical text classification , 2004, IEEE Transactions on Knowledge and Data Engineering.

[32]  Dmitry Zelenko,et al.  Kernel Methods for Relation Extraction , 2002, J. Mach. Learn. Res..

[33]  Hsinchun Chen,et al.  Aggregating automatically extracted regulatory pathway relations , 2006, IEEE Transactions on Information Technology in Biomedicine.

[34]  Michael Collins,et al.  Convolution Kernels for Natural Language , 2001, NIPS.

[35]  Hsinchun Chen,et al.  Extracting gene pathway relations using a hybrid grammar: the Arizona Relation Parser , 2004, Bioinform..

[36]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[37]  Hsinchun Chen,et al.  A shallow parser based on closed-class words to capture relations in biomedical text , 2003, J. Biomed. Informatics.

[38]  Ioannis Xenarios,et al.  Mining literature for protein-protein interactions , 2001, Bioinform..