Paper Information - Biomedical Relationship Extraction from Literature Based on Bio-semantic Token Subsequences

Biomedical Relationship Extraction from Literature Based on Bio-semantic Token Subsequences

Relationship extraction (RE) from biomedical literature is an important and challenging problem in both text mining and bioinformatics. Although various approaches have been proposed for extracting protein-protein interaction types, their accuracy leaves considerable room for more effective methods. This paper proposes two supervised learning algorithms for multi-class biomedical relationship extraction, both based on a newly defined structure, the “bio-semantic token subsequence”. The first computes a “bio-semantic token subsequence kernel”, while the second explicitly extracts weighted features from bio-semantic token subsequences. The proposed structure captures semantic features from natural language sentences for biomedical RE. Both algorithms outperform the state-of-the-art biomedical RE methods on multi-class protein-protein interaction extraction.
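The abstract does not define the bio-semantic token subsequence or its weighting, so the following is only a minimal sketch of the general idea behind the two approaches, under stated assumptions. It assumes that entity mentions have already been replaced by semantic placeholder tokens such as `PROTEIN`, and it swaps in the generic gap-weighted scheme from classic string kernels in place of the paper's actual weighting. The function names, the subsequence length `n`, and the `decay` parameter are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(tokens, n=2, decay=0.5):
    """Explicit feature map: each length-n token subsequence (gaps allowed)
    is weighted by decay ** span, where span is the number of positions the
    subsequence covers in the sentence. ASSUMPTION: generic gap weighting,
    not the weighting defined in the paper."""
    feats = defaultdict(float)
    for idx in combinations(range(len(tokens)), n):
        span = idx[-1] - idx[0] + 1
        feats[tuple(tokens[i] for i in idx)] += decay ** span
    return feats


def subsequence_kernel(s, t, n=2, decay=0.5):
    """Kernel value: inner product of the two explicit feature maps.
    Brute-force enumeration, intended for short sentences only."""
    fs = subsequence_features(s, n, decay)
    ft = subsequence_features(t, n, decay)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


# Hypothetical sentences with protein names replaced by a placeholder token.
s = ["PROTEIN", "binds", "PROTEIN"]
t = ["PROTEIN", "binds", "to", "PROTEIN"]
similarity = subsequence_kernel(s, t)
```

In this sketch, `subsequence_features` corresponds loosely to the second approach (explicit weighted features that any linear classifier could consume), and `subsequence_kernel` to the first (a similarity score that a kernel method such as an SVM could use directly). The brute-force enumeration is chosen for readability; published subsequence kernels typically use dynamic programming instead.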

Vijay V. Raghavan | Ying Xie | Jayasimha Reddy Katukuri
