Extraction of Gene/Protein Interaction from Text Documents with Relation Kernel

Although many databases of gene/protein interactions exist, most such data are still found only in the biomedical literature. This information is scattered across papers written in natural language, and turning it into well-structured data requires considerable effort, such as data mining. As genomic research advances, knowledge discovery from large collections of scientific papers is becoming increasingly important for efficient biological and biomedical research. In this paper, we present a relation kernel-based interaction extraction method to address this problem. We extract gene/protein interactions of yeast (S. cerevisiae) from text documents with a relation kernel. The kernel for relation extraction is constructed from a predefined interaction corpus and a set of interaction patterns. The proposed relation kernel exploits only shallow-parsed documents. Experimental results show that the proposed kernel method achieves a recall of 78.3% and a precision of 79.9% for gene/protein interaction extraction without requiring full parsing.
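
To make the idea of a kernel over shallow-parsed text concrete, the following is a minimal sketch and not the relation kernel proposed in this paper. It swaps in a simple n-gram (spectrum-style) kernel, which scores two token sequences by the number of contiguous n-grams they share. The token format (a "PROT" placeholder for gene/protein names and "TAG:word" strings for other tokens), the function names, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_features(tokens, max_n=2):
    """Count every contiguous n-gram of length 1..max_n in a token sequence."""
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats


def ngram_kernel(a, b, max_n=2):
    """Dot product of the n-gram count vectors of two sequences."""
    fa = ngram_features(a, max_n)
    fb = ngram_features(b, max_n)
    # Counter returns 0 for n-grams that are missing from fb.
    return sum(count * fb[gram] for gram, count in fa.items())


def normalized_kernel(a, b, max_n=2):
    """Cosine-normalized kernel value in [0, 1]."""
    denom = sqrt(ngram_kernel(a, a, max_n) * ngram_kernel(b, b, max_n))
    return ngram_kernel(a, b, max_n) / denom if denom else 0.0


# Hypothetical shallow-parsed sentences: gene/protein mentions are replaced
# by the placeholder "PROT"; other tokens carry an assumed tag prefix.
s1 = ["PROT", "VB:interacts", "IN:with", "PROT"]
s2 = ["PROT", "VB:binds", "IN:with", "PROT"]

similarity = normalized_kernel(s1, s2)
```

A kernel of this kind could be passed to a kernel-based classifier such as a support vector machine, which would then label candidate sentences as describing an interaction or not. The sketch only shows how sequence similarity can be computed without a full parse tree.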