Detection of Gene Interactions Based on Syntactic Relations

Interactions between proteins and genes are considered essential to describing biomolecular phenomena, and networks of such interactions are used in systems biology approaches. Recently, many studies have sought to extract information from biomolecular text using natural language processing technology. Previous studies have asserted that linguistic information is useful for improving the detection of gene interactions, and that syntactic relations in particular are well suited to this task. However, previous systems achieve reasonably good precision but poor recall. To improve recall without sacrificing precision, this paper proposes a three-phase method for detecting gene interactions based on syntactic relations. In the first phase, we retrieve syntactic encapsulation categories for each candidate agent and target. In the second phase, we construct a verb list that indicates the nature of the interaction between pairs of genes. In the last phase, we determine direction rules to detect which of the two genes is the agent and which is the target. Even without biomolecular knowledge, our method performs reasonably well using a small training dataset. The first phase contributes to improving recall, while the second and third phases contribute to improving precision. In experiments using the ICML 05 Workshop on Learning Language in Logic (LLL05) data, our proposed method achieved an F-measure of 67.2% on the test data, significantly outperforming previous methods. We also describe the contribution of each phase to the performance.
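
The three phases can be pictured with a toy sketch. It is not the paper's implementation: the triple format, the relation labels (`subj`, `obj`, `by_agent`, `subj_passive`), the verb list, the direction rules, and the gene names are all hypothetical stand-ins chosen for illustration. The sketch pairs genes that attach to the same verb, keeps only pairs whose verb is on an interaction verb list, and applies a direction rule to decide agent and target. It also computes the F-measure as the harmonic mean of precision and recall.

```python
from itertools import permutations

# Hypothetical verb list (phase 2) and direction rules (phase 3).
# Neither comes from the paper; they only illustrate the idea.
INTERACTION_VERBS = {"activate", "repress"}
DIRECTION_RULES = {
    ("subj", "obj"),               # "GeneA activates GeneB"
    ("by_agent", "subj_passive"),  # "GeneB is repressed by GeneA"
}

def detect_interactions(triples, genes):
    """Return a set of (agent, target) pairs.

    triples: iterable of (verb, relation, word) tuples, an assumed
             simplification of a syntactic parse.
    genes:   set of words known to be gene names.
    """
    # Phase 1 (simplified): group gene arguments under their governing verb.
    args_by_verb = {}
    for verb, relation, word in triples:
        if word in genes:
            args_by_verb.setdefault(verb, {})[word] = relation

    found = set()
    for verb, relation_of in args_by_verb.items():
        # Phase 2: discard verbs that do not signal an interaction.
        if verb not in INTERACTION_VERBS:
            continue
        # Phase 3: a direction rule decides which gene is the agent.
        for agent, target in permutations(relation_of, 2):
            if (relation_of[agent], relation_of[target]) in DIRECTION_RULES:
                found.add((agent, target))
    return found

def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage with made-up gene names.
toy_triples = [
    ("activate", "subj", "GeneA"),
    ("activate", "obj", "GeneB"),
    ("repress", "by_agent", "GeneC"),
    ("repress", "subj_passive", "GeneD"),
    ("bind", "subj", "GeneE"),  # verb not on the list, so it is ignored
    ("bind", "obj", "GeneF"),
]
toy_genes = {"GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"}
toy_result = detect_interactions(toy_triples, toy_genes)
```

In this toy setting, loosening the candidate pairing admits more pairs and so can only raise recall, while the verb list and direction rules prune pairs and so act on precision, which mirrors the division of labor described in the abstract.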
