An SVM-Based System for Predicting Protein-Protein Interactions Using a Novel Representation of Protein Sequences

Protein-protein interactions (PPIs) are crucial for almost all cellular processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. However, experimental methods for identifying PPIs are both time-consuming and expensive, so computational approaches for predicting PPIs are needed. In this article, a sequence-based method is developed by combining a novel binary-coding feature representation with a Support Vector Machine (SVM). The binary-coding-based descriptors account for interactions between residues a certain distance apart in the protein sequence, so the method takes the neighboring effect into account and mines interaction information from continuous and discontinuous amino acid segments at the same time. When applied to the PPI data of Saccharomyces cerevisiae, the proposed method achieved 86.93% prediction accuracy with 86.99% sensitivity at a precision of 86.90%. Extensive experiments are performed to compare our method with the existing sequence-based method. The results show that the proposed approach is promising for predicting PPIs and can serve as a useful supplementary tool for future proteomics studies.
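The abstract does not spell out the exact binary coding, so the following is only a minimal sketch of one way such a descriptor could look, not the authors' encoding. It assumes a presence/absence bit for every ordered residue pair at each separation from 1 up to a hypothetical `max_gap`; separation 1 covers adjacent (continuous) residues and larger separations cover discontinuous ones. The function names, the 20-letter alphabet, the default `max_gap=3`, and the concatenation of the two proteins' vectors are all illustrative assumptions.

```python
from itertools import product

# Assumption: the 20 standard amino acids; other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Map each ordered residue pair (a, b) to a column index 0..399.
PAIR_INDEX = {pair: k for k, pair in enumerate(product(AMINO_ACIDS, repeat=2))}
N_PAIRS = len(PAIR_INDEX)


def gapped_pair_binary_descriptor(sequence, max_gap=3):
    """Return a 0/1 vector of length max_gap * 400.

    Bit (d, a, b) is 1 if residue a occurs at some position i and
    residue b occurs at position i + d, for d = 1..max_gap.
    Hypothetical encoding, for illustration only.
    """
    vec = [0] * (max_gap * N_PAIRS)
    for d in range(1, max_gap + 1):
        offset = (d - 1) * N_PAIRS
        for i in range(len(sequence) - d):
            k = PAIR_INDEX.get((sequence[i], sequence[i + d]))
            if k is not None:  # ignore pairs with non-standard residues
                vec[offset + k] = 1
    return vec


def pair_features(seq_a, seq_b, max_gap=3):
    """Concatenate both proteins' descriptors into one feature vector
    (one assumed way to represent a candidate interacting pair)."""
    return (gapped_pair_binary_descriptor(seq_a, max_gap)
            + gapped_pair_binary_descriptor(seq_b, max_gap))
```

Feature vectors built this way could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; the kernel and hyperparameters are not given in the abstract and would need to be chosen separately.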
