Detecting Protein-Protein Interactions with a Novel Matrix-Based Protein Sequence Representation and Support Vector Machines

Proteins and their interactions lie at the heart of most biological processes. Consequently, correct detection of protein-protein interactions (PPIs) is of fundamental importance for understanding the molecular mechanisms of biological systems. Although technological advances in high-throughput experiments make it possible to detect large numbers of PPIs, the data generated by these methods are unreliable and may not cover all possible PPIs. To address this problem, this study develops a novel computational approach for detecting protein interactions. The approach combines a novel matrix-based representation of protein sequences, which fully considers both the sequence order and the dipeptide information of the primary sequence, with a support vector machine (SVM) classifier. When applied to yeast PPI datasets, the proposed method achieves 90.06% prediction accuracy, with 94.37% specificity at a sensitivity of 85.74%, indicating that it is a useful tool for predicting PPIs. These results also demonstrate that our approach can be a helpful supplement to experimentally detected interactions.
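The abstract does not spell out the matrix representation, so what follows is only a minimal Python sketch under stated assumptions, not the authors' method. It substitutes a plain 20×20 dipeptide-frequency matrix for the paper's representation (a hypothetical simplification that captures adjacent-residue order only), concatenates the flattened matrices of the two proteins into one feature vector per candidate pair, and computes the three reported metrics from confusion-matrix counts. The names `dipeptide_matrix`, `pair_features` and `metrics` are illustrative; training the SVM itself (for example with LIBSVM) is left out to keep the sketch dependency-free.

```python
# Minimal sketch (assumption): a 20x20 dipeptide-frequency matrix stands in
# for the paper's matrix-based representation, which the abstract does not define.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_matrix(seq: str) -> list[list[float]]:
    """Entry [i][j] is the frequency of dipeptide (AMINO_ACIDS[i], AMINO_ACIDS[j])
    among adjacent residue pairs. Pairs involving non-standard residues
    (e.g. X, U) are skipped. Hypothetical, illustrative representation."""
    m = [[0.0] * 20 for _ in range(20)]
    pairs = 0
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:
            m[INDEX[a]][INDEX[b]] += 1
            pairs += 1
    if pairs:
        m = [[v / pairs for v in row] for row in m]  # normalise counts to frequencies
    return m


def _flatten(m: list[list[float]]) -> list[float]:
    """Row-major flattening of a matrix into a feature list."""
    return [v for row in m for v in row]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """800-dimensional vector for a candidate protein pair: the two flattened
    400-entry matrices concatenated. This is the kind of input an SVM would
    be trained on (training not shown)."""
    return _flatten(dipeptide_matrix(seq_a)) + _flatten(dipeptide_matrix(seq_b))


def metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity (recall on interacting pairs) and specificity
    (recall on non-interacting pairs), in percent. Assumes non-empty classes."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


if __name__ == "__main__":
    print(len(pair_features("MKTAYIAK", "GAVLIL")))  # 800
    # Made-up counts for illustration only, not the paper's results.
    print(metrics(tp=85, tn=95, fp=5, fn=15))
    # {'accuracy': 90.0, 'sensitivity': 85.0, 'specificity': 95.0}
```

The counts in the usage lines are invented for illustration. When the positive and negative test sets are the same size, accuracy equals the mean of sensitivity and specificity; the reported figures fit this pattern, since (85.74 + 94.37) / 2 = 90.055 ≈ 90.06, although the abstract does not state the class sizes.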
