Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding

Background

Proteins are important molecules that participate in virtually every aspect of cellular function within an organism, often by interacting in pairs. High-throughput technologies have generated a considerable amount of protein-protein interaction (PPI) data for various species. However, experimental methods are both time-consuming and expensive, and they are often associated with high rates of both false positives and false negatives. Accordingly, a number of computational approaches have been developed to predict protein interactions effectively and accurately. Most of these methods, however, perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. There is therefore a pressing need for effective computational methods that predict PPIs solely from protein sequence information.

Results

In this study, we present a novel computational model that combines a weighted sparse representation-based classifier (WSRC) with global encoding (GE) of amino acid sequences. Two kinds of protein descriptors, composition and transition, are extracted to represent each protein sequence. On the basis of this feature representation, the WSRC is used to predict the interaction class of each protein pair. When evaluated on the PPI data of S. cerevisiae, Human and H. pylori, the proposed method achieved high prediction accuracies of 96.82%, 97.66% and 92.83%, respectively. Extensive cross-species PPI prediction experiments also yielded very promising accuracies.

Conclusions

To further evaluate the performance of the proposed method, we compared it with a method based on a support vector machine (SVM). The results show that the proposed method achieved a significant improvement over the SVM-based method.
The proposed method is therefore an efficient approach to PPI prediction and may be a useful supplementary tool for future proteomics studies.
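
The abstract states that GE represents each protein by composition and transition descriptors, but it does not give the encoding details. The sketch below shows one plausible reading, and several of its choices are assumptions made for illustration. These are the six residue classes in AA_CLASSES, the 10 balanced binary splits of those classes, the nested prefixes on which the descriptors are computed, and the concatenation of two proteins' vectors into a pair feature. The helper names (global_encoding, pair_vector, and so on) are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: illustrative six-class grouping of the 20 standard amino acids;
# not confirmed to be the grouping used by the GE method in the paper.
AA_CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Residue -> class index lookup.
CLASS_OF = {aa: i for i, cls in enumerate(AA_CLASSES) for aa in cls}


def binary_partitions(n_classes=6):
    """Splits of the classes into two equal halves; for 6 classes this gives 10.

    Class 0 is always placed on the 1-side so each split is counted once.
    """
    return [set(c) for c in combinations(range(n_classes), n_classes // 2) if 0 in c]


def composition_transition(bits):
    """Fractions of 1s and 0s, plus the fraction of adjacent positions where the bit changes."""
    n = len(bits)
    ones = sum(bits)
    changes = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return [ones / n, (n - ones) / n, changes / (n - 1) if n > 1 else 0.0]


def global_encoding(seq, n_segments=10):
    """Hypothetical GE-style descriptor: composition/transition of each binary
    sequence, computed on nested prefixes covering 1/k, 2/k, ..., k/k of it."""
    classes = [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]  # drop non-standard residues
    if not classes:
        raise ValueError("sequence contains no standard amino acids")
    n = len(classes)
    features = []
    for group in binary_partitions():
        # 1 if the residue's class is in this half of the split, else 0.
        bits = [int(c in group) for c in classes]
        for k in range(1, n_segments + 1):
            prefix = bits[: max(1, round(n * k / n_segments))]
            features.extend(composition_transition(prefix))
    return features  # 10 partitions x n_segments prefixes x 3 values


def pair_vector(seq_a, seq_b):
    """ASSUMPTION: a protein pair is represented by concatenating both descriptors."""
    return global_encoding(seq_a) + global_encoding(seq_b)
```

With the defaults, each protein yields 300 values and each pair yields 600. All values lie in [0, 1], so proteins of different lengths can be compared directly.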

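WSRC builds on sparse representation-based classification. A test sample is expressed as a sparse linear combination of all training samples, and it is assigned to the class whose samples reconstruct it with the smallest residual. The weighted variant adds per-sample weights to the l1 penalty. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It minimizes ½‖y − Dα‖² + λ Σ w_j |α_j| with plain ISTA (iterative soft-thresholding), which may differ from the solver used in the paper. The weight formula, sigma, lam and n_iter are assumptions, and wsrc_predict is a hypothetical name.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0  # leave all-zero vectors as zeros
    return [x / norm for x in v]


def wsrc_predict(train_X, train_y, test_x, sigma=1.0, lam=0.01, n_iter=500):
    """Sketch of a weighted sparse representation classifier (hypothetical API).

    train_X: training feature vectors (the dictionary atoms, i.e. columns of D)
    train_y: one class label per training vector
    test_x:  feature vector to classify
    """
    atoms = [_normalize(x) for x in train_X]
    y = _normalize(test_x)
    m, dim = len(atoms), len(y)
    # ASSUMPTION: penalty weights grow with squared distance to y, so atoms far
    # from the test sample are discouraged from entering the representation.
    w = [math.exp(sum((a - b) ** 2 for a, b in zip(y, x)) / (2 * sigma ** 2)) for x in atoms]
    # Step 1/L is safe because L = sigma_max(D)^2 <= ||D||_F^2 <= m for unit atoms.
    step = 1.0 / m
    alpha = [0.0] * m
    for _ in range(n_iter):
        # Residual r = y - D @ alpha, computed from the current coefficients.
        recon = [sum(alpha[j] * atoms[j][i] for j in range(m)) for i in range(dim)]
        r = [yi - ri for yi, ri in zip(y, recon)]
        # Gradient step on the squared error, then weighted soft-thresholding.
        for j, atom in enumerate(atoms):
            z = alpha[j] + step * _dot(atom, r)
            alpha[j] = math.copysign(max(abs(z) - step * lam * w[j], 0.0), z)
    residuals = {}
    for label in set(train_y):
        # Keep only this class's coefficients and measure the reconstruction error.
        recon = [sum(alpha[j] * atoms[j][i] for j in range(m) if train_y[j] == label)
                 for i in range(dim)]
        residuals[label] = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
    return min(residuals, key=residuals.get)
```

For PPI prediction, train_X would hold the pair feature vectors and train_y would hold interacting/non-interacting labels. Very small sigma values make the exponential weights overflow, so sigma should stay on the order of the distances between normalized samples.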