Prediction of protein-protein interactions from amino acid sequences using extreme learning machine combined with auto covariance descriptor

Protein-protein interactions (PPIs) are crucial for almost all cellular processes, including metabolic cycles, DNA transcription and replication, and signaling cascades. Unfortunately, experimental methods for identifying PPIs are both time-consuming and expensive, so it is important to develop computational approaches for predicting them. In this paper, a sequence-based method is developed for identifying new PPIs by means of an Extreme Learning Machine (ELM) combined with a novel protein representation based on auto covariance (AC). The AC descriptors account for interactions between residues a certain distance apart in the protein sequence. The method therefore takes the neighboring effect into account and enables us to extract more PPI information from the protein sequences. ELM is an accurate, fast-learning classification method: the input-to-hidden-unit weights are generated at random, and the hidden-to-output weights are then obtained by solving a system of linear equations. On the PPI data of Saccharomyces cerevisiae, the proposed method achieved 90.42% prediction accuracy with 90.12% sensitivity at a precision of 90.67%. Extensive experiments are performed to compare our method with the state-of-the-art Support Vector Machine (SVM) technique. The results show that the proposed approach is very promising for predicting PPIs and would make a helpful supplement to experimental approaches.
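The AC encoding and the two-step ELM training described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The AC formula used is the standard mean-centred lag covariance. The property scales passed in are assumed to be pre-normalized, because the abstract does not give the property set or the lag range. The pair feature is assumed to be the concatenation of both proteins' AC vectors. The helper names (`auto_covariance`, `pair_features`, `elm_train`, `elm_predict`) are hypothetical. For numerical stability, the hidden-to-output weights come from the ridge-regularized system (I/C + HᵀH)β = Hᵀt, a common ELM variant, rather than from the plain Moore-Penrose pseudo-inverse.

```python
import math
import random


def auto_covariance(seq, scales, max_lag):
    """AC descriptor of one sequence (assumed standard definition).

    scales maps each residue letter to a list of property values, assumed
    already normalized. For every lag 1..max_lag and property j, this returns
    1/(n-lag) * sum_i (x[i][j] - mean_j) * (x[i+lag][j] - mean_j), lag-major.
    """
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    rows = [scales[aa] for aa in seq]
    n, n_props = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_props)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_props):
            total = sum((rows[i][j] - means[j]) * (rows[i + lag][j] - means[j])
                        for i in range(n - lag))
            features.append(total / (n - lag))
    return features


def pair_features(seq_a, seq_b, scales, max_lag):
    # Hypothetical pair encoding: concatenate the two proteins' AC vectors.
    return auto_covariance(seq_a, scales, max_lag) + auto_covariance(seq_b, scales, max_lag)


def _sigmoid(z):
    # Numerically stable logistic activation.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _hidden(X, W, b):
    # H[i][h] = g(w_h . x_i + b_h) for every sample i and hidden unit h.
    return [[_sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bh)
             for w, bh in zip(W, b)] for x in X]


def _solve(A, y):
    # Gaussian elimination with partial pivoting for a square system A x = y.
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def elm_train(X, t, n_hidden, C=1e3, seed=0):
    """Train a single-hidden-layer ELM for labels t in {+1, -1}."""
    rng = random.Random(seed)
    d = len(X[0])
    # Step 1: input-to-hidden weights and biases are random and never trained.
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = _hidden(X, W, b)
    # Step 2: hidden-to-output weights from (I/C + H^T H) beta = H^T t.
    A = [[sum(row[p] * row[q] for row in H) + (1.0 / C if p == q else 0.0)
          for q in range(n_hidden)] for p in range(n_hidden)]
    rhs = [sum(row[p] * tk for row, tk in zip(H, t)) for p in range(n_hidden)]
    return W, b, _solve(A, rhs)


def elm_predict(model, X):
    # Sign of the linear output layer gives the predicted class.
    W, b, beta = model
    return [1 if sum(h * bt for h, bt in zip(row, beta)) >= 0 else -1
            for row in _hidden(X, W, b)]
```

In use, each training example would be `pair_features(seq_a, seq_b, scales, max_lag)` for a protein pair labelled +1 (interacting) or -1 (non-interacting). The number of hidden units and the regularization constant `C` are tuning parameters that the abstract does not specify. Only the output weights are fitted, so training reduces to one linear solve. This is the source of ELM's speed relative to iteratively trained classifiers.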
