A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences

Protein-protein interactions (PPIs) play an important role in the life activities of organisms. With large amounts of protein sequence data now available, PPI prediction methods have attracted increasing attention. A variety of protein sequence coding methods have emerged, but training with these methods is particularly time-consuming. To address this issue, we propose a novel matrix sequence coding method. Based on a deep neural network (DNN) and this novel matrix protein sequence descriptor, we constructed a model for predicting PPIs. On human PPI data, the method achieved an accuracy of 94.34%, a recall of 98.28%, an area under the curve (AUC) of 97.79% and a loss of 23.25%. When the model was evaluated on a non-redundant dataset, the prediction accuracy was 88.29%. These results indicate that the matrix of sequence (MOS) descriptor can improve PPI prediction performance and reduce training time, making it a useful complement for future proteomics research. The experimental code and experimental results can be found at https://github.com/smalltalkman/hppi-tensorflow.
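
For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, recall and AUC are conventionally computed for a binary interaction classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration.

```python
# Illustrative only: toy labels and scores, not data from the paper.

def accuracy(y_true, y_pred):
    """Fraction of protein pairs whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred):
    """Fraction of truly interacting pairs (label 1) that were predicted as interacting."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive pair scores higher
    than a random negative pair (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = interacting pair, 0 = non-interacting pair.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]        # assumed model output probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"recall   = {recall(y_true, y_pred):.4f}")
print(f"AUC      = {roc_auc(y_true, scores):.4f}")
```

Accuracy and recall depend on the chosen threshold, while AUC is computed from the raw scores and so summarizes ranking quality across all thresholds.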
