An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation

Self-interacting proteins (SIPs), in which two or more copies of the same protein interact with each other, play significant roles in cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. There is therefore an urgent need for computational methods that predict SIPs. In this study, we propose a deep forest based predictor for accurate prediction of SIPs using protein sequence information. More specifically, we introduce a novel feature representation method that integrates the position-specific scoring matrix (PSSM) with the wavelet transform. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieves high accuracies of 95.43% and 93.65% on the human and yeast datasets, respectively. The corresponding area under the ROC curve (AUC) is 0.9586 on the human dataset and 0.9203 on the yeast dataset. To further show the advantage of the proposed method, it is compared with several existing methods. The results demonstrate that the proposed model outperforms the other SIP prediction methods considered. This work offers biologists an effective architecture for detecting new SIPs.
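
The idea of combining a PSSM with a wavelet transform can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the number of decomposition levels, or how coefficients are summarized, so the Haar wavelet, the three levels, and the mean and standard deviation summaries below are all assumptions made for illustration, and the function names are hypothetical. The sketch treats each of the 20 PSSM columns as a signal along the sequence and turns a variable-length protein into a fixed-length vector.

```python
import math
import random


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2:
        # Pad odd-length input by repeating the last value (an assumption).
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def summarize(coeffs):
    """Mean and population standard deviation of a coefficient list."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    var = sum((c - mean) ** 2 for c in coeffs) / n
    return [mean, math.sqrt(var)]


def pssm_wavelet_features(pssm, levels=3):
    """Map an L x 20 PSSM (list of rows) to a fixed-length feature vector.

    Each column is decomposed `levels` times; the detail coefficients of
    every level and the final approximation are each summarized by two
    numbers, giving 20 * (levels + 1) * 2 features regardless of L.
    """
    features = []
    for col in range(len(pssm[0])):
        signal = [row[col] for row in pssm]
        for _ in range(levels):
            signal, detail = haar_dwt(signal)
            features.extend(summarize(detail))
        features.extend(summarize(signal))
    return features


if __name__ == "__main__":
    # Placeholder PSSM with random integer scores; a real one would come
    # from a profile search such as PSI-BLAST.
    random.seed(0)
    toy_pssm = [[random.randint(-5, 5) for _ in range(20)] for _ in range(57)]
    print(len(pssm_wavelet_features(toy_pssm)))
```

Because the output length does not depend on sequence length, vectors of this kind can be fed to any fixed-input classifier, such as the deep forest model named in the abstract; that classifier is not reproduced here.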
