An Ensemble Classifier to Predict Protein–Protein Interactions by Combining PSSM-based Evolutionary Information with Local Binary Pattern Model

Proteins play a critical role in regulating cellular functions. Because proteins usually perform their functions by interacting with other proteins, determining whether two proteins interact has become a fundamental problem. Although high-throughput biotechnology has produced a large amount of protein–protein interaction (PPI) data, biological experiments are time-consuming and costly. Computational methods for predicting protein interactions have therefore become a research hot spot. In this work, we propose an efficient computational method that combines the Rotation Forest (RF) classifier with the Local Binary Pattern (LBP) feature extraction method to predict PPIs from the evolutionary information encoded in the Position-Specific Scoring Matrix (PSSM). The proposed method performed well on the Yeast, Human, and H. pylori datasets, with average accuracies of 92.12%, 96.21%, and 86.59%, respectively. We also evaluated the method on four independent datasets: C. elegans, H. pylori, H. sapiens, and M. musculus. These experimental results demonstrate that our model is feasible and robust for predicting PPIs.
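To make the feature extraction step more concrete, the following is a minimal sketch of how a basic 3×3 LBP histogram could be computed from a PSSM treated as a two-dimensional matrix. It is an illustration under stated assumptions, not the exact configuration used in this work: the function name `lbp_histogram`, the 8-neighbour operator, the 256-bin normalized histogram, and the concatenation of the two proteins' histograms into one pair vector are all choices made here for illustration. The PSSMs are random placeholders, and the Rotation Forest classifier is omitted.

```python
import numpy as np


def lbp_histogram(pssm: np.ndarray) -> np.ndarray:
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    Hypothetical helper: the abstract does not state the LBP variant,
    so the plain 8-neighbour operator is assumed here.
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[0] < 3 or pssm.shape[1] < 3:
        raise ValueError("expected a 2-D matrix of at least 3x3")

    rows, cols = pssm.shape
    center = pssm[1:-1, 1:-1]  # every cell that has all 8 neighbours

    # Eight neighbour offsets, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]

    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        # Shifted view of the matrix aligned with the interior cells.
        neighbour = pssm[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        # Set this bit where the neighbour is >= the center value.
        codes |= (neighbour >= center).astype(np.int64) << bit

    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Placeholder PSSMs (sequence length x 20 amino acids); real ones
# would come from a tool such as PSI-BLAST.
rng = np.random.default_rng(0)
pssm_a = rng.integers(-10, 11, size=(50, 20))
pssm_b = rng.integers(-10, 11, size=(80, 20))

# Assumed pair representation: concatenate both proteins' histograms.
pair_features = np.concatenate([lbp_histogram(pssm_a), lbp_histogram(pssm_b)])
```

Because the histogram has a fixed length regardless of sequence length, proteins of different sizes map to vectors of the same dimension, which is what a downstream classifier requires.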
