Machine learning based identification of protein-protein interactions using derived features of physiochemical properties and evolutionary profiles

Proteins are the central constituents of a cell or biological system. Proteins execute their functions by interacting with other molecules such as RNA, DNA, and other proteins, and the major role of protein-protein interactions (PPIs) is to carry out biochemical activities in living species. Accurate identification of PPIs has therefore been a challenging and demanding task for investigators over the last few decades. Various traditional and computational methods have been applied, but their results have not been fully satisfactory. To extend computational modeling with intelligent techniques, contemporary machine learning algorithms have been used here to identify PPIs. In the proposed prediction model, protein sequences are represented using two distinct feature extraction methods: physicochemical properties of amino acids, and evolutionary profiles in the form of the position-specific scoring matrix (PSSM). The jackknife test and several performance measures, namely specificity, recall, accuracy, Matthews correlation coefficient (MCC), precision, and F-measure, were used to assess the predictive quality of the proposed model. Empirical analysis showed that the proposed model yielded encouraging predictive outcomes compared with existing state-of-the-art models. This performance is attributed to the PSSM features, which clearly discern the motifs of PPIs. The proposed prediction model is expected to be a practical and useful tool for the research community.
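
The abstract does not say how each sequence is turned into a fixed-length vector, so the following is only a minimal sketch under stated assumptions: amino-acid composition plus mean Kyte-Doolittle hydropathy stand in for the physicochemical descriptors, the L x 20 PSSM (for example as produced by PSI-BLAST) is reduced by averaging each column, and the two proteins' vectors are concatenated to represent the pair. The function names `physchem_features`, `pssm_features`, and `pair_vector` are hypothetical.

```python
"""Sketch: fixed-length sequence features for a protein pair.

Assumptions (not taken from the paper): composition plus Kyte-Doolittle
hydropathy stand in for "physicochemical properties", and the L x 20 PSSM
is reduced to a 20-dimensional vector by averaging each column.
"""

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Column order used in PSI-BLAST ASCII PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def physchem_features(seq):
    """Amino-acid composition (20 values) plus mean hydropathy (1 value)."""
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    n = len(residues)
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    return composition + [mean_hydropathy]


def pssm_features(pssm):
    """Average each of the 20 PSSM columns over all L residue rows."""
    if not pssm or any(len(row) != 20 for row in pssm):
        raise ValueError("expected an L x 20 PSSM")
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(20)]


def pair_vector(seq_a, pssm_a, seq_b, pssm_b):
    """Concatenate both proteins' features (41 each) into one 82-value vector."""
    feat_a = physchem_features(seq_a) + pssm_features(pssm_a)
    feat_b = physchem_features(seq_b) + pssm_features(pssm_b)
    return feat_a + feat_b
```

Concatenation makes the representation order-dependent (A, B differs from B, A); averaging or summing the two per-protein vectors is an alternative if symmetry matters.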

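The jackknife test is leave-one-out cross-validation: each sample is predicted by a model trained on all remaining samples, and the reported measures are computed from the pooled predictions. The sketch below is a hedged illustration only; because the abstract does not name the classifier, a 1-nearest-neighbour rule with Euclidean distance is used as a hypothetical stand-in, and label 1 is assumed to mean "interacting pair".

```python
"""Sketch: jackknife (leave-one-out) evaluation with the six reported metrics.

Assumption: a 1-nearest-neighbour rule is a stand-in classifier; the
abstract does not name the model actually used.
"""
import math


def nearest_neighbour_predict(train_x, train_y, x):
    """Hypothetical stand-in classifier: label of the closest training vector."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


def jackknife(xs, ys):
    """Predict each sample using only the other N - 1 samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(nearest_neighbour_predict(train_x, train_y, xs[i]))
    return preds


def safe_div(num, den):
    """Return 0.0 when the denominator is zero (a common convention)."""
    return num / den if den else 0.0


def metrics(y_true, y_pred):
    """Binary metrics with 1 = interacting pair, 0 = non-interacting pair."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    recall = safe_div(tp, tp + fn)            # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f_measure = safe_div(2 * precision * recall, precision + recall)
    mcc = safe_div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "precision": precision,
            "f_measure": f_measure, "mcc": mcc}
```

For example, `metrics(ys, jackknife(xs, ys))` evaluates a list of feature vectors `xs` with labels `ys`. Leave-one-out needs N model fits, so on large PPI datasets k-fold cross-validation is a common cheaper alternative.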