Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC

Heat Shock Proteins (HSPs) are essential components for cell growth and viability and are found in all living organisms. HSPs regulate the folding and unfolding of proteins, maintain the quality of newly synthesized proteins, and protect cellular homeostatic processes from environmental stress. On the basis of function, HSPs are categorized into six major families: (i) HSP20 or sHSP, (ii) HSP40 or J-proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Identifying HSP families and sub-families through conventional approaches is expensive and laborious. It is therefore highly desirable to establish an automatic, robust, and accurate computational method that predicts HSPs quickly and reliably. In this regard, a computational model is developed for predicting HSP families. In this model, protein sequences are formulated using three discrete methods: Split Amino Acid Composition, Pseudo Amino Acid Composition, and Dipeptide Composition. Several learning algorithms are evaluated in order to choose the best one for a high-throughput computational model. The leave-one-out test is applied to assess the performance of the proposed model. The empirical results show that the support vector machine achieved promising results using the Dipeptide Composition feature space. The proposed model achieved 90.7% accuracy on the HSP dataset and 97.04% accuracy on the J-protein types dataset, which is higher than the accuracy of existing methods reported in the literature.
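To make the Dipeptide Composition feature space and the leave-one-out test more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the 20 standard amino acids, skips any adjacent pair containing a non-standard residue (for example X or B), and normalizes counts by the number of valid pairs, so each 400-dimensional vector sums to 1. The names `dipeptide_composition`, `leave_one_out_accuracy` and `nearest_neighbor_predict` are hypothetical. The 1-nearest-neighbour classifier is a deliberately simple stand-in for the support vector machine and is used only to keep the example dependency-free.

```python
from itertools import product
from typing import Callable, Dict, List

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
_INDEX: Dict[str, int] = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400-D dipeptide composition of a protein sequence.

    Each entry is the count of one ordered dipeptide divided by the number
    of valid adjacent residue pairs. Pairs containing non-standard residues
    are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for a, b in zip(seq, seq[1:]):  # every adjacent residue pair
        idx = _INDEX.get(a + b)
        if idx is None:  # pair contains a non-standard residue
            continue
        counts[idx] += 1
        total += 1
    if total == 0:  # too short, or no valid pairs
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def nearest_neighbor_predict(train_X: List[List[float]],
                             train_y: List[int],
                             x: List[float]) -> int:
    """Hypothetical stand-in classifier: 1-nearest neighbour (squared Euclidean)."""
    best_label, best_dist = train_y[0], float("inf")
    for xi, yi in zip(train_X, train_y):
        d = sum((a - b) ** 2 for a, b in zip(xi, x))
        if d < best_dist:
            best_label, best_dist = yi, d
    return best_label


def leave_one_out_accuracy(
        X: List[List[float]],
        y: List[int],
        fit_predict: Callable[[List[List[float]], List[int], List[float]], int],
) -> float:
    """Hold out each sample once, train on the rest, and report overall accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

For example, `dipeptide_composition("MKVLAA")` counts the five pairs MK, KV, VL, LA and AA, so each of those entries is 0.2. In practice the stand-in classifier would be replaced by an SVM, such as scikit-learn's `SVC` (which is built on LIBSVM). scikit-learn also provides `LeaveOneOut` for the same held-out-one evaluation scheme.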
