A two-step discriminated method to identify thermophilic proteins

Improving the thermostability of an enzyme can accelerate the chemical reaction it catalyzes. Thus, the analysis and prediction of thermophilic proteins are useful for protein engineering and enzyme design. In this study, we propose a novel method based on two-step discrimination to distinguish thermophilic from non-thermophilic proteins. The model was rigorously benchmarked on an objective dataset of 915 thermophilic and 793 non-thermophilic proteins. The overall accuracy of the method was 94.44% in 5-fold cross-validation, which is higher than that of other published methods. We believe that the two-step discrimination strategy is a promising approach for related problems in protein bioinformatics.
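As an illustration of how an overall accuracy under 5-fold cross-validation can be computed for this kind of binary task, a minimal sketch follows. The abstract does not specify the features or the classifier, so everything here is an assumption: amino acid composition is used as the feature vector, a nearest-centroid rule stands in for the two-step discrimination (it is not the authors' method), and the toy sequences are invented solely to make the example runnable.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS) or 1
    return [seq.count(a) / total for a in AMINO_ACIDS]

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_validated_accuracy(seqs, labels, k=5, seed=0):
    """Overall accuracy = correctly predicted held-out samples / all samples."""
    feats = [composition(s) for s in seqs]
    correct = 0
    for test in k_fold_indices(len(seqs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(seqs)) if i not in held_out]
        # Stand-in classifier (assumption): one centroid per class,
        # fitted on the training folds only.
        classes = set(labels[i] for i in train)
        cents = {c: centroid([feats[i] for i in train if labels[i] == c])
                 for c in classes}
        for i in test:
            pred = min(cents, key=lambda c: sq_dist(feats[i], cents[c]))
            correct += pred == labels[i]
    return correct / len(seqs)

# Hypothetical toy sequences, not taken from the benchmark dataset.
thermo = ["EKEKRRVEEK", "KKEEVRIEEK", "EEKRKEIVEK", "RKEEKEVKEI", "EKKEERVIKE"]
non_thermo = ["QSTNQASTQN", "SQTNAQSNTQ", "NQSTQANSTQ", "TQNSQSANQT", "QNSTAQTSNQ"]
seqs = thermo + non_thermo
labels = [1] * len(thermo) + [0] * len(non_thermo)

accuracy = cross_validated_accuracy(seqs, labels, k=5)
print(f"Overall 5-fold accuracy: {accuracy:.2%}")
```

Each sample is predicted exactly once, while it is held out, so the overall accuracy is the number of correct held-out predictions divided by the total number of samples.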
