Discrimination of mesophilic and thermophilic proteins using machine learning algorithms

Discriminating thermophilic proteins from their mesophilic counterparts is a challenging task, and solving it would aid the design of stable proteins. In this work, we systematically analyzed the amino acid compositions of 3075 mesophilic and 1609 thermophilic proteins belonging to 9 and 15 families, respectively. We found that the charged residues Lys, Arg, and Glu, as well as the hydrophobic residues Val and Ile, occur more frequently in thermophiles than in mesophiles. Further, we analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, and so forth, for discriminating mesophilic and thermophilic proteins. We found that most of the machine learning techniques discriminate these classes of proteins with similar accuracy. The neural network‐based method discriminated thermophiles from mesophiles with a five‐fold cross‐validation accuracy of 89% on a dataset of 4684 proteins. Moreover, this method was tested on 325 mesophilic proteins from Xylella fastidiosa and 382 thermophilic proteins from Aquifex aeolicus, and it discriminated them with an accuracy of 91%. These accuracy levels are better than those of other methods in the literature, and we suggest that this method could be used effectively to discriminate mesophilic and thermophilic proteins. Proteins 2008. © 2007 Wiley‐Liss, Inc.
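
The composition analysis described in the abstract can be illustrated with a minimal sketch. It computes the percent occurrence of each of the 20 standard residues in a sequence and the difference in mean composition between two groups. The function names and the toy sequences are assumptions made for illustration only; they are not the authors' code or data, and the resulting 20-value composition vector is merely one plausible input to the classifiers mentioned above.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the percent occurrence of each standard residue in a sequence."""
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def mean_composition(sequences):
    """Average the per-sequence compositions over a group of sequences."""
    comps = [composition(seq) for seq in sequences]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}


def composition_difference(thermophilic, mesophilic):
    """Mean composition of the first group minus that of the second, per residue."""
    thermo = mean_composition(thermophilic)
    meso = mean_composition(mesophilic)
    return {aa: thermo[aa] - meso[aa] for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical toy sequences, invented for this example (not real proteins).
    toy_thermo = ["KKEEVVII", "KREVIKRE"]
    toy_meso = ["AAGGSSTT", "QNSTAGKE"]
    diff = composition_difference(toy_thermo, toy_meso)
    for aa in "KREVI":
        print(f"{aa}: {diff[aa]:+.2f} percentage points")
```

A positive difference for a residue means it is more frequent in the first group; with real data, this is the kind of comparison that would show the enrichment of Lys, Arg, Glu, Val, and Ile reported above.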
