A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants

Background: The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants.

Results: We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins, with application to predicting the relative thermostability of protein mutants. The scoring function was developed from an analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It is a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. The scoring function discriminates hyperthermophilic from mesophilic sequences with high accuracy: 98.9% for orthologous pairs in the training dataset and 97.3% in the holdout testing dataset. Moreover, it can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can predict the relative thermostability of proteins and their mutants with accuracies of 92.9% and 94.4%. We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins.

Conclusions: We have presented a novel scoring function that can distinguish not only hyperthermophilic/mesophilic (HP/MP) ortholog pairs but also non-homologous pairs with high accuracy. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect substitution biases induced by thermal adaptation. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.
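The idea of a linear scoring function whose weights are fitted by hill climbing on HP/MP pairs can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the two-feature toy vectors, the function names (`score`, `pair_accuracy`, `hill_climb`), the step size, and the steepest-ascent acceptance rule are hypothetical and are not taken from the paper, which uses ten features and does not describe its hill-climbing details in this abstract.

```python
# Illustrative sketch only: toy data and a simple hill-climbing variant,
# not the authors' implementation.

def score(features, weights):
    """Linear scoring function: weighted sum of feature values."""
    return sum(w * x for w, x in zip(weights, features))


def pair_accuracy(pairs, weights):
    """Fraction of (HP, MP) pairs in which the HP member scores strictly higher."""
    correct = sum(1 for hp, mp in pairs if score(hp, weights) > score(mp, weights))
    return correct / len(pairs)


def hill_climb(pairs, n_features, step=0.1, max_iters=1000):
    """Greedy hill climbing over weights, starting from all zeros.

    Each sweep tries +step and -step on every weight and keeps any change
    that strictly improves pairwise accuracy; it stops when a full sweep
    yields no improvement.
    """
    weights = [0.0] * n_features
    best = pair_accuracy(pairs, weights)
    for _ in range(max_iters):
        improved = False
        for i in range(n_features):
            for delta in (step, -step):
                candidate = list(weights)
                candidate[i] += delta
                acc = pair_accuracy(pairs, candidate)
                if acc > best:
                    weights, best, improved = candidate, acc, True
        if not improved:
            break
    return weights, best


# Hypothetical toy pairs: (HP feature vector, MP feature vector).
# Feature 0 is higher in the first HP protein; feature 1 is lower in the second.
toy_pairs = [
    ([2.0, 1.0], [1.0, 1.0]),
    ([1.0, 0.0], [1.0, 2.0]),
]

weights, accuracy = hill_climb(toy_pairs, n_features=2)
print(weights, accuracy)
```

Once weights are fitted, the same `score` function can rank a wild-type protein against a mutant: under this sketch, whichever feature vector scores higher is predicted to be the more thermostable of the two.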
