Sequence-only evolutionary and predicted structural features for the prediction of stability changes in protein mutants

Background: Even a single amino acid substitution in a protein sequence may significantly change protein stability and structure, and therefore protein function as well. In the post-genomic era, computational methods that predict stability changes from the protein sequence alone are important. Although evolutionary relationships of protein mutations can be extracted from large protein databases holding millions of sequences, relevant evolutionary features for the prediction of stability changes have not been proposed. The use of predicted structural features in situations where a protein structure is not available has also not been explored.

Results: We proposed a number of evolutionary and predicted structural features for the prediction of stability changes and analysed which of them best capture the determinants of protein stability. We trained and evaluated our machine learning method on a non-redundant data set of experimentally measured stability changes. When only the direction of the stability change was predicted, the largest performance improvement was achieved by combining the evolutionary features mutation likelihood and SIFT score with the predicted structural feature secondary structure. The same two evolutionary features, combined with the predicted structural feature accessible surface area, achieved the lowest error when the prediction of actual values of stability changes was assessed. Compared to similar studies, our method achieved improvements in prediction performance.

Conclusion: Although the strongest feature for the prediction of stability changes appears to be the vector of amino acid identities in the sequential neighbourhood of the mutation, the most relevant combination of evolutionary and predicted structural features further improves prediction performance. Even the predicted structural features, which did not perform well on their own, turn out to be beneficial when appropriately combined with evolutionary features. We conclude that high prediction accuracy can be achieved knowing only the sequence of a protein when the right combination of both structural and evolutionary features is used.
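As an illustration of the feature described as the vector of amino acid identities in the sequential neighbourhood of the mutation, the following minimal sketch shows one plausible encoding. The abstract does not specify the window size or the encoding scheme, so the one-hot representation, the `half_window` parameter, the zero-padding at sequence ends, and the function names are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch: one-hot encoding of the residues around a mutation site.
# Window size, padding, and encoding are assumptions, not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """Return a 20-element 0/1 list; unknown or padding symbols give all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1
    return vec


def neighbourhood_features(sequence, position, mutant, half_window=3):
    """Encode the window around a 0-based position, followed by the mutant residue.

    Positions that fall outside the sequence are encoded as all-zero blocks.
    """
    if not 0 <= position < len(sequence):
        raise ValueError("position is outside the sequence")
    features = []
    for offset in range(-half_window, half_window + 1):
        i = position + offset
        residue = sequence[i] if 0 <= i < len(sequence) else "-"
        features.extend(one_hot(residue))
    features.extend(one_hot(mutant))  # identity of the substituted residue
    return features


if __name__ == "__main__":
    # Made-up sequence, used only to show the shape of the output.
    vec = neighbourhood_features("MKTAYIAKQR", position=4, mutant="W")
    print(len(vec))
```

A vector of this kind could then be concatenated with evolutionary and predicted structural features before being passed to a classifier or regressor; how the original study combined them is not stated in the abstract.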
