A base measure of precision for protein stability predictors: structural sensitivity

Background: Prediction of the change in fold stability (ΔΔG) of a protein upon mutation is of major importance to protein engineering and to the screening of disease-causing variants. Many prediction methods can use 3D structural information to predict ΔΔG. While the performance of these methods has been extensively studied, a new problem has arisen due to the abundance of crystal structures: How precise are these methods with respect to the input structure, which structure should be used, and how much does the choice matter? Thus, there is a need to quantify the structural sensitivity of protein stability prediction methods.

Results: We computed the structural sensitivity of six widely used prediction methods using saturated computational mutagenesis on a diverse set of 87 structures of 25 proteins. Our results show that structural sensitivity varies widely and, surprisingly, falls into two distinct groups: methods that take detailed account of the local environment show a sensitivity of ~0.6 to 0.8 kcal/mol, whereas machine-learning methods display much lower sensitivity (~0.1 kcal/mol). We also observe that the precision correlates with the accuracy for mutation-type-balanced data sets but not with the generally reported accuracy of the methods, indicating the importance of mutation-type balance in both contexts.

Conclusions: The structural sensitivity of stability prediction methods varies greatly and is caused mainly by the models and less by the actual protein structural differences. As a new recommended standard, we therefore suggest that ΔΔG values be evaluated on three protein structures when available and that the associated standard deviation be reported, to emphasize not just the accuracy but also the precision of the method in a specific study.
Our observation that machine-learning methods deemphasize structure may indicate that folded wild-type structures alone, without the folded mutant and unfolded structures, only add modest value for assessing protein stability effects, and that side-chain-sensitive methods overstate the significance of the folded wild-type structure.
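
The reporting recommendation above (ΔΔG evaluated on several structures, with the standard deviation reported) can be illustrated with a minimal Python sketch. The abstract does not state the exact formula for structural sensitivity, so the sketch assumes one plausible reading: the sample standard deviation of ΔΔG per mutation across structures of the same protein, averaged over mutations. The function names, structure labels, mutation labels, and ΔΔG values are hypothetical and serve only as illustration; they are not taken from the study.

```python
from statistics import mean, stdev


def structural_spread(ddg_by_structure):
    """Return {mutation: (mean ddG, sample SD)} across structures.

    ddg_by_structure maps a structure label to {mutation: ddG in kcal/mol}.
    Only mutations predicted on every structure are compared.
    At least two structures are required for a sample SD.
    """
    shared = set.intersection(*(set(d) for d in ddg_by_structure.values()))
    spread = {}
    for mutation in sorted(shared):
        values = [d[mutation] for d in ddg_by_structure.values()]
        spread[mutation] = (mean(values), stdev(values))
    return spread


def structural_sensitivity(ddg_by_structure):
    """Assumed definition: per-mutation SD averaged over all shared mutations."""
    spread = structural_spread(ddg_by_structure)
    return mean(sd for _, sd in spread.values())


# Hypothetical predictions (kcal/mol) for one protein on three structures.
example = {
    "structure_A": {"A10G": 1.0, "L25A": 2.0},
    "structure_B": {"A10G": 1.5, "L25A": 2.0},
    "structure_C": {"A10G": 2.0, "L25A": 2.0},
}

for mutation, (avg, sd) in structural_spread(example).items():
    print(f"{mutation}: ddG = {avg:.2f} +/- {sd:.2f} kcal/mol")
print(f"structural sensitivity: {structural_sensitivity(example):.2f} kcal/mol")
```

Reporting the per-mutation spread alongside the mean keeps the precision of a prediction visible next to its value, which is the point of the suggested standard.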
