Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins

Background: Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. The prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, the evaluation commonly includes different mutations of the same protein, and even of the same residue, as those encountered during training. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set.

Results: We provided experimental evidence of the limitations of the evaluation commonly used for assessing prediction performance. Furthermore, we demonstrated that predicting stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features that led to over-fitting and extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated on an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and correctly classified 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved a correlation of 0.51 between predicted and experimentally measured stability changes.

Conclusions: The commonly adopted evaluation, in which mutations in the same protein, and even the same residue, are randomly divided between the training and test sets, leads to an overestimation of prediction performance. Therefore, methods for predicting stability changes should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods.
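
The evaluation protocol argued for above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `(protein_id, mutation, label)` record layout, and the toy data are all hypothetical. The sketch only guarantees that no protein contributes mutations to both sets. Ensuring low sequence similarity between training and test proteins would additionally require a sequence-similarity clustering step, which is assumed to happen beforehand and is not shown. The Matthews correlation coefficient is computed from its standard confusion-matrix definition.

```python
import random
from math import sqrt


def split_by_protein(mutations, test_fraction=0.2, seed=0):
    """Split mutation records so that no protein appears in both sets.

    `mutations` is a list of (protein_id, mutation, label) tuples
    (a hypothetical layout chosen for this sketch).
    """
    proteins = sorted({record[0] for record in mutations})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [r for r in mutations if r[0] not in test_proteins]
    test = [r for r in mutations if r[0] in test_proteins]
    return train, test


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 or 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy records: label 1 = stabilising, 0 = destabilising (made-up data).
toy = [
    ("P1", "A10G", 0), ("P1", "A10V", 1), ("P1", "L25F", 0),
    ("P2", "K7R", 1), ("P2", "K7E", 0),
    ("P3", "D40N", 0), ("P4", "S3T", 1), ("P5", "G12A", 0),
]
train_set, test_set = split_by_protein(toy, test_fraction=0.4, seed=1)
train_ids = {r[0] for r in train_set}
test_ids = {r[0] for r in test_set}
```

In a purely random split of the same records, two mutations of residue A10 in protein P1 could land on opposite sides of the split, which is the situation the abstract identifies as inflating performance estimates.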
