Prediction of protein stability changes for single‐site mutations using support vector machines

Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict these stability changes from both sequence and structural information, and we evaluate the approach by cross-validation on a large dataset of single amino acid mutations. When only the sign of the stability change is considered, the method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because the method can accurately predict protein stability changes from primary sequence information only, it is applicable to the many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods that require tertiary structure information. The web server for predicting protein stability changes upon mutation (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html. Proteins 2006. © 2005 Wiley‐Liss, Inc.
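
The abstract does not spell out how a mutation is turned into SVM input, so the following is only a minimal sketch of one plausible sequence-only encoding, together with the sign-only accuracy measure mentioned above. The function names `encode_mutation` and `sign_accuracy`, the window size, the one-hot layout, and the example sequence are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sequence-only encoding of a single-site mutation.
# Assumption: the 20 standard amino acids in a fixed alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_mutation(sequence, position, mutant, window=3):
    """Return a fixed-length feature vector for one mutation.

    sequence: wild-type protein sequence (one-letter codes)
    position: 0-based index of the mutated residue
    mutant:   one-letter code of the substituted residue
    window:   number of neighbouring residues used on each side (assumed)
    """
    wild_type = sequence[position]
    if mutant == wild_type:
        raise ValueError("mutant residue must differ from the wild type")

    # 20 entries describing the substitution: -1 at the wild-type
    # residue, +1 at the mutant residue, 0 everywhere else.
    substitution = [0.0] * len(AMINO_ACIDS)
    substitution[AMINO_ACIDS.index(wild_type)] = -1.0
    substitution[AMINO_ACIDS.index(mutant)] = 1.0

    # One-hot block for each neighbour inside the window; neighbours
    # that fall off either end of the sequence stay all-zero.
    context = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        one_hot = [0.0] * len(AMINO_ACIDS)
        i = position + offset
        if 0 <= i < len(sequence):
            one_hot[AMINO_ACIDS.index(sequence[i])] = 1.0
        context.extend(one_hot)

    return substitution + context


def sign_accuracy(predicted, measured):
    """Fraction of mutations whose predicted sign matches the measured sign.

    A value of exactly zero is treated as non-positive.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum((p > 0) == (m > 0) for p, m in zip(predicted, measured))
    return hits / len(measured)


if __name__ == "__main__":
    features = encode_mutation("MKTAYIAKQR", position=4, mutant="F")
    print(len(features))
    print(sign_accuracy([0.5, -1.2, 0.3, -0.1], [1.0, -0.4, -0.2, -2.0]))
```

Vectors of this kind could be passed to any SVM implementation, with a classifier trained on the sign of the stability change; when the tertiary structure is known, structure-derived features could be appended in the same way.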
