Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis

MOTIVATION: Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have used properties of protein sequence or structure to predict the free energy change of mutants under thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturation, as well as the change in mutant thermal stability (ΔTm), by applying either computational energy-based approaches or machine learning techniques. However, the accuracy obtained when these methods are applied separately is frequently far from optimal.

RESULTS: We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation at every residue position. A feature vector is generated for the mutant by considering the perturbations at the mutated position and its six nearest neighbors, taken in order, in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of experimentally studied and described mutants derived from diverse proteins. Predictive models based on our combined approach match or, in many cases, significantly exceed previously published results.

AVAILABILITY: A web server with supporting documentation is available at http://proteins.gmu.edu/automute.
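The feature vector construction described under RESULTS can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each residue is represented by a single 3D point, that "nearest" means smallest Euclidean distance to the mutated residue, and that neighbors are ordered from closest to farthest. The function name `feature_vector` and its parameters are hypothetical, and the per-residue perturbation values are taken as given rather than derived from the four-body potential.

```python
import math


def feature_vector(coords, perturbation, mutated_index, k=6):
    """Build a (k + 1)-element feature vector for one mutant.

    coords        -- list of (x, y, z) points, one per residue (assumed representation)
    perturbation  -- per-residue perturbation scores for this mutant (assumed given)
    mutated_index -- index of the substituted residue
    k             -- number of nearest neighbors to include (six in the abstract)
    """
    if len(coords) != len(perturbation):
        raise ValueError("coords and perturbation must have the same length")
    if len(coords) <= k:
        raise ValueError("structure has too few residues for k neighbors")

    origin = coords[mutated_index]

    # Rank all other residues by Euclidean distance to the mutated residue
    # (assumption: closest-first ordering) and keep the k closest.
    neighbors = sorted(
        (i for i in range(len(coords)) if i != mutated_index),
        key=lambda i: math.dist(origin, coords[i]),
    )[:k]

    # Score at the mutated position first, then the ordered neighbors' scores.
    return [perturbation[mutated_index]] + [perturbation[i] for i in neighbors]


if __name__ == "__main__":
    # Toy data: eight residues on a line, with made-up perturbation scores.
    toy_coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
                  (4, 0, 0), (5, 0, 0), (6, 0, 0), (10, 0, 0)]
    toy_scores = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    print(feature_vector(toy_coords, toy_scores, mutated_index=0))
```

Vectors built this way, one per mutant, could then be paired with measured stability changes and passed to any supervised learner; the choice of learner is left open here.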
