Assessing how multiple mutations affect protein stability using rigid cluster size distributions

Predicting how amino acid substitutions affect the stability of a protein has relevance to drug design and may help elucidate the mechanisms of disease-causing protein variants. Unfortunately, wet-lab experiments are time-intensive, and to the best of our knowledge there are no efficient computational techniques to assess the effect of multiple mutations. In this work we present a new approach for inferring the effects of single and multiple mutations on a protein's structure. Our rMutant algorithm generates in silico mutants with single or multiple amino acid substitutions. We use a graph-theoretic rigidity analysis approach to compute the distributions of rigid cluster sizes of the wild type and mutant structures, which we then compare to infer the effect of the amino acid substitutions. We successfully predict the effects of multiple mutations for which our previous methods were unsuccessful. We validate the predictions of our computational approach against experimental ΔΔG data. To demonstrate the utility of using rigid cluster size distributions to infer the effects of mutations, we also present a Random Forest machine learning approach that relies on rigidity data to predict which residues are critical to the stability of a protein. We predict the destabilizing effects of single or multiple mutations with over 86% accuracy.
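
As a rough illustration of comparing rigid cluster size distributions between a wild type and a mutant, the following Python sketch builds a histogram of cluster sizes for each structure and scores how much they differ. The function names, the size-weighted difference metric, and the example cluster sizes are all assumptions made for illustration; they are not taken from the paper and are not the rMutant implementation.

```python
from collections import Counter


def cluster_size_distribution(cluster_sizes):
    """Map each rigid cluster size to the number of clusters of that size."""
    return Counter(cluster_sizes)


def distribution_difference(wild_type, mutant):
    """Size-weighted sum of absolute count differences.

    This metric is an illustrative assumption, not the paper's scoring.
    """
    sizes = set(wild_type) | set(mutant)
    # Counter returns 0 for sizes missing from one distribution.
    return sum(size * abs(wild_type[size] - mutant[size]) for size in sizes)


# Hypothetical cluster sizes (atoms per rigid cluster); not real data.
wt_sizes = [120, 15, 15, 8, 3, 3, 3]
mutant_sizes = [90, 30, 15, 8, 3, 3, 3, 3]

wt_dist = cluster_size_distribution(wt_sizes)
mut_dist = cluster_size_distribution(mutant_sizes)
score = distribution_difference(wt_dist, mut_dist)
print(score)
```

Under this assumed metric, a larger score means the mutation changed the rigid cluster decomposition more, for example when a large cluster breaks into smaller ones. Such a score could then be compared with experimental ΔΔG values or used as a feature for a classifier.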
