In silico functional dissection of saturation mutagenesis: Interpreting the relationship between phenotypes and changes in protein stability, interactions and activity

Despite interest in associating polymorphisms with clinical or experimental phenotypes, functional interpretation of mutation data has lagged behind the generation of data from modern high-throughput techniques, and accurate prediction of the molecular impact of a mutation remains a non-trivial task. We present here an integrated, knowledge-driven computational workflow designed to evaluate the effects of experimental and disease missense mutations on protein structure and interactions. We exemplify its application with analyses of saturation mutagenesis of DBR1 and Gal4 and show that the experimental phenotypes for over 80% of the mutations correlate well with predicted effects of mutations on protein stability and RNA binding affinity. We also show that analysis of mutations in VHL using our workflow provides valuable insights into the effects of mutations and their links to the risk of developing renal carcinoma. Taken together, the analyses of the three examples demonstrate that structural bioinformatics tools, when applied in a systematic, integrated way, can rapidly analyse a given system, providing a powerful approach for predicting the structural and functional effects of thousands of mutations and revealing the molecular mechanisms that lead to a phenotype.

Missense or non-synonymous mutations are nucleotide substitutions that alter the amino acid sequence of a protein. Their effects can range from modifying transcription, translation, processing and splicing, and localization to changing the stability of the protein or altering its dynamics or its interactions with other proteins, nucleic acids and ligands, including small molecules and metal ions. The advent of high-throughput techniques, including sequencing and saturation mutagenesis, has provided large amounts of phenotypic data linked to mutations. However, one of the hurdles has been understanding and quantifying the effects of a particular mutation and how they translate into a given phenotype. One approach to overcome this is to use robust, accurate and scalable computational methods to understand and correlate structural effects of mutations with disease.
