Quantification of the effect of mutations using a global probability model of natural sequence variation

Modern biomedicine faces the challenge of predicting the effects of genetic variation. Systematic functional assays of protein point mutants have provided valuable empirical information, but vast regions of sequence space remain unexplored. Fortunately, the mutation-selection process of natural evolution has recorded rich information in the diversity of natural protein sequences. Here, building on probabilistic models of correlated amino-acid substitutions that have been successfully applied to determine the three-dimensional structures of proteins, we present a statistical approach for quantifying the contribution of residues and their interactions to protein function, using a statistical energy, the evolutionary Hamiltonian. We find that these probability models predict the experimental effects of mutations with reasonable accuracy for a number of proteins, especially where the selective pressure in the experiment resembles the evolutionary pressure on the protein, as in assays under antibiotic selection.
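
The statistical energy named above can be made concrete with a small sketch. Assume the evolutionary Hamiltonian has the pairwise Potts form used in direct-coupling analysis, E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j), with site-specific fields h and pairwise couplings J. Under that assumption the predicted effect of a point mutation is the energy difference dE = E(mutant) - E(wild type). If P(s) is taken to be proportional to exp(E(s)), which is also an assumed convention here, then a negative dE marks a mutant that the model considers less probable than the wild type. The function names, the 21-letter alphabet and the random parameters below are hypothetical and for illustration only. This is not the authors' code. In practice h and J would be inferred from a multiple sequence alignment, for example by pseudolikelihood maximization.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap symbol (q = 21).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
IDX = {a: k for k, a in enumerate(ALPHABET)}


def encode(seq):
    """Map a sequence string to integer states 0..q-1."""
    return np.array([IDX[a] for a in seq])


def statistical_energy(seq, h, J):
    """Pairwise Potts energy E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j).

    h has shape (L, q); J has shape (L, L, q, q), and only the i < j blocks are read.
    """
    s = encode(seq)
    L = len(s)
    energy = h[np.arange(L), s].sum()  # single-site field terms
    for i in range(L):
        for j in range(i + 1, L):
            energy += J[i, j, s[i], s[j]]  # pairwise coupling terms
    return float(energy)


def mutation_effect(wt, pos, aa, h, J):
    """dE = E(mutant) - E(wild type) for one substitution at 0-based position pos."""
    mut = wt[:pos] + aa + wt[pos + 1:]
    return statistical_energy(mut, h, J) - statistical_energy(wt, h, J)


def mutation_effect_local(wt, pos, aa, h, J):
    """Same dE, summing only the terms that involve position pos."""
    s = encode(wt)
    a, b = s[pos], IDX[aa]
    delta = h[pos, b] - h[pos, a]
    for j in range(len(s)):
        if j < pos:
            delta += J[j, pos, s[j], b] - J[j, pos, s[j], a]
        elif j > pos:
            delta += J[pos, j, b, s[j]] - J[pos, j, a, s[j]]
    return float(delta)


# Usage with random, purely illustrative parameters (not fitted to any alignment).
rng = np.random.default_rng(0)
wt = "MKTAYIAK"
L, q = len(wt), len(ALPHABET)
h = rng.normal(size=(L, q))
J = rng.normal(scale=0.1, size=(L, L, q, q))
print(mutation_effect(wt, 3, "P", h, J))
```

The local version shows why scoring single mutants is cheap. Only the field at the mutated site and the couplings that touch that site change, so each dE costs O(L) operations rather than the O(L^2) needed to evaluate the full energy.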
