An Experimentally Determined Evolutionary Model Dramatically Improves Phylogenetic Fit

All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here I demonstrate an alternative: experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. Emerging high-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic and genetic analyses.
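The core idea is that site-specific selection can be measured rather than fitted. It can be illustrated with a small sketch built on two assumptions, neither confirmed by the text above:

1. Per-site amino-acid preferences come from a simple enrichment ratio of read counts after versus before functional selection, normalized to sum to 1.
2. Those preferences define substitution rates through a Halpern-Bruno style fixation term, ln(π_to/π_from) / (1 − π_from/π_to).

The function names `site_preferences`, `fixation_factor`, and `substitution_rate` and the pseudocount of 1 are hypothetical choices made for illustration. None of the steps introduces a free parameter fitted to the phylogeny.

```python
import math

# The 20 canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_preferences(pre_counts, post_counts, pseudocount=1.0):
    """Estimate normalized amino-acid preferences at one site.

    Hypothetical estimator: each amino acid's enrichment is its frequency
    after functional selection divided by its frequency before selection.
    A pseudocount keeps unobserved amino acids finite and nonzero. The
    enrichments are then rescaled to sum to 1.
    """
    n = len(AMINO_ACIDS)
    pre_total = sum(pre_counts.get(a, 0) for a in AMINO_ACIDS) + pseudocount * n
    post_total = sum(post_counts.get(a, 0) for a in AMINO_ACIDS) + pseudocount * n
    enrichment = {}
    for a in AMINO_ACIDS:
        pre_freq = (pre_counts.get(a, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(a, 0) + pseudocount) / post_total
        enrichment[a] = post_freq / pre_freq
    total = sum(enrichment.values())
    return {a: e / total for a, e in enrichment.items()}


def fixation_factor(pi_from, pi_to):
    """Relative fixation probability for a change between two amino acids.

    Halpern-Bruno style term ln(pi_to/pi_from) / (1 - pi_from/pi_to).
    It exceeds 1 for changes toward a preferred amino acid, falls below 1
    for changes away from one, and tends to 1 as the preferences become
    equal. Both preferences must be positive.
    """
    if math.isclose(pi_from, pi_to):
        return 1.0
    return math.log(pi_to / pi_from) / (1.0 - pi_from / pi_to)


def substitution_rate(mutation_rate, pi_from, pi_to):
    """Substitution rate = mutation rate x fixation factor (no fitted selection terms)."""
    return mutation_rate * fixation_factor(pi_from, pi_to)
```

Suppose the mutation rate is symmetric between two states. Then π_x · rate(x→y) equals π_y · rate(y→x) for every pair, so the process is reversible. The experimentally measured preferences are its stationary distribution. This is what lets the measurements stand in for parameters that would otherwise be fit to the alignment.

For example, take uniform pre-selection counts and a site where leucine is strongly enriched after selection. Then `substitution_rate(1.0, prefs["A"], prefs["L"])` is greater than 1 and the reverse rate is less than 1.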
