Estimating the Distribution of Selection Coefficients from Phylogenetic Data Using Sitewise Mutation-Selection Models

Estimating the distribution of selection coefficients of mutations is a long-standing problem in molecular evolution. In addition to population-based methods, the distribution can be estimated from DNA sequence data with phylogenetic models. Previous models have generally found unimodal distributions, with the probability mass concentrated between mildly deleterious and nearly neutral mutations. Here we use a sitewise mutation–selection phylogenetic model to estimate the distribution of selection coefficients among novel and fixed mutations (substitutions) in a data set of 244 mammalian mitochondrial genomes and a set of 401 PB2 proteins from influenza. We find a bimodal distribution of selection coefficients for novel mutations both in the mitochondrial data set and in the influenza protein evolving in its natural reservoir, birds. Most mutations are strongly deleterious, and the remaining probability mass is concentrated between mildly deleterious and neutral mutations. For both data sets at adaptive equilibrium, the distribution of the coefficients among substitutions is unimodal and symmetrical around nearly neutral substitutions. About 0.5% of the nonsynonymous mutations and 14% of the nonsynonymous substitutions in the mitochondrial proteins are advantageous, compared with 0.5% and 24%, respectively, for the influenza protein. Following a host shift of influenza from birds to humans, however, we find a trimodal distribution among novel mutations in PB2, with a small mode of advantageous mutations.
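A toy calculation can show why the distribution among novel mutations differs from the distribution among substitutions. The sketch below is a minimal illustration under stated assumptions, not the fitted model: a single site with hypothetical scaled fitnesses F_i, equal mutation rates between all residues (the genetic code is ignored), equilibrium frequencies pi_i proportional to exp(F_i), and the standard relative fixation rate S/(1 - exp(-S)) for a change i -> j with scaled selection coefficient S = F_j - F_i. Novel mutations are weighted by pi_i alone; substitutions are further weighted by the fixation factor. The function names (`fixation_factor`, `site_distributions`, `fraction_advantageous`) and the fitness values are illustrative only.

```python
import math


def fixation_factor(s):
    """Substitution rate relative to neutral, S / (1 - exp(-S)), for a scaled
    selection coefficient S (assumed to already include the 2*Ne factor).
    Written in a numerically stable form for large |S|; equals 1 at S = 0."""
    if abs(s) < 1e-8:
        return 1.0
    if s > 0:
        return s / -math.expm1(-s)
    # Equivalent form s*e^s / (e^s - 1), which avoids overflow for large -s
    return s * math.exp(s) / math.expm1(s)


def site_distributions(fitness):
    """For one site with scaled fitnesses F_i, return two lists of (S, weight)
    pairs: the flux of novel mutations and the flux of substitutions at
    equilibrium. Assumes (hypothetically) equal mutation rates mu = 1
    between all residues."""
    # Under symmetric mutation, pi_i is proportional to exp(F_i);
    # subtracting the maximum keeps exp() from overflowing.
    fmax = max(fitness)
    weights = [math.exp(f - fmax) for f in fitness]
    total = sum(weights)
    pi = [w / total for w in weights]

    mutations, substitutions = [], []
    for i, fi in enumerate(fitness):
        for j, fj in enumerate(fitness):
            if i == j:
                continue
            s = fj - fi  # scaled selection coefficient of the change i -> j
            mutations.append((s, pi[i]))  # arises at rate pi_i * mu
            substitutions.append((s, pi[i] * fixation_factor(s)))  # and fixes
    return mutations, substitutions


def fraction_advantageous(pairs, threshold=0.0):
    """Weighted fraction of changes with S > threshold."""
    total = sum(w for _, w in pairs)
    return sum(w for s, w in pairs if s > threshold) / total


if __name__ == "__main__":
    # Hypothetical fitness values for a four-residue site (not estimates).
    fitness = [0.0, -1.0, -3.0, -6.0]
    muts, subs = site_distributions(fitness)
    print(f"advantageous among mutations:     {fraction_advantageous(muts):.3f}")
    print(f"advantageous among substitutions: {fraction_advantageous(subs):.3f}")
```

With these values the script prints 0.111 and 0.500. Only about 11% of novel mutations are advantageous because the site spends most of its time at its fittest residue, so most new mutations move downhill. The substitution flux, by contrast, is symmetric in S: since h(-S) = exp(-S) h(S), detailed balance gives pi_i h(S) = pi_j h(-S), so every i -> j substitution is matched by a j -> i substitution of opposite sign. The cutoff used to call a change advantageous is a modelling choice (the `threshold` argument here), so these toy numbers are not comparable to the percentages reported above.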
