Fully Bayesian tests of neutrality using genealogical summary statistics

Background: Many summary statistics of sequence data have been developed to detect departures from the neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions about the mutational model and the population size model simultaneously. Consequently, rejection of the null hypothesis of neutrality under these methods could result from a violation of either or both assumptions, which makes interpretation difficult.

Results: Here we harness posterior predictive simulation to exploit summary statistics of both the data and the model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies, and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size.

Conclusion: Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area that has previously been neglected because of the limited availability of theory and methods.
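
The posterior predictive idea described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy constant-size coalescent in which the genealogy is reduced to its inter-coalescent waiting times, and it uses uniform random draws as a stand-in for posterior samples that would normally come from an MCMC analysis. The names `simulate_intervals`, `root_fraction` and `posterior_predictive_p` are hypothetical and introduced only for illustration.

```python
import random

def simulate_intervals(theta, n, rng):
    """Toy neutral model (an assumption): waiting times while k = n, ..., 2
    lineages remain, each exponential with rate k(k-1)/(2*theta)."""
    return [rng.expovariate(k * (k - 1) / (2.0 * theta)) for k in range(n, 1, -1)]

def root_fraction(intervals):
    """Genealogical summary statistic: share of tree height spent in the
    final (two-lineage) interval."""
    return intervals[-1] / sum(intervals)

def posterior_predictive_p(posterior_draws, observed, n, statistic, rng):
    """Fraction of replicate genealogies, one per posterior draw, whose
    statistic is at least as large as the observed statistic."""
    t_obs = statistic(observed)
    hits = 0
    for theta in posterior_draws:
        replicate = simulate_intervals(theta, n, rng)
        if statistic(replicate) >= t_obs:
            hits += 1
    return hits / len(posterior_draws)

rng = random.Random(1)
# Stand-in for posterior samples of theta; real draws would come from MCMC.
posterior_draws = [rng.uniform(0.5, 2.0) for _ in range(2000)]
# Hypothetical observed waiting times for n = 5 sampled lineages.
observed = [0.30, 0.25, 0.20, 0.05]
p = posterior_predictive_p(posterior_draws, observed, 5, root_fraction, rng)
print(p)
```

A value of `p` close to 0 or 1 would indicate that the observed genealogy is atypical of those the fitted model predicts. Because `statistic` is passed in as an argument, several summary statistics can be evaluated against the same posterior draws. Note that `root_fraction` is a ratio of times and so does not depend on the scale `theta`; a statistic such as total tree height would.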
