Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty

Marginal likelihood estimates, used to compare models through Bayes factors, frequently accompany Bayesian phylogenetic inference. Approaches to estimating marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but their considerable computational demands hamper widespread adoption. Further, when very diffuse but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of "working distributions" to facilitate, or shorten, the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the assumption of a fixed underlying tree topology. To this end, we introduce a "working" distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different "working" distributions that help GSS outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to considerable overestimation of the marginal likelihood when using PS and SS, whereas both GSS approaches still retrieve the correct marginal likelihood. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
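To make the contrast between SS and GSS concrete, the following is a minimal sketch on a toy conjugate normal model, not a phylogenetic model and not the BEAST implementation. It assumes data y_i ~ Normal(mu, 1) with prior mu ~ Normal(0, tau2), so that every distribution along the path is Gaussian and can be sampled exactly instead of by MCMC. All function names (`log_lik`, `sample_path`, `stepping_stone`, `exact_log_ml`) and the power schedule are illustrative assumptions. Passing `ref=None` gives standard SS, whose path starts at the prior; passing a normalized Gaussian `ref=(mean, var)` gives a GSS-style estimator whose path starts at that working distribution.

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    # Log density of Normal(mean, var) evaluated at x
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_lik(mu, y):
    # Log-likelihood of y_i ~ Normal(mu, 1), one value per entry of mu
    mu = np.asarray(mu, dtype=float)[:, None]
    return -0.5 * len(y) * np.log(2 * np.pi) - 0.5 * np.sum((y - mu) ** 2, axis=1)

def exact_log_ml(y, tau2):
    # Closed-form log marginal likelihood of the toy model (for checking only)
    n, s = len(y), y.sum()
    quad = np.sum(y ** 2) - tau2 * s ** 2 / (1 + n * tau2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2) - 0.5 * quad

def sample_path(beta, y, tau2, ref, size, rng):
    # Exact draws from the path density at power beta (Gaussian in this toy model)
    n, s = len(y), y.sum()
    if ref is None:
        # SS: likelihood^beta * prior
        prec = beta * n + 1.0 / tau2
        mean = beta * s / prec
    else:
        # GSS: (likelihood * prior)^beta * working^(1 - beta)
        m0, v0 = ref
        prec = beta * (n + 1.0 / tau2) + (1 - beta) / v0
        mean = (beta * s + (1 - beta) * m0 / v0) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size)

def stepping_stone(y, tau2, ref=None, n_steps=32, n_draws=2000, seed=0):
    # Sum of log ratios of normalizing constants between adjacent powers
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1) ** (1 / 0.3)  # assumed schedule
    log_z = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        mu = sample_path(b0, y, tau2, ref, n_draws, rng)
        log_ratio = log_lik(mu, y)
        if ref is not None:
            log_ratio = log_ratio + log_norm_pdf(mu, 0.0, tau2) - log_norm_pdf(mu, *ref)
        w = (b1 - b0) * log_ratio
        log_z += w.max() + np.log(np.mean(np.exp(w - w.max())))  # stable log-mean-exp
    return log_z

if __name__ == "__main__":
    y = np.random.default_rng(1).normal(1.0, 1.0, 20)
    tau2 = 1.0
    post_var = 1.0 / (len(y) + 1.0 / tau2)
    ref = (y.sum() * post_var, post_var)  # working distribution = exact posterior
    print(exact_log_ml(y, tau2), stepping_stone(y, tau2), stepping_stone(y, tau2, ref))
```

In this sketch the working distribution is set to the exact posterior, in which case the importance ratio is constant and the GSS-style estimate matches the closed form up to floating-point error. In realistic settings the working distribution would only approximate the posterior, for example when fitted to posterior samples, and on the space of genealogies it would need the kind of construction the abstract describes.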
