When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates.

A surprising number of recent Bayesian phylogenetic analyses contain branch-length estimates that are several orders of magnitude longer than corresponding maximum-likelihood estimates. The levels of divergence implied by such branch lengths are unreasonable for studies using biological data and are known to be false for studies using simulated data. We conducted additional Bayesian analyses and studied approximate-posterior surfaces to investigate the causes underlying these large errors. We manipulated the starting parameter values of the Markov chain Monte Carlo (MCMC) analyses, the moves used by the MCMC analyses, and the prior-probability distribution on branch lengths. We demonstrate that inaccurate branch-length estimates result from either 1) poor mixing of MCMC chains or 2) posterior distributions with excessive weight at long tree lengths. Both effects are caused by a rapid increase in the volume of branch-length space as branches become longer. In the former case, both an MCMC move that scales all branch lengths in the tree simultaneously and the use of overdispersed starting branch lengths allow the chain to accurately sample the posterior distribution and should be used in Bayesian analyses of phylogeny. In the latter case, branch-length priors can have strong effects on resulting inferences and should be carefully chosen to reflect biological expectations. We provide a formula to calculate an exponential rate parameter for the branch-length prior that should eliminate inference of biased branch lengths in many cases. In any phylogenetic analysis, the biological plausibility of branch-length output must be carefully considered.
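
To make the volume argument concrete, the following is a minimal sketch, not the formula referred to in the abstract. It assumes independent exponential priors with a common rate on every branch of an unrooted, fully resolved tree; under that assumption the tree length (the sum of the 2N - 3 branch lengths) has a Gamma prior whose shape equals the number of branches. The function names, the example rate of 10, and the rate-matching heuristic are all illustrative assumptions.

```python
import math


def num_branches(n_taxa: int) -> int:
    """Number of branches in an unrooted, fully resolved tree with n_taxa tips."""
    return 2 * n_taxa - 3


def prior_mean_tree_length(n_taxa: int, rate: float) -> float:
    """Prior mean of the tree length when each branch is i.i.d. Exponential(rate)."""
    return num_branches(n_taxa) / rate


def log_prior_tree_length(tree_length: float, n_taxa: int, rate: float) -> float:
    """Log density of the implied Gamma(shape=k, rate) prior on tree length.

    The (k - 1) * log(T) term reflects the volume of branch-length space:
    the set of branch-length vectors that sum to T grows like T**(k - 1).
    """
    k = num_branches(n_taxa)
    return (
        k * math.log(rate)
        + (k - 1) * math.log(tree_length)
        - rate * tree_length
        - math.lgamma(k)
    )


def rate_for_target_tree_length(n_taxa: int, target_tree_length: float) -> float:
    """Hypothetical heuristic (an assumption, not the authors' formula):
    choose the rate so that the prior mean tree length equals a target value."""
    return num_branches(n_taxa) / target_tree_length


if __name__ == "__main__":
    n_taxa, rate = 100, 10.0  # assumed example values
    print("branches:", num_branches(n_taxa))
    print("prior mean tree length:", prior_mean_tree_length(n_taxa, rate))
    print("rate for target tree length 2.0:", rate_for_target_tree_length(n_taxa, 2.0))
```

Under these assumptions, a prior that looks modest for a single branch can still concentrate its mass at tree lengths far above what the data support once the tree has many branches, which is one way to read the recommendation to choose branch-length priors that reflect biological expectations.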
