Tail paradox, partial identifiability, and influential priors in Bayesian branch length inference.

Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.
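Two of the claims above can be checked numerically with a minimal sketch. It rests on assumptions that are not part of the abstract: the likelihood is computed for only two sequences under the JC69 substitution model, the function names are hypothetical, and the i.i.d. branch length prior is taken to be exponential with mean 0.1, which is used here purely as an illustrative value.

```python
import math


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of distance t for two aligned sequences under JC69.

    Illustrative assumption: a two-sequence comparison with n_diff
    differing sites out of n_sites, not a full tree likelihood.
    """
    # Probability that a site differs between the two sequences.
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    same = 0.25 * (1.0 - p)  # probability of one specific identical site pattern
    diff = 0.25 * p / 3.0    # probability of one specific differing site pattern
    return (n_sites - n_diff) * math.log(same) + n_diff * math.log(diff)


def iid_exponential_tree_length(n_branches, mean_branch):
    """Prior mean and standard deviation of the tree length.

    The sum of n i.i.d. exponential branch lengths is gamma distributed,
    so mean = n * m and sd = sqrt(n) * m for per-branch mean m.
    """
    mean = n_branches * mean_branch
    sd = math.sqrt(n_branches) * mean_branch
    return mean, sd


if __name__ == "__main__":
    # Flat tail: the log-likelihood levels off at n_sites * log(1/16)
    # instead of decreasing to minus infinity.
    for t in (0.1, 1.0, 10.0, 100.0, 1000.0):
        print(t, jc69_log_likelihood(t, n_sites=100, n_diff=10))

    # Informative tree-length prior: with 100 branches the implied prior
    # on tree length is concentrated around its mean.
    print(iid_exponential_tree_length(n_branches=100, mean_branch=0.1))
```

In this sketch the log-likelihood stops changing once t is large, so in that region the posterior is shaped almost entirely by the prior. The second function shows why an i.i.d. prior is far from diffuse on the tree length: the prior mean grows linearly with the number of branches while the coefficient of variation shrinks as one over the square root of that number.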
