Model parameterization, prior distributions, and the general time-reversible model in Bayesian phylogenetics.

Bayesian phylogenetic methods require the selection of prior probability distributions for all parameters of the model of evolution. These distributions allow one to incorporate prior information into a Bayesian analysis, but a prior distribution must be chosen even when no meaningful prior information is available. In such situations, researchers typically seek a prior that will have little effect on the posterior estimates produced by an analysis, allowing the data to dominate. A uniform prior (one assigning equal prior probability density to all points within some range) is sometimes chosen for this purpose. In reality, the appropriate prior depends on the parameterization chosen for the model of evolution, a choice that is largely arbitrary: a prior that is uniform under one parameterization is generally not uniform under another. There is an extensive Bayesian literature on appropriate prior choice, and it has long been appreciated that there are parameterizations for which uniform priors can have a strong influence on posterior estimates. Here we discuss the relationship between model parameterization and prior specification, using the general time-reversible model of nucleotide evolution as an example. We present Bayesian analyses of 10 simulated data sets, conducted using a variety of prior distributions and parameterizations of the general time-reversible model. Uniform priors can produce biased parameter estimates under realistic conditions, and a variety of alternative priors avoid this bias.
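The dependence of a prior on parameterization can be illustrated with a small sketch. It is not taken from the analyses described above; it assumes a single hypothetical rate ratio r (one substitution rate relative to a reference rate) with a Uniform(0, 100) prior, and re-expresses that prior on the proportion p = r / (1 + r). The upper bound of 100 and the function names are assumptions chosen only for illustration.

```python
def induced_density(p, upper=100.0):
    """Density of p = r / (1 + r) when r ~ Uniform(0, upper).

    'upper' is an assumed bound on the hypothetical rate ratio r.
    """
    p_max = upper / (1.0 + upper)
    if p <= 0.0 or p >= p_max:
        return 0.0
    # Change of variables: r = p / (1 - p), so |dr/dp| = 1 / (1 - p)**2.
    return (1.0 / upper) / (1.0 - p) ** 2


def prior_mass(lo, hi, upper=100.0):
    """Prior probability that p lies in [lo, hi], computed from the CDF of r."""
    p_max = upper / (1.0 + upper)
    # Clip the interval to the support of p.
    lo = min(max(lo, 0.0), p_max)
    hi = min(max(hi, 0.0), p_max)

    def to_r(p):
        # Map a proportion back to the rate-ratio scale.
        return p / (1.0 - p)

    return max(to_r(hi) - to_r(lo), 0.0) / upper


if __name__ == "__main__":
    # Mass on p > 0.5, i.e. on r > 1 (the rate exceeding the reference rate).
    print(prior_mass(0.5, 1.0))
    # The density is far from flat on the proportion scale.
    print(induced_density(0.1), induced_density(0.9))
```

Under these assumptions, the prior that is flat in r places (100 - 1) / 100 = 0.99 of its mass on p > 0.5, and its density on the proportion scale rises steeply toward the upper end. A prior that looks uninformative under one parameterization can therefore be strongly informative under another.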
