A Dirichlet process prior for estimating lineage-specific substitution rates.

We introduce a new model that relaxes the assumption of a strict molecular clock, for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes the lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, the assignment of branches to rate classes, and the rate value associated with each class are all treated as random variables. We evaluated the performance of this model by analyzing data sets simulated under a range of different models, and compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.
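To illustrate what it means for the number of rate classes, the branch assignments, and the class rates to all be random variables, the following is a minimal sketch of a single draw from a Dirichlet process prior over branch rates, using the Chinese restaurant process construction. It is not the authors' implementation: the function name `sample_dpp_branch_rates`, the concentration parameter `alpha`, and the gamma base distribution with its `shape` and `scale` values are assumptions chosen for illustration only.

```python
import random


def sample_dpp_branch_rates(num_branches, alpha, shape=2.0, scale=0.5, seed=None):
    """Draw branch rates from a Dirichlet process prior (illustrative sketch).

    alpha is the concentration parameter (must be > 0). The gamma base
    distribution for class rates is an assumption made for this example.
    """
    rng = random.Random(seed)
    assignments = []   # rate-class index for each branch
    class_sizes = []   # number of branches currently in each class
    class_rates = []   # rate value attached to each class

    for _ in range(num_branches):
        # An existing class is chosen with weight equal to its size;
        # a new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            # New class: draw its rate from the base distribution.
            class_sizes.append(1)
            class_rates.append(rng.gammavariate(shape, scale))
        else:
            class_sizes[k] += 1
        assignments.append(k)

    branch_rates = [class_rates[k] for k in assignments]
    return assignments, class_rates, branch_rates


if __name__ == "__main__":
    assignments, class_rates, branch_rates = sample_dpp_branch_rates(
        num_branches=10, alpha=1.0, seed=1
    )
    print("class of each branch:", assignments)
    print("number of rate classes:", len(class_rates))
```

In this sketch, smaller values of `alpha` tend to place branches in fewer classes (approaching a strict clock with one shared rate), while larger values tend toward every branch having its own rate, which resembles an independent rates model.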
