BEAST: Bayesian evolutionary analysis by sampling trees

Background: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided, and tree-based models suitable for both within- and between-species sequence data are implemented.

Results: BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license.

Conclusion: BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
