Clock-constrained tree proposal operators in Bayesian phylogenetic inference

Bayesian Markov chain Monte Carlo (MCMC) has become one of the principal methods for inferring phylogenetic trees. The MCMC algorithm requires a transition kernel over the state space, which in turn depends on the tree proposal operators. The precise form of these operators therefore has a large impact on the computational efficiency of the algorithm. In this paper we investigate the efficiency of different tree proposals specialized for clock-constrained phylogenetic trees. We develop two new operators and compare their efficiency with that of five standard operators. Each of the seven operators is tested individually on three synthetic datasets and eleven real datasets. In addition, the individual operators are compared with different mixtures of operators. The results show that our new operators perform better than their standard counterparts, but no single operator achieved high efficiency across the full panel of datasets tested. Finally, our newly proposed mixture, which uses all operators together, performs better than current techniques.
