Bayesian and maximum likelihood phylogenetic analyses of protein sequence data under relative branch-length differences and model violation

Background
Bayesian phylogenetic inference holds promise as an alternative to maximum likelihood, particularly for large molecular-sequence data sets. We have investigated the performance of Bayesian inference with empirical and simulated protein-sequence data under conditions of relative branch-length differences and model violation.

Results
With empirical protein-sequence data, Bayesian posterior probabilities provide more-generous estimates of subtree reliability than does the nonparametric bootstrap combined with maximum likelihood inference, reaching 100% posterior probability at bootstrap proportions around 80%. With simulated 7-taxon protein-sequence datasets, Bayesian posterior probabilities are somewhat more generous than bootstrap proportions, but do not saturate. Compared with maximum likelihood, Bayesian phylogenetic inference can be as robust or more robust to relative branch-length differences for datasets of this size, particularly when among-site rate variation is modeled using a gamma distribution. When the (known) correct model was used to infer trees, Bayesian inference recovered the (known) correct tree in 100% of instances in which one or two branches were up to 20-fold longer than the others. At ratios more extreme than 20-fold, topological accuracy of reconstruction degraded only slowly when only one branch was of relatively greater length, but more rapidly when there were two such branches. Under an incorrect model of sequence change, inaccurate trees were sometimes observed at less extreme branch-length ratios, and (particularly for trees with single long branches) these trees tended to be less accurate. The effect of model violation on accuracy of reconstruction for trees with two long branches was more variable, but gamma-corrected Bayesian inference nonetheless yielded more-accurate trees than did either maximum likelihood or uncorrected Bayesian inference across the range of conditions we examined.
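The abstract compares posterior probabilities with bootstrap proportions without stating how either is tallied. As a minimal sketch, assuming each sampled tree has already been reduced to the set of clades it contains (the helpers `clade_support` and `bootstrap_alignment` and the toy data are hypothetical illustrations, not the authors' code), both support values can be computed as the frequency of a clade across a collection of trees. What differs is where the trees come from: post-burn-in MCMC samples for Bayesian posterior probabilities, and ML trees inferred from column-resampled pseudo-replicate alignments for bootstrap proportions. ML tree inference itself is not shown.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, Sequence

# Hypothetical representation: a clade is the set of taxon names it contains,
# and a sampled tree is the frozenset of its clades.
Clade = FrozenSet[str]
TreeClades = FrozenSet[Clade]


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Nonparametric bootstrap: draw alignment columns with replacement.

    An ML tree would then be inferred from each pseudo-replicate by an
    external program (not shown here).
    """
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clade_support(trees: Sequence[TreeClades], burn_in: int = 0) -> Dict[Clade, float]:
    """Fraction of trees, after dropping the first `burn_in`, that contain each clade.

    On post-burn-in MCMC samples this estimates posterior probabilities;
    on ML trees from bootstrap replicates (burn_in=0) it gives bootstrap proportions.
    """
    kept = list(trees[burn_in:])
    if not kept:
        raise ValueError("no trees left after burn-in")
    # Each tree lists a clade at most once, so a count is a number of trees.
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


ab = frozenset({"A", "B"})
abc = frozenset({"A", "B", "C"})
abd = frozenset({"A", "B", "D"})
tree_1 = frozenset({ab, abc})
tree_2 = frozenset({ab, abd})

# Toy chain: the first sample is discarded as burn-in.
chain = [tree_2, tree_1, tree_1, tree_1, tree_2]
support = clade_support(chain, burn_in=1)
print(support[ab], support[abc], support[abd])  # 1.0 0.75 0.25
```

In this framing the counting step is identical for both measures, so a gap such as 100% posterior probability against a bootstrap proportion near 80% arises from how the two collections of trees are generated rather than from how they are summarized.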
Assuming an exponential Bayesian prior on branch lengths did not improve, and under certain extreme conditions significantly diminished, performance. The two topology-comparison metrics we employed, edit distance and Robinson-Foulds symmetric distance, yielded different but highly complementary measures of performance.

Conclusions
Our results demonstrate that Bayesian inference can be relatively robust against biologically reasonable levels of relative branch-length differences and model violation, and thus may provide a promising alternative to maximum likelihood for inference of phylogenetic trees from protein-sequence data.
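The Robinson-Foulds symmetric distance used as one of the two topology-comparison metrics counts the bipartitions (splits) of the taxon set that appear in one tree but not the other; some programs report half this count or a normalized value. The sketch below is a minimal illustration, assuming trees are written as nested Python tuples (a hypothetical encoding, not a format used in the study); the edit distance is not sketched. A fully resolved unrooted tree on n taxa has n - 3 nontrivial splits, so for the 7-taxon case the count ranges from 0 to 8.

```python
from typing import FrozenSet, Set, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of
# child subtrees. Where the tuple is rooted is arbitrary; only splits matter.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the taxon names at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions induced by the internal edges of an unrooted tree.

    Each split is keyed by the side that does NOT contain a fixed reference taxon,
    so the result does not depend on where the tree happened to be rooted.
    """
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if reference in below else below
        # Splits with zero or one taxon on a side are shared by every tree.
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def robinson_foulds(tree_a: Tree, tree_b: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must be on the same taxon set")
    return len(splits(tree_a) ^ splits(tree_b))


true_tree = ((("A", "B"), "C"), ("D", ("E", ("F", "G"))))
same_tree_rerooted = ("A", ("B", ("C", ("D", ("E", ("F", "G"))))))
swapped_c_d = ((("A", "B"), "D"), ("C", ("E", ("F", "G"))))

print(robinson_foulds(true_tree, same_tree_rerooted))  # 0
print(robinson_foulds(true_tree, swapped_c_d))  # 2
```

Keying each split by the side without a fixed reference taxon makes two writings of the same unrooted tree compare as identical, as the first call shows; swapping two neighbouring taxa changes one split in each tree, giving a distance of 2.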