A bias in ML estimates of branch lengths in the presence of multiple signals.

Sequence data often have competing signals that are detected by network programs or Lento plots. Such data can be formed by generating sequences on more than one tree, and combining the results, a mixture model. We report that with such mixture models, the estimates of edge (branch) lengths from maximum likelihood (ML) methods that assume a single tree are biased. Based on the observed number of competing signals in real data, such a bias of ML is expected to occur frequently. Because network methods can recover competing signals more accurately, there is a need for ML methods allowing a network. A fundamental problem is that mixture models can have more parameters than can be recovered from the data, so that some mixtures are not, in principle, identifiable. We recommend that network programs be incorporated into best practice analysis, along with ML and Bayesian trees.

[1]  D. Penny,et al.  Use of spectral analysis to test hypotheses on the origin of pinnipeds. , 1995, Molecular biology and evolution.

[2]  Barbara R. Holland,et al.  Multiple maxima of likelihood in phylogenetic trees: an analytic approach , 2000, RECOMB '00.

[3]  Eric Vigoda,et al.  Pitfalls of heterogeneous processes for phylogenetic reconstruction. , 2007, Systematic biology.

[4]  V Moulton,et al.  Likelihood analysis of phylogenetic networks using directed graphical models. , 2000, Molecular biology and evolution.

[5]  D. Penny,et al.  Genome-scale phylogeny and the detection of systematic biases. , 2004, Molecular biology and evolution.

[6]  Joseph T. Chang,et al.  Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. , 1996, Mathematical biosciences.

[7]  David Posada,et al.  RDP2: recombination detection and analysis from sequence alignments , 2005, Bioinform..

[8]  B. Rannala Identi(cid:142)ability of Parameters in MCMC Bayesian Inference of Phylogeny , 2002 .

[9]  D. Penny,et al.  Treeness triangles: visualizing the loss of phylogenetic signal. , 2007, Molecular biology and evolution.

[10]  D. Huson,et al.  Application of phylogenetic networks in evolutionary studies. , 2006, Molecular biology and evolution.

[11]  Vincent Moulton,et al.  Spectronet: a package for computing spectra and median networks. , 2002, Applied bioinformatics.

[12]  Mike Steel,et al.  Phylogenetic mixtures on a single tree can mimic a tree of another topology. , 2007, Systematic biology.

[13]  P. Lewis,et al.  Success of maximum likelihood phylogeny inference in the four-taxon case. , 1995, Molecular biology and evolution.

[14]  M. Pagel,et al.  A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. , 2004, Systematic biology.

[15]  D Penny,et al.  Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[16]  D. Swofford,et al.  Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated? , 2001, Systematic biology.

[17]  T. Buckley,et al.  Model misspecification and probabilistic tests of topology: evidence from empirical data sets. , 2002, Systematic biology.

[18]  Elizabeth S. Allman,et al.  The Identifiability of Tree Topology for Phylogenetic Models, Including Covarion and Mixture Models , 2005, J. Comput. Biol..

[19]  Vincent Moulton,et al.  Using consensus networks to visualize contradictory evidence for species phylogeny. , 2004, Molecular biology and evolution.

[20]  Mike Steel,et al.  Should phylogenetic models be trying to "fit an elephant"? , 2005, Trends in genetics : TIG.

[21]  D. Penny,et al.  Controversy on chloroplast origins , 1992, FEBS Letters.

[22]  A. Dress,et al.  A canonical decomposition theory for metrics on a finite set , 1992 .

[23]  R. Graham,et al.  Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time , 1982 .

[24]  D. Penny,et al.  Spectral analysis of phylogenetic data , 1993 .

[25]  J. Huelsenbeck,et al.  SUCCESS OF PHYLOGENETIC METHODS IN THE FOUR-TAXON CASE , 1993 .

[26]  O. Pybus,et al.  Bayesian coalescent inference of past population dynamics from molecular sequences. , 2005, Molecular biology and evolution.

[27]  Michael P. Cummings,et al.  PAUP* [Phylogenetic Analysis Using Parsimony (and Other Methods)] , 2004 .

[28]  N. Rosenberg,et al.  Discordance of Species Trees with Their Most Likely Gene Trees , 2006, PLoS genetics.

[29]  Bryan Kolaczkowski,et al.  Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous , 2004, Nature.