Tracing the decay of the historical signal in biological sequence data.

Alignments of nucleotide or amino acid sequences may contain a variety of signals, one of which is the historical signal that we often try to recover by phylogenetic analysis. Other signals, such as those arising from compositional heterogeneity, among-lineage and among-site rate heterogeneity, invariant sites, and covariotides, may interfere with the recovery of the historical signal. The effect of the interaction of these signals on phylogenetic inference is not well understood and may, in many cases, be underappreciated. In this study, we investigate this matter and present results based on Monte Carlo simulations. We explored the success of four phylogenetic methods in recovering the true tree from data that had evolved under conditions where the equilibrium base frequencies and substitution rates were allowed to vary among lineages. Seven scenarios with increasingly complex conditions were investigated. All of the methods tested, with the exception of neighbor-joining using LogDet distances, were sensitive to compositional convergence in nonsister lineages. Maximum parsimony was also susceptible to attraction between long edges. In many cases, however, phylogenetic inference methods can still recover the true tree when misleading signals are present, in some instances even when the historical signal is no longer dominant. These results highlight the growing need for simple methods to detect violations of phylogenetic assumptions.
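As an illustration of the LogDet distances mentioned above, the following is a minimal sketch of one commonly used normalized form of the LogDet (paralinear) distance between two aligned DNA sequences. It is not the implementation used in the study; the function names (`divergence_matrix`, `det`, `logdet_distance`), the choice to skip sites containing gaps or ambiguous characters, and the particular normalization are assumptions made for this example.

```python
import math

BASES = "ACGT"

def divergence_matrix(seq_x, seq_y):
    """Return the 4x4 matrix of joint base proportions for two aligned sequences.

    Entry [i][j] is the proportion of compared sites where seq_x has base i
    and seq_y has base j. Sites with gaps or ambiguous characters are skipped
    (an assumption of this sketch).
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    counts = [[0] * 4 for _ in range(4)]
    n = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in BASES and b in BASES:
            counts[BASES.index(a)][BASES.index(b)] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[c / n for c in row] for row in counts]

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def logdet_distance(seq_x, seq_y):
    """Normalized LogDet distance: -1/4 * [ln det F - 1/2 * ln(det Px * det Py)].

    F is the divergence matrix; Px and Py are diagonal matrices of the base
    frequencies of each sequence, so their determinants are products of the
    row sums and column sums of F.
    """
    f = divergence_matrix(seq_x, seq_y)
    d = det(f)
    if d <= 0:
        raise ValueError("determinant is not positive; distance is undefined")
    row_freqs = [sum(r) for r in f]
    col_freqs = [sum(f[i][j] for i in range(4)) for j in range(4)]
    norm = 0.5 * sum(math.log(p) for p in row_freqs + col_freqs)
    return -0.25 * (math.log(d) - norm)
```

Because the distance depends on the full joint frequency matrix rather than on an assumed stationary base composition, it remains additive on the tree when composition differs among lineages, which may help explain why it was less affected by compositional convergence here. The distance is undefined when the determinant is not positive, which can happen for short or highly divergent sequences.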


[33]  M. Siddall,et al.  Success of Parsimony in the Four‐Taxon Case: Long‐Branch Repulsion by Likelihood in the Farris Zone , 1998 .

[34]  M. Gouy,et al.  Inferring phylogenies from DNA sequences of unequal base compositions. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[35]  S. Katiyar,et al.  Tubulin genes from AIDS-associated microsporidia and implications for phylogeny and benzimidazole sensitivity. , 1996, Molecular and biochemical parasitology.

[36]  J. S. Rogers,et al.  Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. , 2001, Systematic biology.

[37]  Diego Pol,et al.  Biases in Maximum Likelihood and Parsimony: A Simulation Approach to a 10-Taxon Case , 2001 .

[38]  A. Dress,et al.  Split decomposition: a new and useful approach to phylogenetic analysis of distance data. , 1992, Molecular phylogenetics and evolution.

[39]  Z. Yang,et al.  On the use of nucleic acid sequences to infer early branchings in the tree of life. , 1995, Molecular biology and evolution.

[40]  Andrey A. Zharkikh,et al.  Inconsistency of the Maximum-parsimony Method: the Case of Five Taxa With a Molecular Clock , 1993 .

[41]  J. Huelsenbeck,et al.  Application and accuracy of molecular phylogenies. , 1994, Science.

[42]  J. Felsenstein Cases in which Parsimony or Compatibility Methods will be Positively Misleading , 1978 .

[43]  R. Debry,et al.  The consistency of several phylogeny-inference methods under varying evolutionary rates. , 1992, Molecular biology and evolution.

[44]  D. Penny,et al.  Spectral analysis of phylogenetic data , 1993 .

[45]  J. Huelsenbeck,et al.  SUCCESS OF PHYLOGENETIC METHODS IN THE FOUR-TAXON CASE , 1993 .

[46]  M. Steel,et al.  Recovering evolutionary trees under a more realistic model of sequence evolution. , 1994, Molecular biology and evolution.

[47]  N. Sueoka CHAPTER 9 – Compositional Variation and Heterogeneity of Nucleic Acids and Protein in Bacteria , 1964 .

[48]  Michael D. Hendy,et al.  Hadamard conjugation: a versatile tool for modelling nucleotide sequence evolution , 1993 .

[49]  J. Huelsenbeck,et al.  Hobgoblin of phylogenetics? , 1994, Nature.

[50]  Junhyong Kim,et al.  GENERAL INCONSISTENCY CONDITIONS FOR MAXIMUM PARSIMONY: EFFECTS OF BRANCH LENGTHS AND INCREASING NUMBERS OF TAXA , 1996 .

[51]  J. Farris Likelihood and Inconsistency , 1999, Cladistics : the international journal of the Willi Hennig Society.

[52]  Detlef D. Leipe,et al.  Evolutionary history of "early-diverging" eukaryotes: the excavate taxon Carpediemonas is a close relative of Giardia. , 2002, Molecular biology and evolution.

[53]  D. Penny,et al.  Outgroup misplacement and phylogenetic inaccuracy under a molecular clock--a simulation study. , 2003, Systematic biology.

[54]  Faisal Ababneh,et al.  The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated. , 2004, Systematic biology.

[55]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[56]  M. Nei,et al.  Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site. , 1994, Molecular biology and evolution.

[57]  P. Lewis,et al.  Effects of nucleotide composition bias on the success of the parsimony criterion in phylogenetic inference. , 2001, Molecular biology and evolution.

[58]  P. Lewis,et al.  Success of maximum likelihood phylogeny inference in the four-taxon case. , 1995, Molecular biology and evolution.

[59]  W. Li,et al.  Evidence for higher rates of nucleotide substitution in rodents than in man. , 1985, Proceedings of the National Academy of Sciences of the United States of America.

[60]  Faisal Ababneh,et al.  Hetero: a program to simulate the evolution of DNA on a four-taxon tree. , 2003, Applied bioinformatics.

[61]  B S Weir,et al.  Testing for equality of evolutionary rates. , 1992, Genetics.

[62]  A. Halpern,et al.  Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. , 2000, Molecular biology and evolution.

[63]  W. Fitch An estimation of the number of invariable sites is necessary for the accurate estimation of the number of nucleotide substitutions since a common ancestor. , 1986, Progress in clinical and biological research.

[64]  S. Easteal,et al.  The partition matrix: exploring variable phylogenetic signals along nucleotide sequence alignments. , 1997, Molecular biology and evolution.

[65]  M. Steel Recovering a tree from the leaf colourations it generates under a Markov model , 1994 .

[66]  Joseph T. Chang,et al.  Inconsistency of evolutionary tree topology reconstruction methods when substitution rates vary across characters. , 1996, Mathematical biosciences.

[67]  W. Fitch The estimate of total nucleotide substitutions from pairwise differences is biased. , 1986, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.