Exploring among-site rate variation models in a maximum likelihood framework using empirical data: effects of model assumptions on estimates of topology, branch lengths, and bootstrap support.

We investigated the effects of different among-site rate variation models on the estimation of substitution model parameters, branch lengths, topology, and bootstrap proportions under minimum evolution (ME) and maximum likelihood (ML). Specifically, we examined equal rates, invariable sites, gamma-distributed rates, and site-specific rates (SSR) models, using mitochondrial DNA sequence data from three protein-coding genes and one tRNA gene from species of the New Zealand cicada genus Maoricicada. Estimates of topology were relatively insensitive to the substitution model used; however, estimates of bootstrap support, branch lengths, and R-matrices (the underlying relative substitution rate matrices) were strongly influenced by the assumptions of the substitution model. We identified one situation in which ME and ML tree building became inaccurate when implemented with an inappropriate among-site rate variation model. Although SSR models often fit the data better than invariable sites and gamma rates models do, they have some serious weaknesses. First, SSR rate parameters are not comparable across data sets, unlike the proportion of invariable sites or the alpha shape parameter of the gamma distribution. Second, the extreme among-site rate variation within codon positions is problematic for SSR models, which explicitly assume rate homogeneity within each rate class. Third, in this example the SSR models appear to severely underestimate R-matrices and branch lengths relative to invariable sites and gamma rates models. We recommend performing phylogenetic analyses under a range of substitution models to test the effects of model assumptions not only on estimates of topology but also on estimates of branch length and nodal support.
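
To make the four among-site rate variation models concrete, the following minimal Python sketch computes the variance of relative site rates that each model implies when the mean rate is scaled to 1. It is an illustration only, not the procedure used in the study: the function names are hypothetical, and the class sizes and rates in the usage note are invented values, not estimates from the Maoricicada data.

```python
from typing import Sequence


def equal_rates_variance() -> float:
    """Equal rates: every site evolves at relative rate 1, so variance is 0."""
    return 0.0


def invariable_sites_variance(p_inv: float) -> float:
    """Invariable sites: a fraction p_inv of sites has rate 0 and the rest
    share rate 1 / (1 - p_inv), which keeps the mean rate at 1.
    The resulting variance is p_inv / (1 - p_inv)."""
    if not 0.0 <= p_inv < 1.0:
        raise ValueError("p_inv must lie in [0, 1)")
    return p_inv / (1.0 - p_inv)


def gamma_rates_variance(alpha: float) -> float:
    """Gamma rates: a gamma distribution with shape alpha and mean 1
    has variance 1 / alpha, so small alpha means strong heterogeneity."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 1.0 / alpha


def ssr_variance(class_sizes: Sequence[int], class_rates: Sequence[float]) -> float:
    """Site-specific rates: each predefined class (for example a codon
    position) gets one rate. Rates are rescaled so the site-weighted mean
    is 1. Only between-class variance is captured, because the model
    assumes all sites within a class share the same rate."""
    if len(class_sizes) != len(class_rates) or not class_sizes:
        raise ValueError("need one rate per class")
    total = sum(class_sizes)
    mean = sum(n * r for n, r in zip(class_sizes, class_rates)) / total
    if mean <= 0.0:
        raise ValueError("mean rate must be positive")
    scaled = [r / mean for r in class_rates]
    return sum(n * (r - 1.0) ** 2 for n, r in zip(class_sizes, scaled)) / total
```

For example, `gamma_rates_variance(0.5)` and `invariable_sites_variance(0.5)` can be compared directly, whereas `ssr_variance([100, 100, 100], [0.4, 0.1, 2.5])` depends on how the classes were defined and sized for that particular data set, and it cannot represent any rate variation inside a class.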
