Accounting for gene rate heterogeneity in phylogenetic inference

Traditionally, phylogenetic analyses over many genes combine the data into a single contiguous block. Under this concatenated model, all genes are assumed to evolve at the same rate. However, genes clearly evolve at very different rates, and accounting for this rate heterogeneity is important for accurately inferring phylogenies from heterogeneous multigene data sets. Open questions remain about how best to incorporate gene rate parameters into phylogenetic models and which properties of real data correlate with improved fit over the concatenated model. This study compares two methods of accounting for gene rate heterogeneity: the n-parameter method, which gives each of the n gene partitions its own rate parameter, and the alpha-parameter method, which fits a distribution to the gene rates. The results show that the n-parameter method is computationally faster and, in general, improves fit over the concatenated model more than the alpha-parameter method does. Furthermore, improved fit over the concatenated model is highly correlated with the presence of a gene with a slow relative rate of evolution.
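To make the contrast between the concatenated and n-parameter models concrete, the following is a minimal sketch for the simplest possible case: two taxa, so the "tree" is a single distance. It assumes the Jukes-Cantor (JC69) substitution model and uses made-up site counts; neither the substitution model nor the data come from the study, and the function names are hypothetical. The alpha-parameter method is not shown, because the form of the fitted rate distribution is not specified here.

```python
import math

def jc69_p_diff(distance):
    """Probability that two sequences differ at a site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * distance / 3.0))

def jc69_distance(p_diff):
    """Maximum-likelihood JC69 distance for an observed proportion of differences."""
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)

def loglik(n_sites, n_diff, distance):
    """Binomial log-likelihood (up to an additive constant) of one gene."""
    p = jc69_p_diff(distance)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

# Hypothetical data: (number of sites, number of differing sites) per gene.
# The third gene is deliberately slow relative to the others.
genes = [(1000, 200), (800, 180), (1200, 30)]
total_sites = sum(n for n, _ in genes)
total_diff = sum(k for _, k in genes)

# Concatenated model: one distance shared by every gene.
d_concat = jc69_distance(total_diff / total_sites)
ll_concat = sum(loglik(n, k, d_concat) for n, k in genes)

# n-parameter model: gene g evolves at distance rate_g * d, so each gene
# is fitted at its own distance.
d_genes = [jc69_distance(k / n) for n, k in genes]
ll_npar = sum(loglik(n, k, d) for (n, k), d in zip(genes, d_genes))

# Express the per-gene distances as relative rates whose site-weighted mean is 1.
mean_d = sum(n * d for (n, _), d in zip(genes, d_genes)) / total_sites
rates = [d / mean_d for d in d_genes]

print("log-likelihood, concatenated:", ll_concat)
print("log-likelihood, n-parameter: ", ll_npar)
print("relative gene rates:", rates)
```

Because the concatenated model is the special case in which every relative rate equals 1, the n-parameter log-likelihood can never be lower. Whether the gain justifies the extra parameters is a separate model-selection question that this sketch does not address.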
