Inferring ancient divergences requires genes with strong phylogenetic signals

To tackle incongruence, the topological conflict between different gene trees, phylogenomic studies couple concatenation with practices such as rogue taxon removal or the use of slowly evolving genes. Phylogenomic analysis of 1,070 orthologues from 23 yeast genomes identified 1,070 distinct gene trees, which were all incongruent with the phylogeny inferred from concatenation. Incongruence severity increased for shorter internodes located deeper in the phylogeny. Notably, whereas most practices had little or negative impact on the yeast phylogeny, the use of genes or internodes with high average internode support significantly improved the robustness of inference. We obtained similar results in analyses of vertebrate and metazoan phylogenomic data sets. These results question the exclusive reliance on concatenation and associated practices, and argue that selecting genes with strong phylogenetic signals and demonstrating the absence of significant incongruence are essential for accurately reconstructing ancient divergences.
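The two quantities the abstract relies on can be sketched in Python. The first is how often individual gene trees recover each internode of the concatenation tree. The second is how genes might be filtered by their average internode support. This is a minimal illustration, not the authors' pipeline. It assumes fully resolved, unrooted gene trees on the same taxon set, already reduced to their non-trivial bipartitions (splits). It also assumes bootstrap percentages as the per-internode support values. The helper names (`canonical`, `splits_of`, `internode_agreement`, `is_incongruent`, `select_genes`) and the threshold are hypothetical.

```python
from statistics import mean

# Hypothetical representation (an assumption, not the paper's format): a tree
# is a set of non-trivial bipartitions. Each split A|B is stored as the
# frozenset of taxa on the side that does NOT contain a fixed "anchor" taxon,
# so that A|B and B|A compare equal.

def canonical(side, taxa):
    """Return the canonical frozenset for the split side | (taxa - side)."""
    side = frozenset(side)
    anchor = min(taxa)  # arbitrary but fixed reference taxon
    return frozenset(taxa) - side if anchor in side else side

def splits_of(sides, taxa):
    """Build a tree's split set from one side of each of its internodes."""
    return {canonical(s, taxa) for s in sides}

def internode_agreement(reference, gene_trees):
    """Fraction of gene trees that recover each internode of the reference
    (e.g. concatenation) tree."""
    n = len(gene_trees)
    return {split: sum(split in g for g in gene_trees) / n for split in reference}

def is_incongruent(gene_tree, reference):
    """With fully resolved trees on the same taxa, a gene tree conflicts with
    the reference whenever it lacks any reference internode."""
    return not reference <= gene_tree

def select_genes(bootstrap, threshold):
    """Keep genes whose mean support across their internodes is >= threshold."""
    return [gene for gene, values in bootstrap.items() if mean(values) >= threshold]
```

Consider taxa A to E, a reference tree ((A,B),C,(D,E)), and a gene tree ((A,C),B,(D,E)). Both trees recover the DE internode (agreement 1.0), but only the reference recovers the AB internode (agreement 0.5). The gene tree is therefore flagged as incongruent. The simple subset test applies only because both trees are fully resolved. Unresolved gene trees would need an explicit split-compatibility check instead.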
