Addressing Inter-Gene Heterogeneity in Maximum Likelihood Phylogenomic Analysis: Yeasts Revisited

Phylogenomic approaches to the resolution of inter-species relationships have become well established in recent years. Often these involve concatenating many orthologous genes from the respective genomes and then analysing the result using standard phylogenetic models. Genome-scale data promise increased resolution by minimising sampling error, yet they carry well-known but often inappropriately addressed caveats arising from data heterogeneity and model violation. These can lead to the reconstruction of highly supported but incorrect topologies. With the aim of obtaining a species tree for 18 species within the ascomycetous yeasts, we have investigated the use of appropriate evolutionary models to address inter-gene heterogeneities, and the scalability and validity of supermatrix analysis as the phylogenetic problem becomes more difficult and the number of genes analysed approaches truly phylogenomic dimensions. We have extended a widely known early phylogenomic study of yeasts by adding species to increase diversity and by increasing the number of genes under analysis. We have investigated sophisticated maximum likelihood analyses, considering not only a concatenated version of the data but also partitioned models in which each gene constitutes a partition and parameters are free to vary between partitions (thereby accounting for variation in the evolutionary processes at different loci). We find considerable increases in likelihood using these complex models, arguing for the need for appropriate models when analysing phylogenomic data. Using these methods, we were able to reconstruct a well-supported tree for 18 ascomycetous yeasts spanning about 250 million years of evolution.
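
The comparison between a concatenated and a partitioned analysis can be illustrated with a minimal sketch. It assumes that all genes share one fixed topology, that the log-likelihood of a partitioned model is the sum of the per-gene log-likelihoods, and that the Akaike information criterion (AIC) is used to penalise the extra parameters; the abstract does not state which criterion was actually applied. All log-likelihoods, parameter counts, and gene names below are hypothetical illustration values, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class ModelFit:
    """A fitted model summarised by its maximised log-likelihood and free-parameter count."""
    name: str
    log_likelihood: float
    n_params: int


def aic(fit: ModelFit) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * fit.n_params - 2 * fit.log_likelihood


def combine_partitions(name: str, gene_fits: list[ModelFit]) -> ModelFit:
    """Build a partitioned fit in which every gene has its own parameters.

    Assumes the genes are independent given a shared topology, so the
    log-likelihoods and the parameter counts simply add up.
    """
    return ModelFit(
        name=name,
        log_likelihood=sum(f.log_likelihood for f in gene_fits),
        n_params=sum(f.n_params for f in gene_fits),
    )


# Hypothetical numbers: an unrooted tree of 18 taxa has 2 * 18 - 3 = 33 branch
# lengths; one extra rate-heterogeneity parameter gives 34 free parameters.
concatenated = ModelFit("concatenated", log_likelihood=-30500.0, n_params=34)

# Hypothetical per-gene fits, each with its own 34 parameters.
genes = [
    ModelFit("geneA", log_likelihood=-9800.0, n_params=34),
    ModelFit("geneB", log_likelihood=-10150.0, n_params=34),
    ModelFit("geneC", log_likelihood=-10300.0, n_params=34),
]
partitioned = combine_partitions("partitioned", genes)

best = min([concatenated, partitioned], key=aic)
for fit in (concatenated, partitioned):
    print(f"{fit.name}: lnL={fit.log_likelihood}, k={fit.n_params}, AIC={aic(fit)}")
print("preferred by AIC:", best.name)
```

With these invented values the partitioned model gains 250 log-likelihood units for 68 extra parameters, so AIC prefers it. With a smaller likelihood gain the penalty term could reverse that preference, which is why a raw increase in likelihood needs to be weighed against model complexity.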
