Why Concatenation Fails Near the Anomaly Zone

Abstract. Genome‐scale sequencing has been of great benefit in recovering species trees but has not provided final answers. Despite the rapid accumulation of molecular sequences, resolving short and deep branches of the tree of life has remained a challenge and has prompted the development of new strategies that can make the best use of available data. One such strategy, the concatenation of gene alignments, can be successful when coupled with many tree estimation methods, but has also been shown to fail when there are high levels of incomplete lineage sorting. Here, we focus on the failure of likelihood‐based methods in retrieving a rooted, asymmetric four‐taxon species tree from concatenated data when the species tree is in or near the anomaly zone, a region of parameter space where the most common gene tree does not match the species tree because of incomplete lineage sorting. First, we use coalescent theory to prove that most informative sites will support the species tree in the anomaly zone, and that as a consequence maximum‐parsimony succeeds in recovering the species tree from concatenated data. We further show that maximum‐likelihood tree estimation from concatenated data fails both inside and outside the anomaly zone, and that this failure cannot be easily predicted from the topology of the most common gene tree. We demonstrate that likelihood‐based methods often fail in a region partially overlapping the anomaly zone, likely because of the lower relative cost of substitutions on discordant gene tree branches that are absent from the species tree. Our results confirm and extend previous reports on the performance of these methods applied to concatenated data from a rooted, asymmetric four‐taxon species tree, and highlight avenues for future work on improving the performance of methods aimed at recovering species trees.
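To make the notion of an anomaly zone concrete, the following minimal sketch checks whether a rooted, asymmetric four‐taxon species tree (((A,B),C),D) falls inside it. It is not taken from this paper. It assumes the boundary function a(x) reported by Degnan and Rosenberg (2006), where x is the length of the deeper internal branch (ancestral to A, B and C) and y is the length of the more recent internal branch (ancestral to A and B), both in coalescent units. The function names are hypothetical.

```python
import math


def anomaly_boundary(x: float) -> float:
    """Boundary a(x) of the four-taxon anomaly zone (assumed form, after
    Degnan and Rosenberg 2006). x is the deeper internal branch length
    in coalescent units and must be positive."""
    if x <= 0:
        raise ValueError("x must be a positive branch length in coalescent units")
    e2x = math.exp(2 * x)
    e3x = math.exp(3 * x)
    return math.log(2 / 3 + (3 * e2x - 2) / (18 * (e3x - e2x)))


def in_anomaly_zone(x: float, y: float) -> bool:
    """True when the species tree (((A,B),C),D) with internal branch
    lengths x (deeper) and y (more recent) lies in the anomaly zone,
    i.e. when y < a(x)."""
    return y < anomaly_boundary(x)


if __name__ == "__main__":
    # Two short internal branches versus two long ones.
    print(in_anomaly_zone(0.1, 0.1))
    print(in_anomaly_zone(1.0, 1.0))
```

Under this assumed boundary, a(x) becomes negative once x exceeds roughly 0.27 coalescent units, so no value of y can place the tree in the anomaly zone beyond that point.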
