Disentangling Sources of Gene Tree Discordance in Phylogenomic Datasets: Testing Ancient Hybridizations in Amaranthaceae s.l.

Gene tree discordance in large genomic datasets can be caused by evolutionary processes such as incomplete lineage sorting and hybridization, as well as by model violation and errors in data processing, orthology inference, and gene tree estimation. Species tree methods that identify and accommodate all sources of conflict are not available, but a combination of multiple approaches can help tease apart alternative sources of conflict. Here, using a phylotranscriptomic analysis in combination with reference genomes, we test a hypothesis of ancient hybridization events within the plant family Amaranthaceae s.l. that was previously supported by morphological, ecological, and Sanger-based molecular data. The dataset included seven genomes and 88 transcriptomes, 17 of which were generated for this study. We examined gene tree discordance using coalescent-based species tree and network inference, gene tree discordance analyses, site pattern tests of introgression, topology tests, synteny analyses, and simulations. We found that a combination of processes might have generated the high levels of gene tree discordance in the backbone of Amaranthaceae s.l. Furthermore, we found evidence that three consecutive short internal branches produce anomalous trees that contribute to the discordance. Overall, our results suggest that Amaranthaceae s.l. might be the product of an ancient and rapid lineage diversification, and that its backbone remains, and probably will remain, unresolved. This work highlights potential identifiability problems in attributing gene tree discordance to its sources, particularly when using phylogenetic network methods. Our results also demonstrate the importance of thoroughly testing for multiple sources of conflict in phylogenomic analyses, especially in the context of ancient, rapid radiations. We provide several recommendations for exploring conflicting signals in such situations.
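
Site pattern tests summarize how often derived alleles are shared between non-sister taxa. As an illustration only, the sketch below computes Patterson's D statistic (the ABBA-BABA test), one widely used site pattern test; it is not necessarily the test applied in this study. It assumes a four-taxon alignment of equal-length sequences in the order (P1, P2, P3, outgroup), takes the outgroup base as the ancestral state, and counts only gap-free, unambiguous, biallelic sites. The function names and the toy sequences are hypothetical.

```python
from typing import Tuple


def count_abba_baba(p1: str, p2: str, p3: str, out: str) -> Tuple[int, int]:
    """Count ABBA and BABA sites in a four-taxon alignment.

    Taxon order is (P1, P2, P3, outgroup). The outgroup base is taken as the
    ancestral ("A") state; only gap-free, unambiguous, biallelic sites count.
    """
    seqs = [s.upper() for s in (p1, p2, p3, out)]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    abba = baba = 0
    for a, b, c, o in zip(*seqs):
        site = (a, b, c, o)
        # Skip gaps, ambiguity codes, and sites that are not biallelic.
        if any(base not in "ACGT" for base in site) or len(set(site)) != 2:
            continue
        # True marks the derived ("B") allele, i.e. a base differing from the outgroup.
        derived = (a != o, b != o, c != o)
        if derived == (False, True, True):
            abba += 1  # P2 and P3 share the derived allele
        elif derived == (True, False, True):
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def patterson_d(p1: str, p2: str, p3: str, out: str) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    abba, baba = count_abba_baba(p1, p2, p3, out)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (abba - baba) / (abba + baba)


# Toy alignment (hypothetical data): two ABBA sites, no BABA sites.
print(patterson_d("AAAAG", "GGAAG", "GGGAG", "AAAAA"))  # 1.0
```

Under incomplete lineage sorting alone, ABBA and BABA sites are expected in roughly equal numbers, so D should be near 0; a positive D points to excess allele sharing between P2 and P3. A real analysis would also need a significance test, such as a block jackknife over loci, which this sketch omits.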

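Whether consecutive short internal branches fall in the anomaly zone can be screened pairwise. A minimal sketch, assuming the four-taxon criterion of Degnan and Rosenberg (2006): for a species tree (((A,B),C),D) with a parent internode of length x and its child internode of length y, both in coalescent units, the pair is in the anomaly zone when y < a(x), where a(x) = log[2/3 + (3e^{2x} - 2) / (18(e^{3x} - e^{2x}))]. In that case the symmetric gene tree ((A,B),(C,D)) is more probable than the gene tree matching the species tree. The helper names and branch lengths are hypothetical; gene_tree_probs is included only as an independent cross-check of the boundary.

```python
import math
from typing import Tuple


def anomaly_boundary(x: float) -> float:
    """Return a(x) = log[2/3 + (3e^{2x} - 2) / (18(e^{3x} - e^{2x}))].

    x is the parent internode length in coalescent units. The fraction is
    evaluated as (3 - 2e^{-2x}) / (18(e^x - 1)), which is algebraically
    identical but avoids large exponentials.
    """
    if x <= 0:
        return math.inf  # a(x) diverges as x approaches 0
    ratio = (3.0 - 2.0 * math.exp(-2.0 * x)) / (18.0 * math.expm1(x))
    return math.log(2.0 / 3.0 + ratio)


def in_anomaly_zone(x: float, y: float) -> bool:
    """True if the child internode y is short enough, given parent x, to be anomalous."""
    return y < anomaly_boundary(x)


def gene_tree_probs(x: float, y: float) -> Tuple[float, float]:
    """Cross-check under the multispecies coalescent for species tree
    (((A,B),C),D): probabilities of the matching gene tree and of the
    symmetric gene tree ((A,B),(C,D))."""
    ex, ey = math.exp(-x), math.exp(-y)
    p_match = 1 - 2 / 3 * ex - 2 / 3 * ey + 1 / 3 * ex * ey + 1 / 18 * ex**3 * ey
    p_sym = 1 / 3 * ex - 1 / 6 * ex * ey - 1 / 18 * ex**3 * ey
    return p_match, p_sym


if __name__ == "__main__":
    # Hypothetical internode lengths, not estimates from this study.
    for x, y in [(0.1, 0.1), (0.1, 0.5), (1.0, 0.05)]:
        p_match, p_sym = gene_tree_probs(x, y)
        print(f"x={x} y={y} a(x)={anomaly_boundary(x):.3f} "
              f"anomalous={in_anomaly_zone(x, y)} p_sym>p_match={p_sym > p_match}")
```

For a run of three consecutive short internodes, the check would be applied to each parent-child pair along the path, deepest internode first.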