Coalescent versus concatenation methods and the placement of Amborella as sister to water lilies.

The molecular era has fundamentally reshaped our knowledge of the evolution and diversification of angiosperms. One outstanding question is the phylogenetic placement of Amborella trichopoda Baill., commonly thought to represent the first lineage of extant angiosperms. Here, we leverage publicly available data to provide a broadly sampled, coalescent-based species tree estimate for 45 seed plants. Incorporating 310 nuclear genes, our coalescent analyses strongly support a clade containing Amborella plus water lilies (i.e., Nymphaeales) that is sister to all other angiosperms, and this result holds across different nucleotide rate partitions. Our results also show that commonly applied concatenation methods produce strongly supported but incongruent placements of Amborella: slow-evolving nucleotide sites corroborate the results of the coalescent analyses, whereas fast-evolving sites place Amborella alone as the first lineage of extant angiosperms. We further explore the performance of coalescent versus concatenation methods using nucleotide sequences simulated on (i) the two alternative placements of Amborella, with branch lengths and substitution model parameters estimated from each of the 310 nuclear genes, and (ii) three hypothetical species trees that are topologically identical but differ in the degree of deep coalescence and in branch lengths. Collectively, our results suggest that the Amborella-alone placement recovered by concatenation methods is likely an artifact of fast-evolving sites. This effect appears to be exacerbated by the combination of long branches in stem-group angiosperms, Amborella, and Nymphaeales with the short internal branch separating Amborella from Nymphaeales. In contrast, coalescent methods appear to be more robust to elevated substitution rates.
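The abstract does not say how alignment sites were assigned to slow- and fast-evolving partitions. As one illustration, the sketch below ranks columns with a tree-independent partition-agreement score, loosely in the spirit of TIGER (Cummins & McInerney, 2011). This is an assumption, not necessarily the procedure used in this study. It also assumes a gap-free alignment supplied as equal-length strings, one per taxon. The function names and the 0.5 cut-off are hypothetical. A score near 1.0 marks a conserved (slow) site, and lower scores mark increasingly fast or homoplasious sites.

```python
def site_partition(column):
    """Group taxon indices by the nucleotide they carry at one alignment column."""
    groups = {}
    for taxon, state in enumerate(column):
        groups.setdefault(state, set()).add(taxon)
    return [frozenset(g) for g in groups.values()]


def partition_agreement(p_i, p_j):
    """Fraction of the taxon sets in p_j that fit entirely inside some set of p_i."""
    hits = sum(1 for s in p_j if any(s <= t for t in p_i))
    return hits / len(p_j)


def site_rates(alignment):
    """Mean agreement of each site with every other site (1.0 = most conserved).

    Assumption: `alignment` is a list of equal-length, gap-free strings,
    one string per taxon, with at least two columns.
    """
    columns = list(zip(*alignment))  # transpose rows (taxa) into columns (sites)
    if len(columns) < 2:
        raise ValueError("need at least two sites to compare")
    parts = [site_partition(c) for c in columns]
    rates = []
    for i, p_i in enumerate(parts):
        scores = [partition_agreement(p_i, p_j) for j, p_j in enumerate(parts) if j != i]
        rates.append(sum(scores) / len(scores))
    return rates


def split_by_rate(alignment, threshold=0.5):
    """Return (slow, fast) lists of site indices using a hypothetical cut-off."""
    rates = site_rates(alignment)
    slow = [k for k, r in enumerate(rates) if r >= threshold]
    fast = [k for k, r in enumerate(rates) if r < threshold]
    return slow, fast


if __name__ == "__main__":
    # Toy alignment: 4 taxa x 4 sites (constant, two-state, three-state, all-distinct).
    aln = ["AAAC", "AAGT", "ACGG", "ACTA"]
    print([round(r, 3) for r in site_rates(aln)])
    print(split_by_rate(aln))
```

On this toy alignment, the constant column scores highest and the column in which every taxon differs scores lowest. Each resulting bin could then be concatenated and analyzed separately, and the resulting topologies compared. Real analyses would typically use more than two rate categories and would need an explicit policy for gaps and missing data.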
