One thousand plant transcriptomes and the phylogenomics of green plants

Green plants (Viridiplantae) include around 450,000–500,000 species1,2 of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life.

The One Thousand Plant Transcriptomes Initiative provides a robust phylogenomic framework for examining green plant evolution that comprises the transcriptomes and genomes of diverse species of green plants.

[1]  A. Force,et al.  Preservation of duplicate genes by complementary, degenerative mutations. , 1999, Genetics.

[2]  C. N. Stewart,et al.  The evolutionary history of ferns inferred from 25 low-copy nuclear genes. , 2015, American journal of botany.

[3]  J A Eisen,et al.  Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. , 1998, Genome research.

[4]  P. Szövényi,et al.  Bryophyte diversity and evolution: windows into the early evolution of land plants. , 2011, American journal of botany.

[5]  Steven Maere,et al.  Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[6]  Yeting Zhang,et al.  A genome triplication associated with early diversification of the core eudicots , 2012, Genome Biology.

[7]  Paramvir S. Dehal,et al.  FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments , 2010, PloS one.

[8]  J. Poulain,et al.  The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla , 2007, Nature.

[9]  F. Leliaert,et al.  Evolution and cytological diversification of the green seaweeds (Ulvophyceae). , 2010, Molecular biology and evolution.

[10]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[11]  S. Kelly,et al.  OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy , 2015, Genome Biology.

[12]  C. Delwiche,et al.  The Evolutionary Origin of a Terrestrial Flora , 2015, Current Biology.

[13]  G. Theißen,et al.  Structure and Evolution of Plant MADS Domain Transcription Factors , 2016 .

[14]  D. Sankoff,et al.  Polyploidy and angiosperm diversification. , 2009, American journal of botany.

[15]  K. Katoh,et al.  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. , 2002, Nucleic acids research.

[16]  Tandy J. Warnow,et al.  Ultra-large alignments using phylogeny-aware profiles , 2015, Genome Biology.

[17]  J. Palmer,et al.  Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[18]  B. Marin  Nested in the Chlorellales or independent class? Phylogeny and classification of the Pedinophyceae (Viridiplantae) revealed by molecular phylogenetic analyses of complete nuclear and plastid-encoded rRNA operons. , 2012, Protist.

[19]  H. Saedler,et al.  Two ancient classes of MIKC-type MADS-box genes are present in the moss Physcomitrella patens. , 2002, Molecular biology and evolution.

[20]  Md. Shamsuzzoha Bayzid,et al.  Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses , 2014, PloS one.

[21]  Michael S. Barker,et al.  Unfurling Fern Biology in the Genomics Age , 2010 .

[22]  Dennis W. Stevenson,et al.  Algal ancestor of land plants was preadapted for symbiosis , 2015, Proceedings of the National Academy of Sciences.

[23]  R. Corlett  Plant diversity in a changing world: Status, trends, and conservation needs , 2016, Plant diversity.

[24]  E. Yang,et al.  Evidence of ancient genome reduction in red algae (Rhodophyta) , 2015, Journal of phycology.

[25]  A. Leitch,et al.  Genome Size Diversity and Evolution in Land Plants , 2013 .

[26]  Siavash Mirarab,et al.  Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction , 2017, Molecular biology and evolution.

[27]  Sudhir Kumar,et al.  TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. , 2017, Molecular biology and evolution.

[28]  Matthew A. Gitzendanner,et al.  Modified CTAB and TRIzol protocols improve RNA extraction from chemically complex Embryophyta , 2015, Applications in plant sciences.

[29]  J. Gordon Burleigh,et al.  Evaluating and Characterizing Ancient Whole-Genome Duplications in Plants with Gene Count Data , 2016, Genome biology and evolution.

[30]  Rolf Lohaus,et al.  Revisiting ancestral polyploidy in plants , 2017, Science Advances.

[31]  Robert C. Edgar,et al.  MUSCLE: multiple sequence alignment with high accuracy and high throughput. , 2004, Nucleic acids research.

[32]  Alexey M. Kozlov,et al.  ExaML version 3: a tool for phylogenomic analyses on supercomputers , 2015, Bioinform..

[33]  P. Edger,et al.  Ancient whole genome duplications, novelty and diversification: the WGD Radiation Lag-Time Model. , 2012, Current opinion in plant biology.

[34]  Steven Maere,et al.  The Origin of Floral Organ Identity Quartets , 2017, Plant Cell.

[35]  Claudia R. Solís-Lemus,et al.  Inconsistency of Species Tree Methods under Gene Flow. , 2016, Systematic biology.

[36]  Siavash Mirarab,et al.  Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies , 2017, Genes.

[37]  Naiara Rodríguez-Ezpeleta,et al.  Monophyly of Primary Photosynthetic Eukaryotes: Green Plants, Red Algae, and Glaucophytes , 2005, Current Biology.

[38]  S. Rensing,et al.  Three rings for the evolution of plastid shape: a tale of land plant FtsZ , 2017, Protoplasma.

[39]  J. Wiens,et al.  Missing data, incomplete taxa, and phylogenetic accuracy. , 2003, Systematic biology.

[40]  J. Wiens,et al.  Missing data in phylogenetic analysis: reconciling results from simulations and empirical data. , 2011, Systematic biology.

[41]  Michael J. Sanderson,et al.  The prevalence of terraced treescapes in analyses of phylogenetic data sets , 2018, BMC Evolutionary Biology.

[42]  Noah A Rosenberg,et al.  Gene tree discordance, phylogenetic inference and the multispecies coalescent. , 2009, Trends in ecology & evolution.

[43]  Md. Shamsuzzoha Bayzid,et al.  Statistical binning enables an accurate coalescent-based estimation of the avian tree , 2014, Science.

[44]  Stephen A. Smith,et al.  Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants , 2015, BMC Evolutionary Biology.

[45]  Evgeny M. Zdobnov,et al.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs , 2015, Bioinform..

[46]  Marta Matvienko,et al.  Multiple paleopolyploidizations during the evolution of the Compositae reveal parallel patterns of duplicate gene retention after millions of years. , 2008, Molecular biology and evolution.

[47]  James Leebens-Mack,et al.  Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes , 2012, PloS one.

[48]  Siavash Mirarab,et al.  Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies , 2016, Molecular biology and evolution.

[49]  S. Berger,et al.  Dasycladales: An Illustrated Monograph of a Fascinating Algal Order , 1992 .

[50]  Yang Zhong,et al.  The position of gnetales among seed plants: overcoming pitfalls of chloroplast phylogenomics. , 2010, Molecular biology and evolution.

[51]  P. Bernier Dasycladales — An illustrated monograph of a fascinating algal order , 1992 .

[52]  Erin K. Molloy,et al.  The performance of coalescent-based species tree estimation methods under models of missing data , 2018, BMC Genomics.

[53]  Tandy Warnow,et al.  To include or not to include: The impact of gene filtering on species tree estimation methods , 2017, bioRxiv.

[54]  Stephen J. Callister,et al.  Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants , 2016, BMC Genomics.

[55]  D. Kapraun  Nuclear DNA content estimates in green algal lineages: Chlorophyta and Streptophyta. , 2006, Annals of botany.

[56]  Amborella Genome Project  The Amborella Genome and the Evolution of Flowering Plants , 2013, Science.

[57]  Saravanaraj N. Ayyampalayam,et al.  Phylotranscriptomic analysis of the origin and early diversification of land plants , 2014, Proceedings of the National Academy of Sciences.

[58]  Hirohisa Kishino,et al.  Incorporating gene-specific variation when inferring and evaluating optimal evolutionary tree topologies from multilocus sequence data. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[59]  B. Marin,et al.  Streptophyte algae and the origin of embryophytes. , 2009, Annals of botany.

[60]  Michael S. Barker,et al.  EvoPipes.net: Bioinformatic Tools for Ecological and Evolutionary Genomics , 2010, Evolutionary bioinformatics online.

[61]  Christian R. Boehm,et al.  Insights into Land Plant Evolution Garnered from the Marchantia polymorpha Genome , 2017, Cell.

[62]  Tandy J. Warnow,et al.  ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes , 2015, Bioinform..

[63]  David R. Hunter,et al.  mixtools: An R Package for Analyzing Mixture Models , 2009 .

[64]  Stephen A. Smith,et al.  Optimizing de novo assembly of short-read RNA-seq data for phylogenomics , 2013, BMC Genomics.

[65]  S. Kelly,et al.  The Stepwise Increase in the Number of Transcription Factor Families in the Precambrian Predated the Diversification of Plants On Land. , 2016, Molecular biology and evolution.

[66]  Siavash Mirarab,et al.  DiscoVista: Interpretable visualizations of gene tree discordance. , 2017, Molecular phylogenetics and evolution.

[67]  Guy Baele,et al.  Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary , 2014, Genome research.

[68]  Michael S. Barker,et al.  Early genome duplications in conifers and other seed plants , 2015, Science Advances.

[69]  Daniel W. A. Buchan,et al.  The tomato genome sequence provides insights into fleshy fruit evolution , 2012, Nature.

[70]  C. dePamphilis,et al.  Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales' closest relatives are conifers. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[71]  Joel Sjöstrand,et al.  GenPhyloData: realistic simulation of gene family evolution , 2013, BMC Bioinformatics.

[72]  R. McCourt,et al.  Green algae and the origin of land plants. , 2004, American journal of botany.

[73]  C. N. Stewart,et al.  Multiple polyploidy events in the early radiation of nodulating and nonnodulating legumes. , 2015, Molecular biology and evolution.

[74]  B. Mueller‐Roeber,et al.  Genome-Wide Phylogenetic Comparative Analysis of Plant Transcriptional Regulation: A Timeline of Loss, Gain, Expansion, and Correlation with Complexity , 2010, Genome biology and evolution.

[75]  Naomi S. Altman,et al.  Horizontal gene transfer is more frequent with increased heterotrophy and contributes to parasite adaptation , 2016, Proceedings of the National Academy of Sciences.

[76]  S. Graham,et al.  Phylogenomic inference in extremis: A case study with mycoheterotroph plastomes. , 2018, American journal of botany.

[77]  Ping Liu,et al.  A genome for gnetophytes and early evolution of seed plants , 2018, Nature Plants.

[78]  David M. Goodstein,et al.  Phytozome: a comparative platform for green plant genomics , 2011, Nucleic Acids Res..

[79]  F. Parcy,et al.  A link between LEAFY and B-gene homologues in Welwitschia mirabilis sheds light on ancestral mechanisms prefiguring floral development. , 2017, The New phytologist.

[80]  Melissa D. Lehti-Shiu,et al.  Importance of Lineage-Specific Expansion of Plant Tandem Duplicates in the Adaptive Response to Environmental Stimuli , 2008, Plant Physiology.

[81]  Manolis Kellis,et al.  Unified modeling of gene duplication, loss, and coalescence using a locus tree. , 2012, Genome research.

[82]  Charles-Elie Rabier,et al.  Detecting and locating whole genome duplications on a phylogeny: a probabilistic approach. , 2014, Molecular biology and evolution.

[83]  M. Melkonian,et al.  Apparition of the NAC Transcription Factors Predates the Emergence of Land Plants. , 2016, Molecular plant.

[84]  Matthew W. Hahn,et al.  Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution , 2007, Genome Biology.

[85]  Adam Godzik,et al.  Clustering of highly homologous sequences to reduce the size of large protein databases , 2001, Bioinform..

[86]  R. Govaerts,et al.  Counting counts: revised estimates of numbers of accepted species of flowering plants, seed plants, vascular plants and land plants with a review of other recent estimates , 2016 .

[87]  J. Raes,et al.  Modeling gene and genome duplications in eukaryotes. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[88]  Sebastian Proost,et al.  Gamma paleohexaploidy in the stem lineage of core eudicots: significance for MADS-box gene and species diversification. , 2012, Molecular biology and evolution.

[89]  Claude W. dePamphilis,et al.  Ancestral polyploidy in seed plants and angiosperms , 2011, Nature.

[90]  R. Durbin,et al.  GeneWise and Genomewise. , 2004, Genome research.

[91]  S. Rensing,et al.  Comprehensive Genome-Wide Classification Reveals That Many Plant-Specific Transcription Factors Evolved in Streptophyte Algae , 2017, Genome biology and evolution.

[92]  Ziheng Yang  PAML 4: phylogenetic analysis by maximum likelihood. , 2007, Molecular biology and evolution.

[93]  J. Palmer,et al.  Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. , 2000, Molecular biology and evolution.

[94]  Sean R. Eddy,et al.  Accelerated Profile HMM Searches , 2011, PLoS Comput. Biol..

[95]  Michael S. Barker,et al.  Impact of whole-genome duplication events on diversification rates in angiosperms. , 2018, American journal of botany.

[96]  David C. Tank,et al.  Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. , 2015, The New phytologist.

[97]  Pamela S Soltis,et al.  Plastid phylogenomic analysis of green plants: A billion years of evolutionary history. , 2018, American journal of botany.

[98]  Michael S. Barker,et al.  Multiple large-scale gene and genome duplications during the evolution of hexapods , 2018, Proceedings of the National Academy of Sciences.

[99]  D. Nelson,et al.  A P450-centric view of plant evolution. , 2011, The Plant journal : for cell and molecular biology.

[100]  A. Zharkikh,et al.  Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. , 1997, Molecular biology and evolution.

[101]  Adam Godzik,et al.  Tolerating some redundancy significantly speeds up clustering of large protein databases , 2002, Bioinform..

[102]  Karl Pearson  On lines and planes of closest fit to systems of points in space , 1901 .

[103]  Alexandros Stamatakis,et al.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies , 2014, Bioinform..

[104]  Tandy J. Warnow,et al.  PASTA: Ultra-Large Multiple Sequence Alignment , 2014, RECOMB.

[105]  J. Bowman,et al.  Field Guide to Plant Model Systems , 2016, Cell.

[106]  Tandy Warnow,et al.  Evaluating Summary Methods for Multilocus Species Tree Estimation in the Presence of Incomplete Lineage Sorting. , 2016, Systematic biology.

[107]  Mark N. Puttick,et al.  The Interrelationships of Land Plants and the Nature of the Ancestral Embryophyte , 2018, Current Biology.