Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin

Abstract: Sequencing, assembly, and annotation of the 26.5 Gbp hexaploid genome of coast redwood (Sequoia sempervirens) were completed, enabling the discovery of genes related to climate adaptation and an investigation of the origin of the hexaploid genome. Deep-coverage short-read Illumina sequencing data from haploid tissue from a single seed were combined with long-read Oxford Nanopore Technologies sequencing data from diploid needle tissue to create an initial assembly, which was then scaffolded using proximity ligation data to produce a highly contiguous final assembly, SESE 2.1, with a scaffold N50 of 44.9 Mbp. The assembly includes several scaffolds that span entire chromosome arms, as confirmed by the presence of telomere and centromere sequences at the scaffold ends. Structural annotation produced 118,906 genes, of which 113 contain introns longer than 500 Kbp, with one intron reaching 2 Mbp. Nearly 19 Gbp of the genome is repetitive, and the vast majority of that content consists of long terminal repeat elements, with a 2.9:1 ratio of Copia to Gypsy elements that may aid in gene expression control. Comparison of coast redwood with other conifers revealed species-specific expansions of many abiotic and biotic stress response genes, including genes involved in fungal disease resistance, detoxification, and physical injury/structural remodeling, as well as genes supporting flavonoid biosynthesis. Analysis of multiple genes that are present in triplicate in coast redwood but only once in its diploid relative, giant sequoia, supports a previous hypothesis that the hexaploidy resulted from autopolyploidy rather than from hybridization with separate but closely related conifer species.
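For readers unfamiliar with the contiguity statistic quoted above, the following is a minimal sketch of how a scaffold N50 is conventionally computed: the scaffold lengths are sorted from longest to shortest, and the N50 is the length of the scaffold at which the running total first reaches half of the total assembly length. The function name `scaffold_n50` and the toy lengths are illustrative assumptions and are not taken from the SESE 2.1 assembly or its pipeline.

```python
from typing import Iterable


def scaffold_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold lengths (in bp).

    Hypothetical helper for illustration only; it follows the conventional
    definition of N50 and is not the authors' code.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold where the cumulative length
        # covers at least half of the whole assembly.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy scaffold lengths in bp (made-up values, not redwood data).
    toy_lengths = [40, 30, 20, 10]
    print(scaffold_n50(toy_lengths))
```

With the toy lengths, the total is 100, and the cumulative sum reaches half of that at the second-longest scaffold, so the sketch reports 30. An N50 of 44.9 Mbp therefore means that at least half of the assembled sequence lies in scaffolds of 44.9 Mbp or longer.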
