Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity

Abstract

Background: Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology.

Findings: Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement over the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous large-scale scaffolding errors in each chromosome of the previously published version of the F. vesca genome.

Conclusions: Our results highlight the need to improve existing short-read-based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.
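For readers unfamiliar with the contiguity metric quoted in the Findings, the sketch below shows how a contig N50 is conventionally computed: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper, and this is not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in base pairs (hypothetical values, not from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150 reaches half of the 300 bp total
```

A higher N50 for the same total assembly size means the sequence is held in fewer, longer pieces, which is the sense in which the abstract describes the new assembly as more contiguous.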