Evolutionary history and pan-genome dynamics of strawberry (Fragaria spp.)

Significance
Strawberry is a widely consumed fruit, and the strawberry genus (Fragaria) has emerged in recent years as a model system for both fundamental and applied research. Here, using high-throughput sequencing technologies, we provide de novo whole-genome sequences for five wild strawberry species and genome resequencing data for 128 additional accessions of key species. Our analyses yielded robust estimates of the evolutionary history of most diploid strawberry species. They also led to the discovery of a new diploid species (Fragaria emeiensis Jia J. Lei) and enabled the construction of a pan-genome for strawberry. We also examined the evolutionary dynamics of gene families. This study provides a powerful genomic platform and resource for future studies in strawberry.

Abstract
Strawberry (Fragaria spp.) has emerged in recent years as a model system for a range of fundamental and applied research. Over the past 10 y, the genomes of five different species have been sequenced. Here, we report chromosome-scale reference genomes for five strawberry species, including three species whose genomes are sequenced here for the first time. We also report genome resequencing data for 128 additional accessions, which we use to estimate the genetic diversity, population structure, and demographic history of key Fragaria species. Our analyses yielded fully resolved and strongly supported phylogenies and divergence times for most diploid strawberry species, and they uncovered a new diploid species (Fragaria emeiensis Jia J. Lei). Finally, we constructed a pan-genome for Fragaria and examined the evolutionary dynamics of gene families. Notably, we identified multiple independent single-base mutations in the MYB10 gene that are associated with the white fruit color shared by different strawberry species. These reference genomes and datasets, combined with our phylogenetic estimates, should serve as a powerful comparative genomic platform and resource for future studies in strawberry.
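The abstract does not say how gene families in the Fragaria pan-genome were partitioned. The sketch below assumes one common three-way convention used in plant pan-genome studies: core families occur in every genome, private families occur in exactly one genome, and dispensable families occur in more than one genome but not all. It shows how families could be classified from a presence/absence table, for example orthogroups produced by a clustering tool such as OrthoFinder. The function name `classify_pan_genome`, the genome labels, and the family IDs are hypothetical. This is an illustration of the concept, not necessarily the procedure used in this study.

```python
from collections import Counter
from typing import Dict, Set


def classify_pan_genome(presence: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each gene family as core, dispensable, or private.

    `presence` maps a genome (assembly or accession) name to the set of
    gene-family IDs detected in that genome. Assumed convention:
      core        -> present in every genome
      private     -> present in exactly one genome
      dispensable -> present in more than one genome, but not all
    """
    n_genomes = len(presence)
    # Count in how many genomes each family occurs (set() guards duplicates).
    counts = Counter(fam for fams in presence.values() for fam in set(fams))

    categories: Dict[str, str] = {}
    for fam, k in counts.items():
        if k == n_genomes:
            categories[fam] = "core"
        elif k == 1:
            categories[fam] = "private"
        else:
            categories[fam] = "dispensable"
    return categories


# Hypothetical toy table: three genomes, four gene families.
presence = {
    "genome_A": {"OG1", "OG2", "OG3"},
    "genome_B": {"OG1", "OG2"},
    "genome_C": {"OG1", "OG4"},
}

if __name__ == "__main__":
    labels = classify_pan_genome(presence)
    print(sorted(labels.items()))
    # [('OG1', 'core'), ('OG2', 'dispensable'), ('OG3', 'private'), ('OG4', 'private')]
```

Real pan-genome studies often relax the "present in every genome" rule (for example, a "softcore" tier) to tolerate missing annotations. The strict thresholds above are the simplest version of the idea.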