De novo assembly of the zucchini genome reveals a whole‐genome duplication associated with the origin of the Cucurbita genus

Summary

The Cucurbita genus (squashes, pumpkins and gourds) includes important domesticated species such as C. pepo, C. maxima and C. moschata. In this study, we present a high‐quality draft of the zucchini (C. pepo) genome. The assembly has a size of 263 Mb, a scaffold N50 of 1.8 Mb and 34 240 gene models. It includes 92% of the conserved BUSCO core gene set and is estimated to cover 93.0% of the genome. The genome is organized in 20 pseudomolecules that represent 81.4% of the assembly, and it is integrated with a genetic map of 7718 SNPs. Despite the small genome size, three independent lines of evidence indicate that the C. pepo genome is the result of a whole‐genome duplication: the topology of the gene family phylogenies, the karyotype organization and the distribution of 4DTv distances. Additionally, 40 transcriptomes from 12 species of the genus were assembled and analysed together with all the other published genomes of the Cucurbitaceae family. The duplication was detected in all the Cucurbita species analysed, including C. maxima and C. moschata, but not in the more distant cucurbits belonging to the Cucumis and Citrullus genera. It is likely to have occurred 30 ± 4 Mya in the ancestral species that gave rise to the genus.
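To illustrate the third line of evidence, the sketch below computes an uncorrected 4DTv distance (the fraction of fourfold‐degenerate third codon positions that differ by a transversion) for one pair of aligned coding sequences. It is a minimal illustration, not the pipeline used in the study: the function name `four_dtv`, the input format (two in‐frame, gap‐free, codon‐aligned strings) and the absence of any multiple‐substitution correction are all assumptions. In a duplication analysis, such distances would typically be computed over many paralogous gene pairs and their distribution inspected for a peak.

```python
# Hypothetical helper: uncorrected 4DTv between two codon-aligned sequences.
# Assumes the standard genetic code and in-frame, equal-length inputs.

# Codon prefixes whose third position is fourfold degenerate
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def four_dtv(seq_a, seq_b):
    """Return transversions / fourfold-degenerate sites, or None if no sites."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")

    sites = 0
    transversions = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()

        # Only compare codons that share the same fourfold-degenerate prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue

        base_a, base_b = codon_a[2], codon_b[2]
        # Skip gaps and ambiguous bases at the third position.
        if base_a not in "ACGT" or base_b not in "ACGT":
            continue

        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1

    return transversions / sites if sites else None


if __name__ == "__main__":
    # Toy pair: three fourfold-degenerate codons, two of them with a transversion.
    print(four_dtv("GCTGGACCCTTT", "GCAGGACCGTTC"))
```

Published analyses often apply a correction for multiple substitutions before plotting the distribution; this sketch leaves the raw proportion untouched so the counting step stays visible.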
