Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber

Genome-wide scanning for large sequence changes revealed a tandem duplication of a DNA segment that gives rise to cucumbers bearing only female flowers.

Structural variations (SVs) represent a major source of genetic diversity. However, the functional impact and formation mechanisms of SVs in plant genomes remain largely unexplored. Here, we report a nucleotide-resolution SV map of cucumber (Cucumis sativus) that comprises 26,788 SVs, based on deep resequencing of 115 diverse accessions. The largest proportion of cucumber SVs was formed through nonhomologous end-joining rearrangements, and the occurrence of SVs is closely associated with regions of high nucleotide diversity. These SVs affect the coding regions of 1676 genes, some of which are associated with cucumber domestication. Based on the map, we discovered a copy number variation (CNV) involving four genes that defines the Female (F) locus and gives rise to gynoecious cucumber plants, which bear only female flowers and set fruit at almost every node. The CNV arose from a recent 30.2-kb duplication at a meiotically unstable region, likely via microhomology-mediated break-induced replication. The SV set provides a snapshot of structural variations in plants and will serve as an important resource for exploring genes underlying key traits and for facilitating practical breeding in cucumber.
