Sequence analysis of European maize inbred line F2 provides new insights into molecular and chromosomal characteristics of presence/absence variants

Background
Maize is well known for its exceptional structural diversity, including copy number variants (CNVs) and presence/absence variants (PAVs), and there is growing evidence for the role of structural variation in maize adaptation. While PAVs have been described in this important crop species, they have been only sparsely characterized at the sequence level, and the extent of presence/absence variation and the chromosomal landscape of inbred-specific regions remain to be elucidated.

Results
De novo genome sequencing of the French maize inbred line F2 revealed 10,044 novel genomic regions larger than 1 kb, totaling 88 Mb of DNA, that are present in F2 but absent from B73 (PAVs). This set of maize PAV sequences allowed us to annotate PAV content and to analyze sequence breakpoints. Using PAV genotyping on a collection of 25 temperate lines, we also analyzed linkage disequilibrium (LD) within PAVs and their flanking regions, as well as PAV frequencies within maize genetic groups.

Conclusions
We highlight the possible role of microhomology-mediated end joining (MMEJ)-type double-strand break repair in maize PAV formation and identify 395 new genes with transcriptional support. The pattern of LD within PAVs differs strikingly from that of flanking regions and is consistent with the intuition that PAVs may recombine less than other genomic regions. We show that most PAVs are ancient, while some are found only in European Flint material, thus pinpointing structural features that may underlie adaptive traits involved in the success of this material. Characterization of such PAVs will provide useful material for further association genetic studies in European and temperate maize.
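MMEJ-type repair typically leaves a short stretch of microhomology at the breakpoints of the resulting deletion: the same few bases occur at both ends of the lost segment, and one copy is retained. The sketch below is an illustration of how such a signature can be measured, not the authors' pipeline. It assumes a PAV segment and its flanks are available as plain strings from the genome that carries the segment (here F2); the function names and toy sequences are hypothetical.

```python
def _shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def microhomology(left: str, segment: str, right: str) -> str:
    """Return the microhomology at the breakpoints of `segment`.

    `segment` lies between `left` and `right` in the genome carrying it and
    is absent from the other genome. Bases that start both `segment` and
    `right`, plus bases that end both `segment` and `left`, make the exact
    breakpoint position ambiguous; together they form the microhomology.
    """
    left, segment, right = left.upper(), segment.upper(), right.upper()
    # Bases at the start of the segment repeated at the start of the right flank.
    k_right = _shared_prefix_len(segment, right)
    # Bases at the end of the segment repeated at the end of the left flank,
    # found by comparing the reversed strings with the same prefix helper.
    k_left = _shared_prefix_len(segment[::-1], left[::-1])
    # Slice with len(left) - k_left so that k_left == 0 yields "" (left[-0:] is all of left).
    return left[len(left) - k_left:] + right[:k_right]


if __name__ == "__main__":
    # Toy locus CCCC AGTC TTTT AGTC GGGG: an 8 bp segment flanked by AGTC repeats.
    mh = microhomology("CCCCAG", "TCTTTTAG", "TCGGGG")
    print(mh, len(mh))  # AGTC 4
```

Measuring the shift in both directions makes the result independent of where an assembler or aligner happened to place the breakpoint within the repeated bases.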
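Comparing LD within PAVs and in their flanking regions rests on a pairwise LD statistic computed across the genotyped lines. As a hedged illustration (the statistic and software used in the study are not stated here), the sketch below computes the widely used r² between two biallelic markers scored 0/1 per inbred line, for example 1 = PAV present and 0 = absent. It assumes the lines are fully homozygous, so each line contributes a single haplotype; the function name and genotype vectors are hypothetical.

```python
from collections.abc import Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """LD r^2 between two 0/1-coded markers scored on the same inbred lines.

    With one haplotype per line, r^2 = D^2 / (pA(1-pA) pB(1-pB)), where
    D = pAB - pA*pB; this equals the squared Pearson correlation of a and b.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("markers must be non-empty and scored on the same lines")
    n = len(a)
    p_a = sum(a) / n  # frequency of allele 1 (e.g. PAV present) at marker a
    p_b = sum(b) / n  # frequency of allele 1 at marker b
    p_ab = sum(x * y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic marker leaves r^2 undefined
    return d * d / denom


if __name__ == "__main__":
    pav = [1, 1, 1, 0]       # hypothetical PAV scored on four lines
    flank_snp = [1, 1, 0, 0]  # hypothetical flanking SNP on the same lines
    print(round(r_squared(pav, flank_snp), 3))  # 0.333
```

Averaging such r² values over marker pairs inside PAVs and over pairs in flanking windows is one way to contrast the two LD patterns.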
