De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes

An a-maize-ing set of genomes

Maize is an important crop that is cultivated worldwide. As maize spread across the world, selection for local environments resulted in variation, but the impact of this selection on differences between genomes has not been quantified. By producing high-quality genome sequences of the 26 lines used in the maize nested association mapping panel, Hufford et al. map important traits and demonstrate the diversity of maize. By examining RNA expression and DNA methylation of genes across accessions, the authors identified a core set of maize genes. Beyond this core set, comparative analysis across lines revealed high levels of variation in the total set of genes, the maize pan-genome. The value of this resource was further exemplified by mapping quantitative traits of interest, including those related to pathogen resistance.

Science, abg5289, this issue p. 655

A multigenome analysis of maize reveals variation in gene content, genome structure, and methylation.

We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
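The pan-gene figures in the abstract rest on a simple tally: for each pan-gene, count how many of the 26 genomes carry it, then group pan-genes by that count. The Python sketch that follows shows one way such a tally could be organized. It is an illustration, not the authors' pipeline; the presence/absence input, the gene IDs, and the class thresholds (core = present in all genomes, near-core = absent from at most two, private = present in exactly one, dispensable = everything in between) are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List


def classify_pan_gene(n_present: int, n_genomes: int = 26) -> str:
    """Assign a pan-gene to a frequency class by how many genomes carry it.

    Thresholds are illustrative assumptions, not taken from the paper.
    """
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"          # found in every genome
    if n_present == 1:
        return "private"       # found in a single genome
    if n_present >= n_genomes - 2:
        return "near-core"     # missing from at most two genomes
    return "dispensable"       # everything in between


def summarize(presence: Dict[str, List[bool]]) -> Counter:
    """Count pan-genes per class from a presence/absence matrix.

    `presence` maps a hypothetical pan-gene ID to one boolean per genome.
    """
    counts: Counter = Counter()
    for row in presence.values():
        counts[classify_pan_gene(sum(row), len(row))] += 1
    return counts


# Toy presence/absence matrix over 26 genomes (hypothetical data).
toy = {
    "panGene_A": [True] * 26,                 # core
    "panGene_B": [True] * 25 + [False],       # near-core
    "panGene_C": [True] * 10 + [False] * 16,  # dispensable
    "panGene_D": [True] + [False] * 25,       # private
}
counts = summarize(toy)
print(counts)
print("core fraction:", counts["core"] / sum(counts.values()))
```

Under this scheme, the abstract's statement that roughly a third of the more than 103,000 pan-genes occur in all genotypes corresponds to the `core` count divided by the total number of pan-genes.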