Maize (Zea mays L.) Genome Diversity as Revealed by RNA-Sequencing

Maize is rich in genetic and phenotypic diversity. Understanding the sequence, structural, and expression variation that contributes to phenotypic diversity would facilitate more efficient varietal improvement. RNA-based sequencing (RNA-seq) is a powerful approach for analyzing transcription, assessing sequence variation, and identifying novel transcript sequences, particularly in large, complex, repetitive genomes such as maize. In this study, we sequenced RNA from whole seedlings of 21 maize inbred lines representing diverse North American and exotic germplasm. Single nucleotide polymorphism (SNP) detection identified 351,710 polymorphic loci distributed throughout the genome and covering 22,830 annotated genes. When these SNPs were used as genetic markers, tight clustering of the two distinct heterotic groups and of the exotic lines was evident. Transcript abundance analysis revealed minimal variation in the total number of genes expressed across the 21 lines (57.1% to 66.0% of genes expressed per line). However, the transcribed gene set varied among the lines: 48.7% of genes were expressed in all 21 lines, 27.9% in one to 20 lines, and 23.4% in none of the lines. De novo assembly of RNA-seq reads that did not map to the B73 reference genome sequence revealed 1,321 high-confidence novel transcripts; of these, 564 loci were present in all 21 lines, including B73, and 757 loci were restricted to a subset of the lines. RT-PCR validation showed 87.5% concordance with the computational predictions of these expressed novel transcripts. Intriguingly, 145 of the novel de novo assembled loci were present in lines from only one of the two heterotic groups. This is consistent with the hypothesis that, in addition to sequence polymorphisms and differences in transcript abundance, transcript presence/absence variation exists among these lines and may therefore be a mechanism contributing to the genetic basis of heterosis.
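The three-way split of the transcribed gene set (expressed in all lines, in some lines, or in none) and the idea of loci restricted to one heterotic group can be illustrated with a minimal Python sketch. It assumes a per-gene table of normalized abundance values for each line (for example FPKM) and a hypothetical cutoff, `EXPRESSION_CUTOFF`, for calling a gene expressed; the abstract does not state the measure or threshold actually used. The helper `group_restricted` is likewise hypothetical: it shows one way to flag a locus detected in only one of two heterotic groups, and it simply ignores lines (such as exotic germplasm) that belong to neither group. The line names and values are toy data, not results from the study.

```python
from typing import Dict, List, Set

# HYPOTHETICAL cutoff: the abstract does not state which abundance measure
# or threshold was used to call a gene "expressed".
EXPRESSION_CUTOFF = 1.0


def expressed_lines(abundance_by_line: Dict[str, float],
                    cutoff: float = EXPRESSION_CUTOFF) -> Set[str]:
    """Return the inbred lines in which a gene's abundance exceeds the cutoff."""
    return {line for line, value in abundance_by_line.items() if value > cutoff}


def classify_genes(matrix: Dict[str, Dict[str, float]],
                   lines: List[str]) -> Dict[str, int]:
    """Count genes expressed in all lines, in a subset of lines, or in none."""
    counts = {"all": 0, "subset": 0, "none": 0}
    for abundance_by_line in matrix.values():
        n_expressed = len(expressed_lines(abundance_by_line) & set(lines))
        if n_expressed == len(lines):
            counts["all"] += 1
        elif n_expressed == 0:
            counts["none"] += 1
        else:
            counts["subset"] += 1
    return counts


def group_restricted(present_in: Set[str], group_a: Set[str],
                     group_b: Set[str]) -> bool:
    """HYPOTHETICAL helper: True if a locus is detected in at least one line
    of exactly one heterotic group. Lines in neither group are ignored."""
    in_a = bool(present_in & group_a)
    in_b = bool(present_in & group_b)
    return in_a != in_b


# Toy data only (made-up values, not results from the study).
lines = ["B73", "Mo17", "Oh43"]
matrix = {
    "gene_1": {"B73": 5.0, "Mo17": 3.2, "Oh43": 2.0},  # expressed in all lines
    "gene_2": {"B73": 0.0, "Mo17": 4.1, "Oh43": 0.5},  # expressed in a subset
    "gene_3": {"B73": 0.2, "Mo17": 0.0, "Oh43": 0.0},  # expressed in none
}

if __name__ == "__main__":
    print(classify_genes(matrix, lines))  # {'all': 1, 'subset': 1, 'none': 1}
    print(group_restricted({"B73"}, {"B73"}, {"Mo17"}))  # True
```

Because the three categories partition the gene set, their shares must sum to 100%, as the reported 48.7%, 27.9%, and 23.4% do; likewise, the 564 shared and 757 line-restricted novel loci sum to the 1,321 reported.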
