Marker Density and Read Depth for Genotyping Populations Using Genotyping-by-Sequencing

Genotyping-by-sequencing (GBS) approaches provide low-cost, high-density genotype information. However, GBS has unique technical considerations, including a substantial amount of missing data and a nonuniform distribution of sequence reads. The goal of this study was to characterize the technical variation inherent in this method and to develop methods for choosing a read depth that yields the desired marker coverage. To empirically assess the distribution of fragments produced by GBS, ∼8.69 Gb of GBS data were generated from the Zea mays reference inbred B73, using ApeKI for genome reduction and single-end reads 75 to 81 bp in length. We observed wide variation in sequence coverage across sites. Approximately 76% of potentially observable cut-site-adjacent sequence fragments had no sequencing reads, whereas a portion had substantially greater read depth than expected, up to 2369 times the expected mean. The methods described in this article allow sequencing depth to be determined from an empirically defined read-depth distribution so that the desired marker density for genetic mapping studies can be achieved.
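To make the idea of an "expected" read depth concrete, the sketch below computes a uniform-sampling baseline: if reads fell on fragments at random with equal probability, per-fragment depth would be approximately Poisson distributed. This is an idealized illustration only, not the method developed in the article, which relies on the empirically observed distribution precisely because real GBS data depart strongly from this baseline. The function names and the Poisson assumption are introduced here for illustration.

```python
import math


def poisson_fraction_at_least(mean_depth: float, min_reads: int) -> float:
    """Fraction of fragments expected to have >= min_reads reads,
    ASSUMING depth per fragment is Poisson(mean_depth) (uniform sampling)."""
    if mean_depth < 0 or min_reads < 0:
        raise ValueError("mean_depth and min_reads must be non-negative")
    # Sum P(X = k) for k below the threshold, then take the complement.
    below = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - below


def mean_depth_for_target(target_fraction: float, min_reads: int,
                          tol: float = 1e-6) -> float:
    """Smallest mean depth at which target_fraction of fragments reach
    min_reads reads under the same idealized Poisson assumption."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    if min_reads < 1:
        raise ValueError("min_reads must be at least 1")
    lo, hi = 0.0, 1.0
    # Grow the upper bound until it satisfies the target.
    while poisson_fraction_at_least(hi, min_reads) < target_fraction:
        hi *= 2.0
    # Bisect: the covered fraction increases monotonically with mean depth.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if poisson_fraction_at_least(mid, min_reads) < target_fraction:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical target: 90% of fragments with at least 5 reads.
    print(mean_depth_for_target(0.90, 5))
```

Under this baseline, a large share of fragments with no reads would imply a low mean depth, and fragments with thousands of times the mean depth would be essentially impossible. Seeing both at once, as reported above, is what signals a strongly nonuniform read distribution and motivates planning depth from empirical data instead.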
