RADcap: sequence capture of dual‐digest RADseq libraries with identifiable duplicates and reduced missing data

Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction‐site‐associated DNA sequencing (RADseq) and sequence capture, are constrained by costs arising from sample preparation and from inefficient use of sequencing data. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost and specific read start positions) with those of sequence capture (repeatable sequencing of specific loci) to substantially increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual‐digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and then uses custom sequence capture baits to consistently enrich those loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested RADcap by genotyping sets of 96–384 Wisteria plants. Our results demonstrate that RADcap: (i) methodologically reduces PCR duplicate reads (to <5%) and allows their computational removal, (ii) achieves 80–90% reads on target in 11 of 12 enrichments, (iii) returns consistent coverage (≥4×) across >90% of individuals at up to 99.8% of the targeted loci, (iv) produces consistently high-occupancy genotype matrices across hundreds of individuals and (v) costs substantially less than current approaches.
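Because RADseq reads begin at restriction sites, every read from a given locus shares the same start position. Flagging reads with identical alignment start positions as duplicates, as is usual for shotgun data, would therefore also collapse genuinely independent template molecules. One way to break that tie is to give each template a random tag before PCR. The sketch below is an illustration of that idea, not the authors' pipeline: it treats reads with the same individual, locus start, and random tag as PCR copies. The `Read` record layout, the 8-nt tag length, and the example sequences are hypothetical.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record for duplicate detection."""
    sample: str     # individual the read was demultiplexed to
    locus_key: str  # locus start: identical for every read from a locus, since reads begin at the restriction site
    tag: str        # assumed random tag (e.g. 8 nt) attached to the template before PCR


def remove_pcr_duplicates(reads, use_tag=True):
    """Keep the first read per duplicate key; return (kept reads, duplicate fraction).

    With use_tag=False only (sample, locus_key) is compared, mimicking
    position-based deduplication, which cannot separate independent
    templates that share a restriction-site start.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.sample, read.locus_key, read.tag) if use_tag else (read.sample, read.locus_key)
        if key in seen:
            continue  # same start (and, if used, same tag): treated as a PCR copy
        seen.add(key)
        kept.append(read)
    dup_fraction = 1 - len(kept) / len(reads) if reads else 0.0
    return kept, dup_fraction


# Toy reads (illustrative only, not data from the study)
reads = [
    Read("ind01", "AATTCGGA", "ACGTTGCA"),
    Read("ind01", "AATTCGGA", "ACGTTGCA"),  # PCR copy of the read above
    Read("ind01", "AATTCGGA", "GGCATTAC"),  # independent template, same locus
    Read("ind02", "AATTCGGA", "ACGTTGCA"),  # same locus and tag, different individual
]

print(remove_pcr_duplicates(reads)[1])                 # 0.25
print(remove_pcr_duplicates(reads, use_tag=False)[1])  # 0.5
```

The position-only run also discards the independent template from `ind01`, which is exactly the over-collapsing that a per-template random tag is meant to prevent.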
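Result (iii) combines two thresholds: a minimum read depth per genotype (≥4×) and a minimum fraction of individuals per locus (>90%). The following is a minimal sketch of that filter, together with the matrix occupancy mentioned in (iv). It assumes per-individual read depths are already tabulated for each targeted locus and that a genotype counts as missing when its depth falls below the threshold. The function names and the toy depths are hypothetical and are not data from the study.

```python
from typing import Dict, List


def loci_passing(depth: Dict[str, List[int]], min_depth: int = 4, min_frac: float = 0.9) -> List[str]:
    """Loci with depth >= min_depth in MORE than min_frac of individuals.

    depth maps a locus ID to per-individual read depths (hypothetical layout).
    """
    passing = []
    for locus, depths in depth.items():
        covered = sum(1 for d in depths if d >= min_depth)
        if covered / len(depths) > min_frac:
            passing.append(locus)
    return passing


def occupancy(depth: Dict[str, List[int]], min_depth: int = 4) -> float:
    """Fraction of locus-by-individual cells treated as genotyped (assumption: depth >= min_depth)."""
    cells = [d for depths in depth.values() for d in depths]
    return sum(d >= min_depth for d in cells) / len(cells)


# Toy depths for 10 individuals (made up for illustration)
toy = {
    "locus1": [12, 8, 5, 9, 15, 7, 6, 11, 4, 10],  # 10/10 individuals at >= 4x: passes
    "locus2": [12, 8, 5, 9, 15, 7, 6, 11, 2, 10],  # 9/10 = 90%, not more than 90%: fails
    "locus3": [0, 1, 3, 2, 0, 5, 1, 0, 2, 1],      # 1/10: fails
}

print(loci_passing(toy))         # ['locus1']
print(round(occupancy(toy), 3))  # 0.667
```

Because the criterion is "more than 90%", `locus2` fails even though only one individual falls short; relaxing `min_frac` to 0.8 would admit it.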
