CRISPR/Cas9-based repeat depletion for the high-throughput genotyping of complex plant genomes

High-throughput genotyping enables the large-scale analysis of genetic diversity in population genomics and in genome-wide association studies, which combine the genotypic and phenotypic characterization of large collections of accessions. Genotyping by sequencing is progressively replacing traditional genotyping methods because of its lower ascertainment bias. However, genome-wide genotyping by sequencing becomes expensive in species with large genomes and a high proportion of repetitive DNA. Here we describe the use of CRISPR/Cas9 technology to deplete repetitive elements in the 3.76-Gb genome of lentil (Lens culinaris), 84% of which consists of repeats, thus concentrating the sequencing data on coding and regulatory regions (unique regions). We designed a custom set of 566,722 gRNAs targeting 2.9 Gb of repeats while excluding repetitive regions that overlap annotated genes or putative regulatory elements identified from ATAC-Seq data. The depletion method removed 40% of reads mapping to repeats, increasing those mapping to unique regions by 2.6-fold. This repeat-to-unique shift in the sequencing data increased the number of genotyped bases by up to 17-fold compared to non-depleted libraries. We also identified up to 18-fold more genetic variants in the unique regions and improved genotyping accuracy by rescuing thousands of heterozygous variants that would otherwise be missed because of low coverage. The method performed similarly regardless of the multiplexing level, the type of library or the genotype, including different cultivars and a closely related species (L. orientalis). Our results demonstrate that CRISPR/Cas9-driven repeat depletion focuses sequencing data on meaningful genomic regions, thus improving high-density and genome-wide genotyping in large and repetitive genomes.
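
The genome figures quoted above imply a simple size budget, which the short Python sketch below works through as back-of-the-envelope arithmetic. It is not part of the published analysis: the function name `genome_budget` and its dictionary keys are hypothetical, Gb and Gbp are treated as the same unit, and the "kb per gRNA" value is only an average of targeted sequence per guide, not a statement about cut-site spacing.

```python
# Illustrative arithmetic only; all input values are taken from the abstract.
GENOME_GB = 3.76        # lentil genome size
REPEAT_FRACTION = 0.84  # share of the genome annotated as repeats
TARGETED_GB = 2.9       # repeat sequence covered by the gRNA set
N_GRNAS = 566_722       # number of gRNAs in the custom set


def genome_budget(genome_gb, repeat_fraction, targeted_gb, n_grnas):
    """Derive repeat/unique sizes and targeting shares (hypothetical helper)."""
    repeat_gb = genome_gb * repeat_fraction
    unique_gb = genome_gb - repeat_gb
    return {
        "repeat_gb": repeat_gb,
        "unique_gb": unique_gb,
        "targeted_share_of_repeats": targeted_gb / repeat_gb,
        "targeted_share_of_genome": targeted_gb / genome_gb,
        # 1 Gb = 1e6 kb; average targeted sequence per guide, not cut spacing
        "kb_of_target_per_grna": targeted_gb * 1e6 / n_grnas,
    }


if __name__ == "__main__":
    for key, value in genome_budget(
        GENOME_GB, REPEAT_FRACTION, TARGETED_GB, N_GRNAS
    ).items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions the repeats amount to roughly 3.16 Gb and the unique regions to roughly 0.60 Gb, so the gRNA set addresses about 92% of the repeat fraction, or about 77% of the whole genome.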
