HISS: Snakemake-based workflows for performing SMRT-RenSeq assembly, AgRenSeq and dRenSeq for the discovery of novel plant disease resistance genes

Background: In the ten years since the RenSeq protocol was first published, the method has proved to be a powerful tool for studying disease resistance in plants and for providing target genes for breeding programmes. It has continued to be developed as new technologies have become available and as increased computing power has made new bioinformatic approaches possible. Most recently, this has included the development of a k-mer based association genetics approach, the use of PacBio HiFi data, and graphical genotyping with diagnostic RenSeq. However, no unified workflow is yet available, and researchers must instead configure approaches from various sources themselves. This makes reproducibility and version control a challenge and limits these analyses to those with bioinformatics expertise.

Results: Here we present HISS, consisting of three workflows which take a user from raw RenSeq reads to the identification of candidate disease resistance genes. The workflows first assemble enriched HiFi reads from an accession with the resistance phenotype of interest. A panel of accessions both possessing and lacking the resistance is then used in an association genetics approach (AgRenSeq) to identify contigs positively associated with the resistance phenotype. Candidate genes are then identified on these contigs and assessed for their presence or absence in the panel with a graphical genotyping approach that uses dRenSeq. The workflows are implemented in Snakemake, a Python-based workflow manager. Software dependencies are either shipped with the release or handled with conda. All code is freely available and is distributed under the GNU GPL-3.0 license.

Conclusions: HISS provides a user-friendly, portable, and easily customised approach for identifying novel disease resistance genes in plants. It is easily installed, with all dependencies either handled internally or shipped with the release, and represents a significant improvement in the ease of use of these bioinformatics analyses.
