gmRAD: an integrated SNP calling pipeline for genetic mapping with RADseq across a hybrid population

Restriction site-associated DNA sequencing (RADseq) is a powerful technology that has been extensively applied in population genetics, phylogenetics and genetic mapping. Although many software packages are available for ecological and evolutionary studies, few effective tools exist for extracting genotype data from RADseq for genetic mapping, a prerequisite for quantitative trait locus mapping, comparative genomics and genome scaffold assembly. Here, we present an integrated pipeline called gmRAD for generating single nucleotide polymorphism (SNP) genotypes de novo from RADseq data across a genetic mapping population derived by crossing two parents. The software implements its analytical strategy in five steps: clustering the first (forward) reads of each parent, building two parental references, generating parental SNP catalogs, calling SNP genotypes across all individuals, and filtering the genotype data for genetic linkage mapping. All the steps can be completed with a simple command line, but each step can also be run on its own when its prerequisite files are available. To validate its application, we analyzed real RADseq data from an F1 hybrid population derived by crossing Populus deltoides and Populus simonii. The software gmRAD is freely available at https://github.com/tongchf/gmRAD.
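
The five-step structure, and the option to run steps selectively when prerequisite files already exist, can be illustrated with a minimal Python sketch. This is not gmRAD's code or its command-line interface: the function name `plan` and every file name below are hypothetical, chosen only for illustration. The sketch also assumes one possible reading of the optional steps, namely that a step can be left out when its output file is already present.

```python
from pathlib import Path
import tempfile

# The five steps named in the abstract, each paired with a HYPOTHETICAL
# output file name (assumed for illustration; not gmRAD's real file names).
STEPS = [
    ("cluster first reads of each parent", "clusters.txt"),
    ("build two parental references", "references.fa"),
    ("generate parental SNP catalogs", "catalogs.txt"),
    ("call SNP genotypes across all individuals", "genotypes.txt"),
    ("filter genotype data for linkage mapping", "filtered.txt"),
]


def plan(workdir):
    """Return, in order, the steps whose output file is missing from workdir."""
    workdir = Path(workdir)
    return [name for name, out in STEPS if not (workdir / out).exists()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Pretend the first two steps were completed in an earlier run.
        (Path(d) / "clusters.txt").write_text("")
        (Path(d) / "references.fa").write_text("")
        for step in plan(d):
            print("to run:", step)
```

With the two parental files already in place, only the last three steps remain in the plan. A real pipeline would also need to check that existing files are complete and current, which this sketch does not attempt.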
