A fully automated pipeline for quantitative genotype calling from next generation sequencing data in autopolyploids

Background: Genotyping-by-sequencing (GBS) has been used broadly in genetic studies for several species, especially those with agricultural importance. However, its use is still limited in autopolyploid species because genotype calling software generally fails to properly distinguish heterozygous classes based on allele dosage.

Results: VCF2SM is a Python script that integrates sequencing depth information of polymorphisms in variant call format (VCF) files and SuperMASSA software for quantitative genotype calling. VCFs can be obtained from any variant discovery software that outputs exact allele sequencing depth, such as a modified version of the Tassel-GBS pipeline provided here. VCF2SM was successfully applied in analyzing GBS data from diverse panels (alfalfa and potato) and full-sib mapping populations (alfalfa and switchgrass) of polyploid species.

Conclusions: We demonstrate that our approach can help plant geneticists working with autopolyploid species to advance their studies by distinguishing allele dosage from GBS data.
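To make the idea of "exact allele sequencing depth" concrete, the following is a minimal Python sketch of how per-sample reference and alternative read counts could be pulled from one biallelic VCF data line. It is not the VCF2SM source code. The function name `allele_depths`, the `chrom_pos` marker identifier, and the example line are hypothetical, and the sketch assumes the variant caller writes depths to the `AD` FORMAT key as `ref,alt`.

```python
def allele_depths(vcf_line):
    """Return (marker_id, depths) for one biallelic VCF data line.

    depths holds one (ref_depth, alt_depth) tuple per sample, or None when
    the sample has no usable AD value. Returns None for lines without an AD
    key or with more than one alternative allele.
    Hypothetical helper for illustration; not part of VCF2SM.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, alt = fields[0], fields[1], fields[4]
    format_keys = fields[8].split(":")
    if "AD" not in format_keys or "," in alt:
        return None  # no allele depths, or a multiallelic site
    ad_index = format_keys.index("AD")

    depths = []
    for sample in fields[9:]:
        values = sample.split(":")
        # Missing data is written as "." (or ".,.") in VCF sample columns.
        if ad_index >= len(values) or "." in values[ad_index]:
            depths.append(None)
            continue
        ref_depth, alt_depth = (int(x) for x in values[ad_index].split(","))
        depths.append((ref_depth, alt_depth))
    return f"{chrom}_{pos}", depths


# Made-up example line with three samples (the last one is missing).
example = "\t".join(
    ["chr1", "1042", ".", "A", "G", "50", "PASS", ".", "GT:AD:DP",
     "0/1:30,10:40", "1/1:0,25:25", "./.:.:."]
)
marker, depths = allele_depths(example)
print(marker, depths)
```

In an autotetraploid, the first sample's 30 reference reads out of 40 would be roughly what one expects from three copies of the reference allele, which is the kind of dosage information a diploid-style `0/1` call discards.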
