Parentage assignment with genotyping‐by‐sequencing data

Abstract

In this paper, we evaluate the use of genotyping‐by‐sequencing (GBS) data, in lieu of traditional array data, to perform parentage assignment. The use of GBS data raises two issues. First, for low‐coverage (e.g., <2×) GBS data, it may not be possible to call the genotype at many loci, which is a critical first step for detecting opposing homozygous markers. Second, sequencing coverage may vary across individuals, making it challenging to compare likelihood scores directly between putative parents. To address these issues, we extend the probabilistic framework of Huisman (Molecular Ecology Resources, 2017, 17, 1009) and evaluate putative parents by comparing their (potentially noisy) genotypes to a series of proposal distributions. These distributions describe the expected genotype probabilities for the relatives of an individual. We assign a putative parent as a parent if it is classified as a parent (as opposed to, e.g., an unrelated individual) and if its assignment score passes a threshold. We evaluated this method on simulated data and found that (a) high‐coverage (>2×) GBS data performs similarly to array data and requires only a small number of markers to assign parents correctly, and (b) low‐coverage GBS data (as low as 0.1×) can also be used, provided that it is obtained across a large number of markers. When analysing the low‐coverage GBS data, we also found a high false positive rate when the true parent was not among the candidate parents, but this rate can be greatly reduced by hand‐tuning the assignment threshold. We provide this parentage assignment method as a standalone program called AlphaAssign.
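
The general idea of scoring a candidate parent against several proposal distributions without first calling genotypes can be illustrated with a minimal Python sketch. This is not AlphaAssign's implementation, and the same holds for Huisman's. The sketch assumes several things. Markers are biallelic SNPs with known alternate-allele frequencies. Genotypes follow Hardy-Weinberg equilibrium. Reads are independent, with a fixed per-read error rate. There is a hypothetical set of three proposal relationships (parent, half-sib or grandparent, unrelated), each described only by k1, the probability that the pair shares exactly one allele identical by descent. Read counts are converted into genotype likelihoods rather than hard calls, so a locus with little or no coverage simply carries little information instead of being discarded. All function names, the relationship set and the default threshold are illustrative.

```python
import math
from itertools import product

# Genotypes are coded as the number of alternate alleles at a biallelic SNP.
GENOTYPES = (0, 1, 2)

# Hypothetical proposal relationships. Each is described only by k1, the
# probability that the pair shares exactly one allele identical by descent
# (relationships that can share two alleles, such as full sibs, are omitted).
RELATIONSHIPS = {"parent": 1.0, "half-sib or grandparent": 0.5, "unrelated": 0.0}


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """P(reads | genotype) for genotypes 0, 1, 2, up to a shared constant.

    Assumes independent reads and a symmetric per-read error rate. With no
    reads at all, every genotype is equally likely (all values are 1.0).
    """
    liks = []
    for g in GENOTYPES:
        # Probability that a single read shows the alternate allele.
        p_alt = (g / 2) * (1 - error_rate) + (1 - g / 2) * error_rate
        liks.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return liks


def hwe(p):
    """Hardy-Weinberg genotype frequencies for alternate allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def joint_ibd1(p):
    """Joint P(g_child, g_cand) when the pair shares exactly one allele IBD."""
    freqs = hwe(p)
    joint = {}
    for g_child, g_cand in product(GENOTYPES, GENOTYPES):
        prob = 0.0
        for shared in (0, 1):  # allele the candidate passes to the child
            p_shared = g_cand / 2 if shared else 1 - g_cand / 2
            other = g_child - shared  # allele drawn at random from the population
            if other == 1:
                prob += p_shared * p
            elif other == 0:
                prob += p_shared * (1 - p)
        joint[g_child, g_cand] = freqs[g_cand] * prob
    return joint


def log_likelihood(child_reads, cand_reads, freqs, k1, error_rate=0.01):
    """Sum over loci of log P(reads of both individuals | relationship k1).

    Genotypes are never called: each locus sums over all nine genotype pairs,
    weighted by the read-based likelihoods of the two individuals.
    """
    total = 0.0
    for (c_ref, c_alt), (cand_ref, cand_alt), p in zip(child_reads, cand_reads, freqs):
        lik_child = genotype_likelihoods(c_ref, c_alt, error_rate)
        lik_cand = genotype_likelihoods(cand_ref, cand_alt, error_rate)
        unrelated = hwe(p)
        ibd1 = joint_ibd1(p)
        locus = 0.0
        for g_child, g_cand in product(GENOTYPES, GENOTYPES):
            prior = (k1 * ibd1[g_child, g_cand]
                     + (1 - k1) * unrelated[g_child] * unrelated[g_cand])
            locus += lik_child[g_child] * lik_cand[g_cand] * prior
        total += math.log(locus)
    return total


def assign(child_reads, cand_reads, freqs, threshold=5.0, error_rate=0.01):
    """Classify a candidate and decide whether to assign it as a parent.

    Returns (most likely relationship, parent-vs-unrelated log-likelihood
    ratio, assigned?). The default threshold is an arbitrary placeholder.
    """
    logliks = {name: log_likelihood(child_reads, cand_reads, freqs, k1, error_rate)
               for name, k1 in RELATIONSHIPS.items()}
    best = max(logliks, key=logliks.get)
    score = logliks["parent"] - logliks["unrelated"]
    return best, score, best == "parent" and score > threshold
```

Consider ten loci (alternate-allele frequency 0.2) at which both the child and a candidate show ten alternate-allele reads each. In this sketch, that candidate is classified as "parent" with a score above the default threshold. Ten loci of opposing homozygous reads instead give a strongly negative score. Because of the error term, such loci penalise the parent hypothesis heavily but do not exclude it outright. Loci with no reads leave the score unchanged. Because the score is summed over loci, individuals sequenced at higher coverage or across more markers tend to reach larger absolute scores in this sketch, which is one reason a single fixed threshold may need tuning.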

[1]  C. Maltecca, et al.  Assessment of alternative genotyping strategies to maximize imputation accuracy at minimal cost, 2012, Genetics Selection Evolution.

[2]  J. Grefenstette, et al.  High-resolution haplotype block structure in the cattle genome, 2009, BMC Genetics.

[3]  A. Whalen, et al.  Hybrid peeling for fast and accurate calling, phasing, and imputation with sequence data of any coverage in pedigrees, 2017, Genetics Selection Evolution.

[4]  G. Abecasis, et al.  MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes, 2010, Genetic Epidemiology.

[5]  J. Woolliams, et al.  Persistence of accuracy of genome-wide breeding values over generations when including a polygenic effect, 2009, Genetics Selection Evolution.

[6]  I. Misztal, et al.  A relationship matrix including full pedigree and genomic information, 2009, Journal of Dairy Science.

[7]  R. Brauning, et al.  Exclusion and Genomic Relatedness Methods for Assignment of Parentage Using Genotyping-by-Sequencing Data, 2019, G3: Genes, Genomes, Genetics.

[8]  Jeffrey R. O’Connell, et al.  Fast imputation using medium or low-coverage sequence data, 2014, BMC Genetics.

[9]  Development of a SNP panel dedicated to parentage assignment in French sheep populations, 2017, BMC Genetics.

[10]  T. Meuwissen, et al.  Using genomic relationship likelihood for parentage assignment, 2018, Genetics Selection Evolution.

[11]  T. C. Marshall, et al.  Statistical confidence for likelihood‐based paternity inference in natural populations, 1998, Molecular Ecology.

[12]  Sunday O. Peters, et al.  Genotyping-by-Sequencing (GBS): A Novel, Efficient and Cost-Effective Genotyping Method for Cattle Using Next-Generation Sequencing, 2013, PLoS ONE.

[13]  J. Woolliams, et al.  Genetic contributions and their optimization, 2015, Journal of Animal Breeding and Genetics.

[14]  B. Browning, et al.  Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering, 2007, American Journal of Human Genetics.

[15]  T. Morin, et al.  Genome-wide association and genomic prediction of resistance to viral nervous necrosis in European sea bass (Dicentrarchus labrax) using RAD sequencing, 2018.

[16]  Peter F. Stadler, et al.  FRANz: reconstruction of wild multi-generation pedigrees, 2009, Bioinformatics.

[17]  C. Gondro, et al.  How many markers are enough? Factors influencing parentage testing in different livestock populations, 2016, Journal of Animal Breeding and Genetics.

[18]  L. Bargelloni, et al.  Applications of genotyping by sequencing in aquaculture breeding and genetics, 2017, Reviews in Aquaculture.

[19]  Elizabeth A. Thompson, et al.  The relationship between single parent and parent pair genetic likelihoods in genealogy reconstruction, 1986.

[20]  Gary K. Chen, et al.  Fast and flexible simulation of DNA sequence data, 2008, Genome Research.

[21]  V. Loeschcke, et al.  Effectiveness of microsatellite and SNP markers for parentage and identity analysis in species with low genetic diversity: the case of European bison, 2009, Heredity.

[22]  P. Etter, et al.  Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers, 2008, PLoS ONE.

[23]  R. Spelman, et al.  The number of single nucleotide polymorphisms and on-farm data required for whole-herd parentage testing in dairy cattle herds, 2009, Journal of Dairy Science.

[24]  S. Kalinowski, et al.  Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment, 2007, Molecular Ecology.

[25]  Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality genotypes and the accuracy of imputation, 2017, BMC Genetics.

[26]  Robert J. Elshire, et al.  A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species, 2011, PLoS ONE.

[27]  H. Schat, et al.  A general model for the genetic control of copper tolerance in Silene vulgaris: evidence from crosses between plants from different tolerant populations, 1993, Heredity.

[28]  R. Elston, et al.  A general model for the genetic analysis of pedigree data, 1971, Human Heredity.

[29]  Jean-Luc Jannink, et al.  Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data, 2016, PLoS ONE.

[30]  J. Huisman, et al.  Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond, 2017, Molecular Ecology Resources.

[31]  M. Goddard, et al.  The Use of Family Relationships and Linkage Disequilibrium to Impute Phase and Missing Genotypes in Up to Whole-Genome Sequence Density Genotypic Data, 2010, Genetics.