A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses

Background
To obtain predictions that are not biased by selection, the conditional mean of the breeding values must be computed given the data that were used for selection. When single nucleotide polymorphism (SNP) effects have a normal distribution, it can be argued that single-step best linear unbiased prediction (SS-BLUP) yields a conditional mean of the breeding values. Obtaining SS-BLUP, however, requires computing the inverse of the dense matrix G of genomic relationships, which will become infeasible as the number of genotyped animals increases. Also, computing G requires the frequencies of SNP alleles in the founders, which are not available in most situations. Furthermore, SS-BLUP is expected to perform poorly relative to variable selection models such as BayesB and BayesC as marker densities increase.

Methods
A strategy is presented for single-step Bayesian regression models (SSBR) that combines all available data from genotyped and non-genotyped animals, as in SS-BLUP, but accommodates a wider class of models. Our strategy uses imputed marker covariates for animals that are not genotyped, together with an appropriate residual genetic effect to accommodate deviations between true and imputed genotypes. Under normality, one formulation of SSBR yields results identical to SS-BLUP, but it does not require computing G or its inverse, and it provides richer inferences. At present, Bayesian regression analyses are used with a few thousand genotyped individuals. However, when SSBR is applied to all animals in a breeding program, there will be a 100- to 200-fold increase in the number of animals and an associated 100- to 200-fold increase in computing time. Parallel computing strategies can be used to reduce computing time. In one such strategy, a 58-fold speedup was achieved using 120 cores.

Discussion
In SSBR and SS-BLUP, phenotype, genotype and pedigree information are combined in a single step. Unlike SS-BLUP, SSBR is not limited to normally distributed marker effects; it can be used when marker effects have a t distribution, as in BayesA, or mixture distributions, as in BayesB or BayesCπ. Furthermore, it has the advantage that matrix inversion is not required. We have investigated parallel computing to speed up SSBR analyses so that they can be used in routine applications.
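To illustrate why G depends on founder allele frequencies, and why a marker-effect formulation can reproduce genomic BLUP under normality without forming or inverting G, the following minimal Python sketch compares the two on simulated data. It is a simplification and not SSBR itself: every animal is genotyped, there are no fixed effects or pedigree, G is assumed to follow VanRaden's first method, and the function names `vanraden_G`, `snp_blup` and `g_blup`, the simulated genotypes and the variance values are all hypothetical.

```python
import numpy as np

def vanraden_G(Z, p):
    """Genomic relationships from centered covariates Z = M - 2p (VanRaden 2008, method 1)."""
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def snp_blup(Z, y, var_e, var_beta):
    """Marker-effect model: solve the k x k system (Z'Z + lambda I) b = Z'y and return Zb."""
    k = Z.shape[1]
    b = np.linalg.solve(Z.T @ Z + (var_e / var_beta) * np.eye(k), Z.T @ y)
    return Z @ b

def g_blup(G, y, var_e, var_g):
    """Animal model: u = G (G + (var_e / var_g) I)^{-1} y, an n x n system built from G."""
    n = G.shape[0]
    return G @ np.linalg.solve(G + (var_e / var_g) * np.eye(n), y)

rng = np.random.default_rng(1)
n, k = 50, 200
p = rng.uniform(0.1, 0.9, size=k)                  # hypothetical founder allele frequencies
M = rng.binomial(2, p, size=(n, k)).astype(float)  # genotypes coded 0/1/2
Z = M - 2.0 * p                                    # centering is where the frequencies p enter
y = rng.normal(size=n)                             # toy phenotypes, no fixed effects

var_e, var_g = 1.0, 0.5                            # hypothetical variance components
var_beta = var_g / (2.0 * np.sum(p * (1.0 - p)))   # marker variance implied by var_g
u_snp = snp_blup(Z, y, var_e, var_beta)
u_g = g_blup(vanraden_G(Z, p), y, var_e, var_g)
print(np.max(np.abs(u_snp - u_g)))                 # differs only by rounding error
```

The two predictions agree because Z(Z'Z + λI)⁻¹Z' = ZZ'(ZZ' + λI)⁻¹ with λ = var_e / var_beta. The marker model solves a k × k system whose size does not grow with the number of animals, whereas the G-based model needs an n × n system.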
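The Methods describe imputing marker covariates for non-genotyped animals together with a residual genetic effect. One way to sketch this, assuming each centered marker covariate has covariance proportional to the pedigree relationship matrix A, is the best linear predictor Ẑ_n = A_ng A_gg⁻¹ Z_g, whose imputation error has covariance proportional to A_nn − A_ng A_gg⁻¹ A_gn. The pedigree, genotypes, allele frequencies of 0.5 and the function names below are hypothetical, and the dense solve suits only a toy example; a large-scale implementation would presumably exploit the sparsity of A⁻¹ instead.

```python
import numpy as np

def numerator_relationship(sire, dam):
    """Additive relationship matrix A by the tabular method.

    Animals are indexed 0..n-1 with parents listed before offspring; -1 marks an unknown parent.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents.
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: average relationship of animal j with the known parents of i.
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A

def impute_covariates(A, geno_idx, nongeno_idx, Z_g):
    """Predict centered covariates of non-genotyped animals from genotyped relatives.

    Returns A_ng A_gg^{-1} Z_g and A_nn - A_ng A_gg^{-1} A_gn; under the stated assumption the
    latter is proportional to the covariance of the residual (imputation-error) genetic effect.
    """
    A_gg = A[np.ix_(geno_idx, geno_idx)]
    A_ng = A[np.ix_(nongeno_idx, geno_idx)]
    A_nn = A[np.ix_(nongeno_idx, nongeno_idx)]
    W = np.linalg.solve(A_gg, A_ng.T).T   # A_ng A_gg^{-1}, using the symmetry of A_gg
    return W @ Z_g, A_nn - W @ A_ng.T

# Hypothetical pedigree: animals 0-2 are founders, 3 = 0 x 1, 4 = 0 x 2, 5 = 3 x 4.
sire = [-1, -1, -1, 0, 0, 3]
dam = [-1, -1, -1, 1, 2, 4]
A = numerator_relationship(sire, dam)

geno, nongeno = [3, 4, 5], [0, 1, 2]
M_g = np.array([[0, 1, 2, 1],
                [1, 1, 0, 2],
                [0, 2, 1, 1]], dtype=float)  # genotypes of animals 3, 4 and 5
Z_g = M_g - 1.0                              # centered assuming allele frequency 0.5 (hypothetical)
Z_n_hat, R = impute_covariates(A, geno, nongeno, Z_g)
print(np.round(Z_n_hat, 3))
```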
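The speedup figures quoted in the Methods can be put in perspective with a little arithmetic. The sketch below computes the parallel efficiency of a 58-fold speedup on 120 cores and, assuming Amdahl's law (which the abstract does not invoke), the serial fraction of work that would explain it. It also shows what the quoted 100- to 200-fold growth in computing time would mean in wall-clock terms if the 58-fold speedup carried over unchanged to that scale, which is itself an assumption. The function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal (linear) speedup that was achieved."""
    return speedup / cores

def amdahl_serial_fraction(speedup, cores):
    """Serial fraction f implied by Amdahl's law, S = 1 / (f + (1 - f) / p), solved for f."""
    return (cores / speedup - 1.0) / (cores - 1.0)

S, P = 58.0, 120                    # speedup and core count reported in the abstract
eff = parallel_efficiency(S, P)     # 58/120, about 0.48
f = amdahl_serial_fraction(S, P)    # about 0.009 under the Amdahl assumption

# Assuming run time grows linearly with the number of animals and the 58-fold speedup
# is retained, wall-clock time relative to a current serial run would be:
for growth in (100, 200):
    print(growth, round(growth / S, 2))  # prints "100 1.72" then "200 3.45"
```

Under these assumptions, an analysis of all animals on 120 cores would take roughly 1.7 to 3.4 times as long as a current serial analysis of genotyped animals only.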
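The Discussion distinguishes normal, t and mixture priors for marker effects. A minimal sketch of drawing marker effects from each prior follows; it is not taken from the paper. The hyperparameter values, the choice of scale so that the expected locus-specific variance equals `var_beta`, and the fixed π are illustrative assumptions (BayesCπ would additionally treat π as unknown).

```python
import numpy as np

def sample_marker_effects(k, prior, rng, var_beta=0.01, nu=4.0, pi=0.95):
    """Draw k marker effects from an illustrative prior.

    var_beta, nu and pi are hypothetical hyperparameter values.
    """
    # Scale S^2 of the scaled inverse chi-square, chosen (our choice) so that E[var_j] = var_beta.
    s2 = var_beta * (nu - 2.0) / nu

    def t_effects(m):
        # var_j = nu * S^2 / chi^2_nu, then beta_j | var_j ~ N(0, var_j): marginally a scaled t_nu.
        var_j = nu * s2 / rng.chisquare(nu, size=m)
        return rng.normal(0.0, np.sqrt(var_j))

    if prior == "normal":    # common normal variance for all markers (the SS-BLUP setting)
        return rng.normal(0.0, np.sqrt(var_beta), size=k)
    if prior == "bayesA":    # every marker has an effect with a t distribution
        return t_effects(k)

    nonzero = rng.random(k) >= pi   # mixture: a marker has zero effect with probability pi
    beta = np.zeros(k)
    m = int(nonzero.sum())
    if prior == "bayesB":    # nonzero effects are t distributed, as in BayesA
        beta[nonzero] = t_effects(m)
    elif prior == "bayesC":  # nonzero effects share a common normal variance (pi fixed here)
        beta[nonzero] = rng.normal(0.0, np.sqrt(var_beta), size=m)
    else:
        raise ValueError(f"unknown prior: {prior}")
    return beta

rng = np.random.default_rng(0)
for prior in ("normal", "bayesA", "bayesB", "bayesC"):
    beta = sample_marker_effects(10000, prior, rng)
    print(prior, "fraction of zero effects:", round(float(np.mean(beta == 0.0)), 3))
```

With π = 0.95, roughly 95% of the sampled effects are exactly zero under the two mixture priors, whereas every effect is nonzero under the normal and t priors; the t prior also has heavier tails than the normal.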
