Using the Pareto principle in genome-wide breeding value estimation

Methods for genome-wide estimation of breeding values (GWEBV) can be classified by the prior distribution they assume for marker effects. Genome-wide BLUP methods assume a normal prior with a constant variance for all markers and are computationally fast. Bayesian methods apply more flexible priors for SNP effects, which allow a few very large effects while most effects are small or even zero, but these methods are often computationally demanding because they rely on Markov chain Monte Carlo sampling. In this study, we adopted the Pareto principle to weight the available marker loci, i.e., we assume that x% of the loci explain (100 - x)% of the total genetic variance. Under this principle, the prior variances of the SNP with 'big' and 'small' effects follow directly: the relatively few SNP with big effects explain a large proportion of the genetic variance, whereas the majority of SNP have small effects and explain a minor proportion. We name this method MixP; its prior is a mixture of two normal distributions, one with a big variance and one with a small variance. Simulation results, using a real Norwegian Red cattle pedigree, show that MixP is at least as accurate as the other methods in all studied cases. Assuming the overall genetic variance is known, the method also reduces the number of hyper-parameters of the prior from two (the proportion and the variance of SNP with big effects) to one (the proportion of SNP with big effects). The mixture-of-normals prior made it possible to solve the equations iteratively, which reduced the computational load by two orders of magnitude. In an era in which marker densities reach millions and whole-genome sequence data become available, MixP provides a computationally feasible Bayesian method of analysis.
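The step from the Pareto principle to the two prior variances can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes standardized marker genotypes, so that each locus contributes its effect variance to the total genetic variance, and it assumes the total is split evenly among the loci within each class. The function name `pareto_prior_variances` and its parameters are hypothetical.

```python
def pareto_prior_variances(total_genetic_variance, n_markers, big_fraction):
    """Per-SNP prior variances under a Pareto-style split (illustrative only).

    Assumption (not from the source): a fraction `big_fraction` of the loci
    explains (1 - big_fraction) of the genetic variance, the remaining loci
    explain `big_fraction` of it, and the variance is shared equally within
    each class.
    """
    if not 0.0 < big_fraction < 1.0:
        raise ValueError("big_fraction must lie strictly between 0 and 1")
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")

    n_big = big_fraction * n_markers            # expected number of 'big' SNP
    n_small = (1.0 - big_fraction) * n_markers  # expected number of 'small' SNP

    # Variance explained by each class, divided by the class size.
    var_big = (1.0 - big_fraction) * total_genetic_variance / n_big
    var_small = big_fraction * total_genetic_variance / n_small
    return var_big, var_small


# Example: the classic 20/80 split over 1000 markers, genetic variance 1.0.
var_big, var_small = pareto_prior_variances(1.0, 1000, 0.2)
print(var_big, var_small)
```

Under these assumptions, the proportion of SNP with big effects is the only free hyper-parameter once the total genetic variance is fixed, which matches the reduction from two hyper-parameters to one described in the abstract.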
