Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers

Background
Dense genome-wide markers genotyped with high-throughput technology offer considerable potential for human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, under the assumption that any association arises from linkage disequilibrium (LD) between the SNP and quantitative trait loci (QTL) affecting the trait. Many SNP, however, lie in genomic regions that contribute no trait variation. Whole-genome Bayesian models are an effective way to incorporate this and other important prior information into the analysis. However, a full Bayesian analysis is often infeasible because of the computing time it requires.

Results
This article proposes an expectation-maximization (EM) algorithm called emBayesB, which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. For each SNP, the algorithm calculates the posterior probability of being in LD with at least one QTL, and it also estimates the hyperparameters of the mixture prior. A simulated genomic selection dataset from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to that of a full Bayesian analysis, but the EM algorithm is considerably faster. The EM algorithm accurately located QTL that explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is also described.

Conclusions
emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to that of full Bayesian methods, but it takes only a fraction of the time.
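The abstract describes emBayesB only at a high level: a mixture prior in which a proportion of SNP have zero effect, a posterior probability per SNP of carrying a nonzero (QTL-linked) effect, and estimated mixture hyperparameters. The Python sketch below illustrates that general idea and is not the authors' algorithm. Its assumptions are stated loudly: the nonzero effects follow a Gaussian "slab" (the abstract does not say which slab distribution emBayesB uses), the E-step is approximated by updating one SNP at a time against the current residual (a Gauss-Seidel style scheme), and the residual variance is re-estimated from the residual sum of squares alone. The function names `em_spike_slab` and `simulate`, and the toy data, are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def em_spike_slab(X, y, pi=0.1, n_iter=200, tol=1e-6):
    """Hypothetical EM-style fit of y = mu + X beta + e with a spike/slab prior.

    Assumed prior: beta_j = 0 with prob. 1 - pi, beta_j ~ N(0, sigma_b2) with prob. pi.
    X: n rows of m genotype codes (0/1/2); y: n phenotypes.
    Returns posterior means of beta, posterior probabilities of a nonzero
    effect, and the estimated (pi, sigma_b2, sigma_e2, mu).
    """
    n, m = len(y), len(X[0])
    cols = []
    for j in range(m):
        c = [row[j] for row in X]
        mean_j = sum(c) / n
        cols.append([v - mean_j for v in c])  # centred genotype column
    xtx = [sum(v * v for v in c) for c in cols]
    if min(xtx) == 0.0:
        raise ValueError("monomorphic SNP: remove it before fitting")
    mu = sum(y) / n
    resid = [v - mu for v in y]  # y - mu - X beta, with beta = 0 at the start
    sigma_e2 = sum(r * r for r in resid) / n
    # Assumed start: prior genetic variance equal to half the phenotypic variance.
    sigma_b2 = 0.5 * sigma_e2 / (pi * sum(xtx) / n)
    beta, prob = [0.0] * m, [pi] * m
    cond_mean, cond_var = [0.0] * m, [0.0] * m
    for _ in range(n_iter):
        old_prob = prob[:]
        # Approximate E-step: update one SNP at a time given the others.
        for j in range(m):
            c = cols[j]
            # Least-squares effect of SNP j with its own current fit added back.
            b_hat = sum(ci * ri for ci, ri in zip(c, resid)) / xtx[j] + beta[j]
            s2 = sigma_e2 / xtx[j]  # sampling variance of b_hat
            shrink = sigma_b2 / (sigma_b2 + s2)
            # Log posterior odds of "slab" (nonzero effect) versus "spike" (zero).
            log_odds = (math.log(pi / (1.0 - pi))
                        - 0.5 * math.log((s2 + sigma_b2) / s2)
                        + 0.5 * b_hat * b_hat * (1.0 / s2 - 1.0 / (s2 + sigma_b2)))
            prob[j] = sigmoid(log_odds)
            cond_mean[j] = shrink * b_hat  # E[beta_j | slab, data]
            cond_var[j] = shrink * s2      # Var[beta_j | slab, data]
            new_beta = prob[j] * cond_mean[j]
            delta = new_beta - beta[j]
            resid = [ri - ci * delta for ri, ci in zip(resid, c)]
            beta[j] = new_beta
        # M-step: mixture hyperparameters, intercept and residual variance.
        pi = min(max(sum(prob) / m, 1e-6), 1.0 - 1e-6)
        sigma_b2 = sum(p * (cm * cm + cv)
                       for p, cm, cv in zip(prob, cond_mean, cond_var)) / sum(prob)
        shift = sum(resid) / n
        mu += shift
        resid = [r - shift for r in resid]
        sigma_e2 = sum(r * r for r in resid) / n  # ignores posterior variance of beta
        if max(abs(a - b) for a, b in zip(prob, old_prob)) < tol:
            break
    return beta, prob, pi, sigma_b2, sigma_e2, mu


def simulate(n=200, m=50, qtl=None, seed=1):
    """Hypothetical toy data: independent SNPs with allele frequency 0.5."""
    qtl = {3: 1.0, 17: -1.0, 40: 0.8} if qtl is None else qtl
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
    y = [sum(X[i][j] * b for j, b in qtl.items()) + rng.gauss(0.0, 1.0)
         for i in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    beta, prob, pi, sigma_b2, sigma_e2, mu = em_spike_slab(X, y)
    for j in sorted(range(len(prob)), key=lambda j: -prob[j])[:5]:
        print(f"SNP {j}: P(nonzero) = {prob[j]:.3f}, effect = {beta[j]:+.3f}")
    print(f"pi = {pi:.3f}, sigma_b2 = {sigma_b2:.3f}, sigma_e2 = {sigma_e2:.3f}")
```

In this sketch each SNP update touches only one genotype column and the running residual, so one iteration costs on the order of (number of SNP) × (number of individuals) operations, with no Markov chain to sample. That is the general reason EM-type schemes can be much faster than a full MCMC analysis, although the speed and accuracy of emBayesB itself are the ones reported in the Results above, not those of this toy.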
