Modeling correlated marker effects in genome-wide prediction via Gaussian concentration graph models.

In genome-wide prediction, marker allele substitution effects are typically assumed to be independent; however, it has been recognized since the early stages of this technology that these effects are likely to be correlated. In statistics, graphical models have been identified as a useful and powerful tool for covariance estimation in high-dimensional problems, and the area has recently expanded considerably. In particular, Gaussian concentration graph models (GCGM) have been widely studied. These are models in which the distribution of a set of random variables, the marker effects in this case, is assumed to be Markov with respect to an undirected graph G. In this paper, Bayesian (Bayes G and Bayes G-D) and frequentist (GML-BLUP) methods adapting the theory of GCGM to genome-wide prediction were developed. Different approaches to defining the graph G from domain-specific knowledge were proposed, and two propositions and a corollary establishing conditions for finding decomposable graphs were proven. These methods were applied to small simulated and real datasets. The simulations considered scenarios in which correlations among allelic substitution effects were expected to arise from various causes, with graphs defined on the basis of physical marker positions. Results showed that accounting for partially correlated allele substitution effects improved both the correlation between phenotypes and predicted additive genetic values and the accuracy of predicted additive genetic values. Extensions to the multiallelic loci case were described, and possible refinements incorporating more flexible priors in the Bayesian setting were discussed. Our models are promising because they allow biological information to be incorporated into the prediction process, and because they are more flexible and general than previously proposed models that account for correlated marker effects.
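To make the Markov property concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical rule that links two markers when their physical positions lie within a chosen distance, and it fills the precision (concentration) matrix with an arbitrary edge weight. The function names, the positions, the distance threshold and the weight are all illustrative assumptions. The point it shows is the defining feature of a Gaussian concentration graph model: a missing edge in G corresponds to a zero in the precision matrix, even though the covariance matrix itself may have no zero in that position.

```python
import numpy as np

def graph_from_positions(positions, max_dist):
    """Adjacency matrix linking markers whose positions differ by at most max_dist.

    Hypothetical rule for illustration; the paper proposes its own ways to define G.
    """
    pos = np.asarray(positions, dtype=float)
    dist = np.abs(pos[:, None] - pos[None, :])
    adj = dist <= max_dist
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def precision_from_graph(adj, edge_weight=0.3):
    """Precision matrix whose off-diagonal zeros match the missing edges of the graph.

    edge_weight is an arbitrary illustrative value, not an estimate.
    """
    omega = np.where(adj, -edge_weight, 0.0)
    # Strict diagonal dominance guarantees a positive definite matrix.
    np.fill_diagonal(omega, np.abs(omega).sum(axis=1) + 1.0)
    return omega

# Five markers in two physical clusters (made-up positions, arbitrary units).
positions = [1.0, 1.5, 2.2, 6.0, 6.4]
adj = graph_from_positions(positions, max_dist=1.0)
omega = precision_from_graph(adj)
sigma = np.linalg.inv(omega)  # implied covariance of the marker effects

# Markers 0 and 2 are not adjacent, so their precision entry is exactly zero,
# yet they remain marginally correlated through marker 1.
print("edge 0-2 present:", bool(adj[0, 2]))
print("precision[0, 2]:", omega[0, 2])
print("covariance[0, 2]:", sigma[0, 2])
```

In this sketch, markers 0 and 2 are conditionally independent given the remaining markers, while markers in different clusters are fully independent because no path connects them in the graph.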
