Priors in Whole-Genome Regression: The Bayesian Alphabet Returns

Whole-genome-enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” refers to a growing family of Bayesian linear regressions, each labelled with a letter of the alphabet, that share the same sampling model but differ in the priors adopted. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data: the number of unknown parameters (p) typically exceeds the sample size (n). Members of the alphabet confront this overparameterization in various ways, but it is shown here that the prior is always influential unless n ≫ p. This happens because the parameters are not likelihood-identified, so Bayesian learning is imperfect. Since inferences are never free of the influence of the prior, claims about genetic architecture made with these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes but are of somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
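
The identification argument can be made concrete with a small sketch. This is a minimal illustration under stated assumptions, not the analysis of the paper. It uses the simplest Gaussian-prior case, a Bayesian ridge regression with known variances σ²_e and σ²_β whose posterior mean is the ridge (marker-BLUP) solution, rather than any particular letter of the alphabet. It also uses simulated genotypes rather than real data. The helper names `posterior_beta` and `cv_mse` and all numeric settings are hypothetical. With p > n, X has a null space. For a unit vector v in that null space, the posterior mean and variance of v′β equal the prior values 0 and σ²_β for every choice of σ²_β, so the data teach nothing along v. The final lines treat λ = σ²_e/σ²_β as a tuning knob and choose it by 5-fold cross-validation.

```python
import numpy as np


def posterior_beta(X, y, sigma2_e, sigma2_b):
    """Posterior mean and covariance of beta in the conjugate Gaussian model
    y | beta ~ N(X beta, sigma2_e * I),  beta ~ N(0, sigma2_b * I)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2_e + np.eye(p) / sigma2_b
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / sigma2_e
    return mean, cov


def cv_mse(X, y, lam, k=5, seed=1):
    """k-fold cross-validated mean squared prediction error of the posterior
    mean (ridge) estimator with shrinkage lam = sigma2_e / sigma2_b."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        Xtr, ytr = X[train], y[train]
        b = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
        errors.append(np.mean((y[test] - X[test] @ b) ** 2))
    return float(np.mean(errors))


# Simulated data (hypothetical): n records, p >> n marker covariates coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 60, 300
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
X -= X.mean(axis=0)                      # centre columns; no intercept is fitted
beta_true = rng.normal(0.0, 0.1, size=p)
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# A unit vector v with X v = 0: the likelihood is flat along v (rank(X) <= n - 1 < p).
v = np.linalg.svd(X)[2][-1]

# Along v the posterior equals the prior N(0, sigma2_b), whatever sigma2_b is,
# while the fitted effects in identified directions still move with sigma2_b.
for sigma2_b in (0.001, 0.01, 0.1):
    mean, cov = posterior_beta(X, y, sigma2_e=1.0, sigma2_b=sigma2_b)
    print(f"sigma2_b={sigma2_b}: E[v'b|y]={v @ mean:+.2e}, "
          f"Var[v'b|y]={v @ cov @ v:.4f}, ||E[b|y]||={np.linalg.norm(mean):.3f}")

# Treat lam = sigma2_e / sigma2_b as a tuning knob and choose it by 5-fold CV.
lambdas = [1.0, 10.0, 100.0, 1000.0]
cv_errors = {lam: cv_mse(X, y, lam) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)
print("CV MSE by lambda:", {k: round(e, 3) for k, e in cv_errors.items()})
print("lambda chosen by cross-validation:", best_lam)
```

The printed posterior variance along v simply reproduces σ²_β, whatever the data. Cross-validation does not change this. It only selects the setting of the knob that predicts best, which matches the distinction drawn above between prediction and inference about genetic architecture. Genotypes here are centred on the full sample before splitting. This uses test-set genotypes but not test-set phenotypes; a stricter scheme would centre within each training fold.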
