Genomic prediction of dichotomous traits with Bayesian logistic models

Bayesian methods are a popular choice for genomic prediction of genotypic values. The methodology is well established for traits with an approximately Gaussian phenotypic distribution. However, many important traits are dichotomous, and the observed phenotypic counts follow a Binomial distribution. The standard Gaussian generalized linear models (GLM) are not statistically valid for this type of data. We therefore implemented Binomial GLM with a logit link function for the BayesB and Bayesian GBLUP genomic prediction methods. For BayesB, we additionally implemented a finite mixture Binomial GLM to account for overdispersion. We compared these models with their standard Gaussian counterparts using two experimental data sets from plant breeding, one on female fertility in wheat and one on haploid induction in maize, as well as a simulated data set. Using the simulated data, which represent a bi-parental population of doubled haploid lines, we further investigated how training set size (N), the number of independent Bernoulli trials used for trait evaluation (n_i), and the genetic architecture of the trait influence genomic prediction accuracies and abilities in general, and the relative performance of our models in particular. Prediction accuracies increased with increasing N and n_i. For both the simulated and experimental data sets, Binomial GLM were superior to Gaussian models for small n_i, whereas for large n_i Gaussian models might be used as ad hoc approximations. We further show with simulated and real data sets that accounting for overdispersion in Binomial data can markedly increase prediction accuracy.
