Multiple Quantitative Trait Analysis Using Bayesian Networks

Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each genotyped individual. Modeling traits individually disregards the fact that they are most likely associated through pleiotropy and a shared biological basis, and thus provides only a partial, confounded view of genetic effects and phenotypic interactions. In this article we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP) and that they are competitive with single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because their very weak population structure and large sample size yield predictive models with good power and limited confounding due to relatedness.
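In a Gaussian Bayesian network, the joint distribution of markers and traits factorises into one local distribution per node, each a linear regression of the node on its parents in the directed acyclic graph (DAG). The sketch below illustrates that idea on simulated data only. The following are all illustrative assumptions and not the model or data used in this article:

* the 0/1/2 marker coding;
* the hypothetical trait labels `FT` and `YLD`;
* the DAG (SNP0, SNP1 -> FT; SNP2, FT -> YLD);
* the use of ordinary least squares for each local distribution.

```python
import numpy as np

# Illustrative simulation only (hypothetical data, not the MAGIC wheat set):
# 5 SNPs coded 0/1/2 and two traits, "FT" and "YLD", with an assumed DAG
#   SNP0 -> FT, SNP1 -> FT, SNP2 -> YLD, FT -> YLD
rng = np.random.default_rng(42)
n = 500
snps = rng.integers(0, 3, size=(n, 5)).astype(float)
ft = 1.0 * snps[:, 0] - 0.5 * snps[:, 1] + rng.normal(0.0, 1.0, n)
yld = 0.8 * snps[:, 2] - 0.6 * ft + rng.normal(0.0, 1.0, n)

data = {f"SNP{j}": snps[:, j] for j in range(snps.shape[1])}
data["FT"] = ft
data["YLD"] = yld

# Parent sets of the trait nodes in the assumed DAG (markers are root nodes).
parents = {"FT": ["SNP0", "SNP1"], "YLD": ["SNP2", "FT"]}
# A topological order of the trait nodes: parents before children.
order = ["FT", "YLD"]


def fit_local(node, parent_names, data):
    """Fit the Gaussian local distribution node | parents by ordinary least squares.

    Returns (coefficients [intercept, one per parent], residual standard deviation).
    """
    y = data[node]
    X = np.column_stack([np.ones_like(y)] + [data[p] for p in parent_names])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = float(np.sqrt(resid @ resid / (len(y) - X.shape[1])))
    return coef, sigma


fits = {node: fit_local(node, parents[node], data) for node in order}


def predict_traits(snp_row, fits):
    """Conditional means of all traits given the markers only.

    Nodes are visited in topological order, so a trait's predicted parents
    (e.g. FT for YLD) are available when it is predicted. Because every local
    distribution is linear with independent noise, plugging in the parents'
    conditional means gives the conditional mean of the child under the model.
    """
    values = {f"SNP{j}": float(v) for j, v in enumerate(snp_row)}
    for node in order:
        coef, _ = fits[node]
        x = np.array([1.0] + [values[p] for p in parents[node]])
        values[node] = float(x @ coef)
    return {node: values[node] for node in order}


for node in order:
    coef, sigma = fits[node]
    print(node, "~", parents[node], "coef:", np.round(coef, 2), "sigma:", round(sigma, 2))
print(predict_traits([2, 0, 1, 0, 0], fits))
```

Here YLD is predicted partly through FT, which is how a network encodes trait-to-trait dependence that single-trait models ignore. With many more markers than individuals, a penalised fit such as ridge regression (closely related to GBLUP) would replace least squares in each local regression. That variant is not shown here.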
