Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of the genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and to Wellcome Trust Case Control Consortium (WTCCC) data on disease. We show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.
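The idea of partitioning SNP-based variance across effect-size classes can be illustrated with a small simulation. This is a minimal sketch, not the method itself: it assumes SNP effects are drawn from a mixture of zero-mean normal distributions, one of which has zero variance (SNPs with no effect), and that genotypes are standardized so that a SNP with effect b contributes b squared to the genetic variance. The function names, the mixing proportions and the class variances are hypothetical values chosen for illustration and are not taken from the text above.

```python
import random


def simulate_effects(n_snps, mix_props, class_vars, seed=0):
    """Draw a class label and an effect for each SNP from a normal mixture.

    mix_props  : prior probability of each mixture class (illustrative).
    class_vars : effect variance of each class; 0.0 means "no effect".
    """
    rng = random.Random(seed)
    classes = rng.choices(range(len(mix_props)), weights=mix_props, k=n_snps)
    effects = [
        rng.gauss(0.0, class_vars[c] ** 0.5) if class_vars[c] > 0 else 0.0
        for c in classes
    ]
    return classes, effects


def variance_share_by_class(classes, effects, n_classes):
    """Fraction of total genetic variance contributed by each class.

    Assumes standardized genotypes, so each SNP contributes effect ** 2.
    """
    totals = [0.0] * n_classes
    for c, b in zip(classes, effects):
        totals[c] += b * b
    grand = sum(totals)
    return [t / grand for t in totals] if grand > 0 else totals


# Hypothetical settings: most SNPs have no effect, few have large effects.
mix_props = [0.95, 0.03, 0.015, 0.005]
class_vars = [0.0, 1e-4, 1e-3, 1e-2]

classes, effects = simulate_effects(10_000, mix_props, class_vars, seed=1)
shares = variance_share_by_class(classes, effects, len(mix_props))

n_nonzero = sum(1 for c in classes if c != 0)
print("SNPs with non-zero effect:", n_nonzero)
for k, share in enumerate(shares):
    print(f"class {k} (variance {class_vars[k]:g}): share of variance = {share:.3f}")
```

In a sketch like this, the share attributed to the largest-variance class plays the role of the "large effects" proportion described above, while the count of SNPs outside the zero class corresponds to the number of SNPs explaining the SNP-based heritability. In a real analysis these quantities would be estimated from data rather than simulated from known settings.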
