Mixture SNPs effect on phenotype in genome-wide association studies

Background
Recently, mixed linear models have been used to address the issue of “missing” heritability in traditional genome-wide association studies (GWAS). These models assume that all single-nucleotide polymorphisms (SNPs) are associated with the phenotype of interest. In practice, however, only a small proportion of SNPs usually have significant effects on a phenotype, while most SNPs have no effect or a very small one. To incorporate this feature, we propose an efficient Hierarchical Bayesian Model (HBM) that extends the existing mixed models to perform automatic selection of significant SNPs. The HBM models the SNP effects using a mixture of a point mass at zero and a normal distribution, where the point mass corresponds to the non-associated SNPs.

Results
We estimate the HBM using Gibbs sampling. We first demonstrate the estimation performance of our method through two simulation studies, which we make realistic by using parameters fitted to the Framingham Heart Study (FHS) data. The simulation studies show that our method can accurately estimate the proportion of SNPs associated with the simulated phenotype and identify these SNPs. They also show that it adapts better than the standard mixed models to certain forms of model misspecification. In addition, we analyze data from the FHS and the Health and Retirement Study (HRS) to study the association between Body Mass Index (BMI) and SNPs on Chromosome 16, and to replicate the identified genetic associations. The analysis of the FHS data identifies 0.3% of the SNPs on Chromosome 16 as affecting BMI, including rs9939609 and rs9939973 in the FTO gene. These two SNPs are in strong linkage disequilibrium with rs1558902 (r² = 0.901 for rs9939609 and r² = 0.905 for rs9939973), a SNP that previous GWAS have reported to be associated with obesity. We then replicate the findings using the HRS data, where the analysis finds 0.4% of the SNPs on Chromosome 16 to be associated with BMI. Furthermore, around 25% of the genes identified as associated with BMI are common to both studies.

Conclusions
The results demonstrate that the HBM and its estimation algorithm offer a powerful tool for identifying significant genetic associations with phenotypes of interest among the large numbers of SNPs typical of modern genetic studies.
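The point-mass-plus-normal mixture prior estimated by Gibbs sampling can be illustrated with a small simulation. In notation chosen here, each SNP effect is β_j ~ π·N(0, σ_β²) + (1 − π)·δ_0, so π is the proportion of associated SNPs. This is a minimal sketch under stated assumptions, not the authors' HBM or their estimation code. It assumes a plain linear model y = Xβ + e without the polygenic random effect of a full mixed model. It also assumes Beta and inverse-gamma hyperpriors picked for illustration and simulated, standardized genotypes. The names `simulate`, `spike_slab_gibbs`, and `sigmoid` and all numeric settings are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Numerically stable logistic function for a scalar."""
    if z >= 0:
        return 1.0 / (1.0 + np.exp(-z))
    ez = np.exp(z)
    return ez / (1.0 + ez)


def simulate(n=500, p=200, seed=1):
    """Hypothetical toy data: 5 causal SNPs among p, standardized genotypes."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=p)
    G = rng.binomial(2, maf, size=(n, p)).astype(float)  # 0/1/2 allele counts
    X = (G - G.mean(axis=0)) / G.std(axis=0)             # standardize each SNP
    causal = np.array([10, 50, 90, 130, 170])
    beta = np.zeros(p)
    beta[causal] = [0.6, -0.5, 0.4, -0.6, 0.5]
    y = X @ beta + rng.normal(0.0, 1.0, size=n)
    return X, y, causal


def spike_slab_gibbs(X, y, n_iter=1000, burn_in=300, a_pi=1.0, b_pi=1.0,
                     a_s=1.0, b_s=1.0, seed=0):
    """Gibbs sampler for y = X beta + e, e ~ N(0, sigma_e2 I), with
    beta_j ~ pi * N(0, sigma_b2) + (1 - pi) * delta_0 (spike-and-slab prior).
    Assumed hyperpriors: pi ~ Beta(a_pi, b_pi); sigma_b2, sigma_e2 ~ InvGamma(a_s, b_s).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xt = np.ascontiguousarray(X.T)        # row j = genotype vector of SNP j
    xtx = np.sum(Xt * Xt, axis=1)         # x_j' x_j for every SNP
    y = y - y.mean()                      # center the phenotype (no intercept)
    beta = np.zeros(p)
    gamma = np.zeros(p, dtype=bool)       # inclusion indicators
    pi, sigma_b2, sigma_e2 = 0.5, 1.0, y.var()
    resid = y.copy()                      # current y - X beta
    kept, pi_sum, s2_sum = 0, 0.0, 0.0
    pip_sum, beta_sum = np.zeros(p), np.zeros(p)

    for it in range(n_iter):
        log_prior_odds = np.log(pi) - np.log1p(-pi)
        for j in range(p):
            x = Xt[j]
            r_j = resid + x * beta[j]                     # residual without SNP j
            v = 1.0 / (xtx[j] / sigma_e2 + 1.0 / sigma_b2)
            m = v * (x @ r_j) / sigma_e2
            # log Bayes factor (slab vs spike) with beta_j integrated out
            log_bf = 0.5 * np.log(v / sigma_b2) + 0.5 * m * m / v
            gamma[j] = rng.random() < sigmoid(log_prior_odds + log_bf)
            beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
            resid = r_j - x * beta[j]

        k = int(gamma.sum())
        pi = rng.beta(a_pi + k, b_pi + p - k)
        # InvGamma(a, b) draw = 1 / Gamma(shape=a, rate=b); numpy takes scale = 1/rate
        sigma_b2 = 1.0 / rng.gamma(a_s + 0.5 * k, 1.0 / (b_s + 0.5 * beta[gamma] @ beta[gamma]))
        sigma_e2 = 1.0 / rng.gamma(a_s + 0.5 * n, 1.0 / (b_s + 0.5 * resid @ resid))

        if it >= burn_in:                                 # accumulate posterior means
            kept += 1
            pip_sum += gamma
            beta_sum += beta
            pi_sum += pi
            s2_sum += sigma_e2

    return {"pip": pip_sum / kept, "beta_mean": beta_sum / kept,
            "pi_mean": pi_sum / kept, "sigma_e2_mean": s2_sum / kept}


X, y, causal = simulate()
fit = spike_slab_gibbs(X, y)
print("posterior mean of pi:", round(fit["pi_mean"], 3))
print("SNPs with PIP > 0.5:", np.flatnonzero(fit["pip"] > 0.5))
```

The sampler updates each pair (γ_j, β_j) jointly. It first draws the inclusion indicator with β_j integrated out, then draws β_j given that indicator. This keeps SNPs from getting stuck at zero. The posterior inclusion probabilities (`pip`) drive SNP selection. The posterior mean of π is the toy-model analogue of an estimated proportion of associated SNPs, such as the 0.3% reported for Chromosome 16.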
