A Bayesian Partitioning Model for the Detection of Multilocus Effects in Case-Control Studies

Background: Genome-wide association studies (GWASs) have identified hundreds of genetic variants associated with complex diseases, but these variants appear to explain very little of the heritability of those diseases. The typical single-locus association analysis in a GWAS fails to detect variants with small effect sizes and to capture higher-order interactions among these variants. Multilocus association analysis provides a powerful alternative: it jointly models the variants within a gene or a pathway and reduces the burden of multiple hypothesis testing in a GWAS.

Methods: Here, we propose a powerful and flexible dimension reduction approach to modeling multilocus association. We use a Bayesian partitioning model that clusters SNPs according to their direction of association, models higher-order interactions using a flexible scoring scheme, and uses posterior marginal probabilities to detect association between the SNP set and the disease.

Results: We illustrate our method with extensive simulation studies and by applying it to detect multilocus interactions in the Atherosclerosis Risk in Communities (ARIC) GWAS of type 2 diabetes.

Conclusion: We demonstrate that our approach has better power to detect multilocus interactions than several existing approaches. When applied to the ARIC study dataset of 9,328 individuals to study gene-based associations with type 2 diabetes, our method identified some novel variants not detected by conventional single-locus association analyses.
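The two quantities named in the Methods can be illustrated with a minimal sketch. The first is a partition of SNPs by direction of association. The second is a posterior marginal probability of association. The sketch rests on several assumptions that do not come from the paper. The label coding is hypothetical (0 = not associated, 1 = positively associated, 2 = negatively associated). So is the additive per-group allele-count score, which stands in for the paper's more flexible interaction scoring scheme. The sampler that produces the partition draws is not shown. Posterior marginals are estimated as the fraction of sampled partitions that place each SNP in a non-null group.

```python
from typing import List, Sequence

# Hypothetical label coding for this sketch (not taken from the paper):
# 0 = not associated, 1 = positively associated, 2 = negatively associated.
NULL, POSITIVE, NEGATIVE = 0, 1, 2


def group_scores(genotypes: Sequence[Sequence[int]],
                 labels: Sequence[int]) -> List[List[int]]:
    """Collapse each individual's genotypes (minor-allele counts 0/1/2)
    into one summed score per non-null group: [positive_sum, negative_sum].

    The additive count is an illustrative assumption, not the paper's
    scoring scheme for higher-order interactions.
    """
    scores = []
    for row in genotypes:
        pos = sum(g for g, z in zip(row, labels) if z == POSITIVE)
        neg = sum(g for g, z in zip(row, labels) if z == NEGATIVE)
        scores.append([pos, neg])
    return scores


def posterior_marginals(samples: Sequence[Sequence[int]]) -> List[float]:
    """Estimate P(SNP j is associated | data) as the fraction of sampled
    partitions (e.g. MCMC draws; the sampler is not shown) that place
    SNP j in a non-null group."""
    n_samples = len(samples)
    n_snps = len(samples[0])
    return [sum(1 for s in samples if s[j] != NULL) / n_samples
            for j in range(n_snps)]


if __name__ == "__main__":
    genotypes = [[0, 1, 2], [2, 0, 1]]  # 2 individuals x 3 SNPs
    labels = [POSITIVE, NULL, NEGATIVE]  # one hypothetical partition
    print(group_scores(genotypes, labels))  # [[0, 2], [2, 1]]

    draws = [[1, 0, 2], [1, 0, 0], [0, 0, 2], [1, 2, 2]]  # toy draws
    print(posterior_marginals(draws))  # [0.75, 0.25, 0.75]
```

Reducing each SNP set to a few group scores is what makes this a dimension reduction. The disease model then only needs to relate case-control status to a small number of scores, not to every SNP separately.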
