A Flexible Bayesian Model for Studying Gene–Environment Interaction

An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis of how the implicated gene or chromosomal region and an established environmental risk factor interact to influence disease risk. The standard approach to studying gene–environment interaction considers one genetic marker at a time. It can therefore misrepresent and underestimate the genetic contribution to the joint effect when the region contains one or more functional loci, some possibly not genotyped, that interact with the environmental risk factor in a complex way. We develop a more global approach based on a Bayesian model. The model uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region, and it allows the environmental effect to vary across genetic profile categories. We also propose a resampling-based test, derived from this Bayesian model, for detecting gene–environment interaction. Using data from the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region. This region contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence of gene–environment interaction (P-value = 0.016), with the smoking effect appearing stronger in subjects whose genetic profile is associated with a higher lung cancer risk. By contrast, the conventional single-marker test of gene–environment interaction is far from significant.
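To make the idea of a profile-specific environmental effect concrete, here is a minimal sketch. It assumes a logistic model in which each subject's genetic profile category z_i selects its own intercept and its own exposure slope, so that gene–environment interaction corresponds to the slopes differing across categories. This is an illustrative assumption, not the authors' exact specification. The function names `profile_loglik` and `softplus`, the logistic link, and the parameters `alpha` and `beta` are all hypothetical. Unlike the Bayesian model, the sketch treats the profile assignments as known rather than sampling them as latent variables. It also does not reproduce the paper's resampling-based test.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def profile_loglik(
    disease: Sequence[int],     # D_i: 1 = case, 0 = control
    exposure: Sequence[float],  # E_i: environmental exposure, e.g. smoking intensity
    profile: Sequence[int],     # z_i: genetic profile category in {0, ..., K-1} (assumed known here)
    alpha: Sequence[float],     # hypothetical category-specific intercepts
    beta: Sequence[float],      # hypothetical category-specific exposure slopes
) -> float:
    """Bernoulli log-likelihood under the assumed model
    logit P(D_i = 1) = alpha[z_i] + beta[z_i] * E_i.

    In this sketch, gene-environment interaction means that beta differs
    across profile categories; equal betas mean a common exposure effect.
    """
    total = 0.0
    for d, e, z in zip(disease, exposure, profile):
        eta = alpha[z] + beta[z] * e
        # log P(D=1) = -softplus(-eta); log P(D=0) = -softplus(eta)
        total += -softplus(-eta) if d == 1 else -softplus(eta)
    return total


if __name__ == "__main__":
    # Toy data: two profile categories, the second given a steeper exposure slope.
    D = [1, 0, 1, 1, 0, 0]
    E = [2.0, 0.5, 3.0, 1.5, 1.0, 0.2]
    Z = [1, 0, 1, 0, 0, 1]
    common = profile_loglik(D, E, Z, alpha=[-1.0, -1.0], beta=[0.5, 0.5])
    varying = profile_loglik(D, E, Z, alpha=[-1.0, -1.0], beta=[0.3, 0.9])
    print(f"common slope: {common:.3f}, profile-specific slopes: {varying:.3f}")
```

Constraining `beta` to a single shared value gives the no-interaction special case. Comparing the two fits is only an informal illustration of what "the environment effect varies across profile categories" means. It is not a substitute for the resampling-based test described above.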