Testing for gene–environment interaction under exposure misspecification

A complex interplay between genetic and environmental factors characterizes the etiology of many diseases. Modeling gene-environment (GxE) interactions is often complicated by the unknown functional form of the environmental term in the true data-generating mechanism. We study how misspecifying the effect of the environmental exposure affects inference on the GxE interaction term in linear and logistic regression models. We first examine the asymptotic bias of the GxE interaction regression coefficient, allowing for confounders as well as arbitrary misspecification of the exposure and confounder effects. For linear regression, we show that when the environmental effect is misspecified, the GxE interaction coefficient can still be unbiased under gene-environment independence and certain conditions on the confounders. Even so, inference on the GxE interaction is often incorrect. For logistic regression, we show that the interaction coefficient is generally biased if the genetic factor is associated with the outcome, either directly or indirectly. Finally, we show that the standard robust sandwich variance estimator for the GxE interaction performs poorly in practical GxE studies, and we provide an alternative testing procedure with better finite-sample properties.
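The linear-regression claims can be illustrated with a small Monte Carlo experiment. The sketch below is not the authors' code. The data-generating model (a quadratic exposure effect E², G ~ Binomial(2, 0.3) independent of E ~ N(0, 1), no confounders, no true interaction), the sample sizes, and the function names `simulate`, `fit_linear_working_model` and `monte_carlo` are all hypothetical choices. The sketch fits the misspecified working model Y ~ 1 + G + E + G·E, which is linear in E. It then compares three quantities for the G·E coefficient: its model-based standard error, its HC0 sandwich standard error, and the spread of the estimates across replicates.

```python
import numpy as np

# Hypothetical simulation sketch; not the paper's estimator or test.


def simulate(n, rng, beta_ge=0.0, maf=0.3):
    """Assumed true model: quadratic E effect, G independent of E, no confounders."""
    g = rng.binomial(2, maf, size=n).astype(float)  # additive genotype coding 0/1/2
    e = rng.standard_normal(n)                      # exposure E ~ N(0, 1)
    y = 0.5 + 0.2 * g + e**2 + beta_ge * g * e + rng.standard_normal(n)
    return g, e, y


def fit_linear_working_model(g, e, y):
    """OLS fit of the misspecified working model y ~ 1 + G + E + G*E.

    Returns the G*E coefficient, its model-based SE (which assumes
    homoscedastic errors) and its HC0 sandwich SE.
    """
    x = np.column_stack([np.ones_like(g), g, e, g * e])
    bread = np.linalg.inv(x.T @ x)
    beta = bread @ x.T @ y
    resid = y - x @ beta
    n, p = x.shape
    sigma2 = resid @ resid / (n - p)
    naive_var = sigma2 * bread
    meat = (x * resid[:, None] ** 2).T @ x  # sum of r_i^2 x_i x_i'
    sandwich_var = bread @ meat @ bread
    return beta[3], np.sqrt(naive_var[3, 3]), np.sqrt(sandwich_var[3, 3])


def monte_carlo(n_rep=500, n=2000, seed=1):
    """Repeat simulate + fit under the null (beta_ge = 0) and summarize."""
    rng = np.random.default_rng(seed)
    results = np.array(
        [fit_linear_working_model(*simulate(n, rng)) for _ in range(n_rep)]
    )
    est, naive_se, sandwich_se = results.T
    return {
        "mean_estimate": est.mean(),
        "empirical_sd": est.std(ddof=1),
        "mean_naive_se": naive_se.mean(),
        "mean_sandwich_se": sandwich_se.mean(),
        # Wald test of H0: beta_ge = 0 at nominal 5% using the model-based SE
        "naive_rejection_rate": np.mean(np.abs(est / naive_se) > 1.96),
    }


print(monte_carlo())
```

Under these assumptions the interaction estimates should center near zero. This follows because, with G independent of E, the least-squares projection of any function of E onto (1, G, E, G·E) has no G·E component. The model-based SE, however, should come out at roughly half the Monte Carlo SD, so the naive Wald test rejects the true null far more often than 5%. At n = 2000 with a common variant, the HC0 sandwich SE tracks the Monte Carlo SD reasonably well. This setting therefore does not reproduce the practical regimes in which the sandwich estimator is reported above to perform poorly, and it does not implement the alternative test.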
