Set-Based Tests for the Gene–Environment Interaction in Longitudinal Studies

ABSTRACT We propose a generalized score-type test for set-based inference on the gene–environment interaction with longitudinally measured quantitative traits. The test is robust to misspecification of the within-subject correlation structure and has enhanced power compared with existing alternatives. Unlike tests for marginal genetic association, set-based tests for the gene–environment interaction face the challenge of a potentially misspecified and high-dimensional main effect model under the null hypothesis. We show that the proposed test is robust to misspecification of the main effects of the environmental exposure and genetic factors under the gene–environment independence condition. When genetic and environmental factors are dependent, we further propose the method of sieves to eliminate potential bias due to a misspecified main effect of a continuous environmental exposure. A weighted principal component analysis approach is developed to perform dimension reduction when the number of genetic variants in the set is large relative to the sample size. The methods are motivated by an example from the Multi-Ethnic Study of Atherosclerosis (MESA), investigating the interaction between measures of neighborhood environment and genetic regions on longitudinal measures of blood pressure over a study period of about seven years with four exams. Supplementary materials for this article are available online.
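
To make the idea of a generalized score-type test concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear main effect model fitted under working independence and uses the empirical (sandwich) variance of subject-level score contributions, which is what gives robustness to the within-subject correlation. The function name `gxe_score_test`, its arguments, and the simulated data are hypothetical; the sieve and weighted principal component steps described in the abstract are not included.

```python
import numpy as np

def gxe_score_test(y, X, S, ids):
    """Generalized score statistic for H0: no G-E interaction.

    y   : (N,) stacked longitudinal outcomes
    X   : (N, p) null-model design (intercept, E, G main effects)
    S   : (N, q) interaction columns (E times each variant)
    ids : (N,) subject identifier for each row
    """
    # Fit the null model by OLS, i.e. GEE with working independence.
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha

    # Residualize S on X to account for estimating the main effects.
    coef, *_ = np.linalg.lstsq(X, S, rcond=None)
    S_tilde = S - X @ coef

    # Sum score contributions over each subject's visits.
    _, subj = np.unique(ids, return_inverse=True)
    U_i = np.zeros((subj.max() + 1, S.shape[1]))
    np.add.at(U_i, subj, S_tilde * resid[:, None])

    U = U_i.sum(axis=0)          # total score for the interaction terms
    V = U_i.T @ U_i              # empirical variance across subjects
    stat = float(U @ np.linalg.solve(V, U))
    return stat, S.shape[1]      # statistic and its degrees of freedom

# Simulated example (assumed values): 200 subjects, 4 exams, 3 variants,
# no true interaction, and a random intercept inducing correlation.
rng = np.random.default_rng(0)
n, m, q = 200, 4, 3
ids = np.repeat(np.arange(n), m)
G = rng.binomial(2, 0.3, size=(n, q)).astype(float)[ids]
E = rng.normal(size=n)[ids]
b = rng.normal(size=n)[ids]
y = 1.0 + 0.5 * E + G @ np.array([0.2, 0.0, -0.1]) + b + rng.normal(size=n * m)

X = np.column_stack([np.ones(n * m), E, G])
S = G * E[:, None]
stat, df = gxe_score_test(y, X, S, ids)
```

Under the null and with a large number of subjects, the statistic would typically be compared with a chi-square distribution on `df` degrees of freedom. Working independence is chosen here only for simplicity; other working correlation structures would change the estimating equations but not the overall shape of the test.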
