Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction.

Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate the general conditions required for the validity and power of these procedures, and we propose extensions of two-stage procedures that use the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from the Women's Health Initiative clinical trials.
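
As a rough illustration of the general two-stage idea, the sketch below filters markers on a screening p-value and then applies a Bonferroni correction to the interaction p-values of the surviving markers only. It is a generic filter-then-test sketch, not the specific procedures studied in the article: the function name `two_stage_test`, its parameters, and the default thresholds are hypothetical. Controlling the family-wise error rate this way assumes that the filtering statistic is independent of the interaction test statistic under the null hypothesis.

```python
def two_stage_test(filter_pvalues, interaction_pvalues,
                   alpha_filter=0.05, alpha=0.05):
    """Return indices of markers declared significant for interaction.

    Hypothetical helper for illustration only. Assumes the filtering
    statistic is independent of the interaction statistic under the null.
    """
    if len(filter_pvalues) != len(interaction_pvalues):
        raise ValueError("p-value lists must have the same length")

    # Stage 1: keep markers whose filtering p-value passes the screen.
    selected = [i for i, p in enumerate(filter_pvalues) if p <= alpha_filter]
    if not selected:
        return []

    # Stage 2: Bonferroni correction over the selected markers only,
    # rather than over all markers in the scan.
    threshold = alpha / len(selected)
    return [i for i in selected if interaction_pvalues[i] <= threshold]


# Toy p-values (made up for illustration, not taken from any study).
filter_p = [0.001, 0.5, 0.01, 0.9]
interaction_p = [0.02, 0.0001, 0.03, 0.001]
significant = two_stage_test(filter_p, interaction_p)
```

In this toy input, two markers pass the screen, so the second-stage threshold is 0.05 / 2 rather than 0.05 / 4. The example also shows the trade-off: markers that fail the filter are never tested, however small their interaction p-values.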
