Including known covariates can reduce power to detect genetic effects in case-control studies

Genome-wide association studies (GWAS) search for associations between genetic variants and disease status, typically via logistic regression. Often there are covariates, such as sex or well-established major genetic factors, that are known to affect disease susceptibility and are independent of the tested genotypes at the population level. We show, theoretically and with data from recent GWAS on multiple sclerosis, psoriasis and ankylosing spondylitis, that including known covariates can substantially reduce power to identify associated variants when the disease prevalence is below a few percent. Whether including such covariates reduces or increases power to detect genetic effects depends on several factors, including the prevalence of the disease studied. When the disease is common (prevalence above 20%), including covariates typically increases power, whereas for rarer diseases it can often decrease power to detect new genetic associations.
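
The dependence on prevalence can be explored with a small large-sample calculation. The sketch below is illustrative only and is not the authors' analysis. It assumes a binary genotype indicator G and a binary covariate X that are independent in the population, a logistic disease model with main effects only, and 1:1 case-control sampling. All function names, allele and covariate frequencies, odds ratios and the sample size are hypothetical choices. It fits logistic regression to the expected cell proportions, with and without X, and converts the Wald statistic for G into approximate power at a genome-wide threshold.

```python
import math
from statistics import NormalDist


def case_control_cells(prevalence, or_g, or_x, freq_g=0.3, freq_x=0.5):
    """Expected cell proportions {(d, g, x): weight} in a 1:1 case-control sample.

    Assumed population model: logit P(D=1 | G, X) = a + log(or_g)*G + log(or_x)*X,
    with binary G and X independent in the population (hypothetical frequencies).
    """
    b_g, b_x = math.log(or_g), math.log(or_x)
    pop = {(g, x): (freq_g if g else 1 - freq_g) * (freq_x if x else 1 - freq_x)
           for g in (0, 1) for x in (0, 1)}

    def risk(a, g, x):
        return 1.0 / (1.0 + math.exp(-(a + b_g * g + b_x * x)))

    lo, hi = -30.0, 30.0  # bisection on the intercept to match the prevalence
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(p * risk(mid, g, x) for (g, x), p in pop.items()) < prevalence:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    prev = sum(p * risk(a, g, x) for (g, x), p in pop.items())
    cells = {}
    for (g, x), p in pop.items():
        cells[(1, g, x)] = 0.5 * p * risk(a, g, x) / prev              # cases
        cells[(0, g, x)] = 0.5 * p * (1 - risk(a, g, x)) / (1 - prev)  # controls
    return cells


def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_genotype_effect(cells, adjust):
    """Newton-Raphson logistic fit to expected proportions.

    Returns (log odds ratio for G, its variance per sampled individual).
    With adjust=True the covariate X is included in the model.
    """
    rows = [((1.0, g, x) if adjust else (1.0, g), d, w)
            for (d, g, x), w in cells.items()]
    k = 3 if adjust else 2

    def score_and_info(beta):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for z, d, w in rows:
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, z))))
            for i in range(k):
                grad[i] += w * (d - mu) * z[i]
                for j in range(k):
                    info[i][j] += w * mu * (1 - mu) * z[i] * z[j]
        return grad, info

    beta = [0.0] * k
    for _ in range(50):
        grad, info = score_and_info(beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_info(beta)
    unit = [0.0] * k
    unit[1] = 1.0
    var_g = solve(info, unit)[1]  # (1, 1) entry of the inverse information
    return beta[1], var_g


def power(beta_g, var_g, n_samples, alpha=5e-8):
    """Approximate two-sided Wald-test power via the normal distribution."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(beta_g) / math.sqrt(var_g / n_samples)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    for prevalence in (0.001, 0.3):
        cells = case_control_cells(prevalence, or_g=1.2, or_x=5.0)
        for adjust in (False, True):
            beta_g, var_g = fit_genotype_effect(cells, adjust)
            label = "adjusted" if adjust else "marginal"
            print(prevalence, label, round(power(beta_g, var_g, 20000), 3))
```

Running the script prints approximate power for the unadjusted and covariate-adjusted analyses at a rare and a common prevalence, so the two settings can be compared directly. Because the calculation uses expected proportions rather than simulated samples, it ignores finite-sample effects and should be read as a rough guide under the stated assumptions.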
