Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies

Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10⁻⁹). The improvement varied across diseases, with a 16% median increase in χ² test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
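
The idea of conditioning on a covariate under a liability threshold model can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact estimator; everything here is an assumption made for illustration. In particular, the sketch assumes a single standardized covariate, a total liability variance of 1, a covariate effect (`covariate_effect`) and disease prevalence (`prevalence`) taken from external epidemiological data, and a simple test statistic equal to the sample size times the squared correlation between genotype and posterior mean residual liability. The function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def posterior_mean_residual_liability(is_case, covariate_z, prevalence, covariate_effect):
    """Expected residual liability given disease status and a standardized covariate.

    Assumed model (illustrative only): liability = covariate_effect * covariate_z + e,
    with e ~ N(0, 1 - covariate_effect**2); an individual is a case when the
    liability exceeds the threshold implied by the population prevalence.
    """
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    sigma = sqrt(1.0 - covariate_effect ** 2)
    # Standardized cut-point that the residual must exceed for a case.
    a = (threshold - covariate_effect * covariate_z) / sigma
    if is_case:
        # Mean of a normal truncated from below at a (no guard for extreme a).
        return sigma * STD_NORMAL.pdf(a) / (1.0 - STD_NORMAL.cdf(a))
    # Mean of a normal truncated from above at a.
    return -sigma * STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)


def liability_score_statistic(genotypes, liabilities):
    """N times the squared Pearson correlation of genotype and liability.

    Under the null this is treated here as approximately chi-square with 1 df
    (an assumption of this sketch).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_l = sum(liabilities) / n
    cov = sum((g - mean_g) * (l - mean_l) for g, l in zip(genotypes, liabilities))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_l = sum((l - mean_l) ** 2 for l in liabilities)
    return n * cov * cov / (var_g * var_l)


# Example with made-up parameters: 8% prevalence, covariate effect 0.4.
lean_case = posterior_mean_residual_liability(True, -1.0, 0.08, 0.4)
obese_case = posterior_mean_residual_liability(True, 1.0, 0.08, 0.4)
lean_control = posterior_mean_residual_liability(False, -1.0, 0.08, 0.4)
```

Under these assumptions a case with a low covariate value receives a larger residual liability than a case with a high covariate value, which mirrors the observation that odds ratios estimated from low-BMI type 2 diabetes cases are larger than those from high-BMI cases.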
