A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design

Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes in cohort and case-cohort designs, because it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, that work does not fully cover the GWAS setting, nor does it extend to the case-cohort design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, Cox regression gave a modest improvement in power to detect SNP–disease associations compared with logistic regression, and this advantage increased as disease incidence increased. In contrast, in the case-cohort setting logistic regression had more power than Cox regression with Prentice weighting. For both study designs, logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association), especially for SNPs with larger effects on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to the identified SNPs to obtain accurate estimates of their association with disease.
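A simplified analytic case helps show why an odds ratio overstates a hazard-ratio effect, and why the inflation grows with incidence and with effect size. This is a different calculation from the simulations reported here. It rests on several assumptions: proportional hazards, two groups (for example carriers versus non-carriers), and complete follow-up to a fixed time with no censoring. Under those assumptions the risk in the reference group is p0 = 1 - exp(-H0), where H0 is its cumulative hazard. The odds are then exp(H) - 1, so the implied odds ratio is OR = (exp(HR·H0) - 1) / (exp(H0) - 1). This is never below HR when HR >= 1. The function name `implied_or` is hypothetical.

```python
import math


def implied_or(hr: float, baseline_risk: float) -> float:
    """Odds ratio implied by a hazard ratio, ASSUMING proportional hazards,
    a fixed follow-up period and no censoring (illustrative only)."""
    h0 = -math.log1p(-baseline_risk)  # cumulative hazard in the reference group
    # Under these assumptions odds = exp(H) - 1, so OR = expm1(HR*H0) / expm1(H0).
    return math.expm1(hr * h0) / math.expm1(h0)


if __name__ == "__main__":
    for hr in (1.1, 2.0):
        for risk in (0.01, 0.2, 0.5):
            print(f"HR={hr:.1f} baseline risk={risk:.2f} -> OR={implied_or(hr, risk):.3f}")
```

Take HR = 2 as an example. The implied OR is about 2.01 at a 1% baseline risk, exactly 2.25 at 20% and exactly 3 at 50%. For HR = 1.1 at 50% baseline risk the implied OR is only about 1.14. The OR therefore drifts away from the HR as incidence rises, and it drifts proportionally further for larger effects.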
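A minimal sketch of the proposed two-step approach follows, in plain Python and for the full-cohort design only. It is not the analysis code used for EPIC-CVD, and its modelling choices are assumptions:

- Each SNP is coded additively (0, 1 or 2 minor alleles).
- Step 1 fits a single-SNP logistic regression to the event indicator.
- Step 2 refits the SNPs that pass the threshold with an unweighted single-covariate Cox model, using Breslow's handling of ties.

All function names are hypothetical, as is the toy simulation: exponential event times, administrative censoring at a fixed follow-up, and one causal SNP among 20 null SNPs. A case-cohort analysis would also need Prentice (or similar) weighting in step 2, which this sketch does not implement. Covariate adjustment is likewise omitted.

```python
import math
import random


def wald_p(z):
    """Two-sided p-value for a standard-normal Wald statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def logistic_fit(g, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y = 1) = a + b*g; returns (b, se, p)."""
    a = b = 0.0
    for _ in range(max_iter):
        ua = ub = iaa = iab = ibb = 0.0
        for gi, yi in zip(g, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * gi)))
            w = p * (1.0 - p)
            ua, ub = ua + (yi - p), ub + (yi - p) * gi
            iaa, iab, ibb = iaa + w, iab + w * gi, ibb + w * gi * gi
        det = iaa * ibb - iab * iab
        # Newton step: solve (2x2 information matrix) * step = score.
        da = (ibb * ua - iab * ub) / det
        db = (iaa * ub - iab * ua) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    se = math.sqrt(iaa / det)  # Var(b) = [I^-1]_bb from the final iteration
    return b, se, wald_p(b / se)


def cox_score_info(b, g, time, event):
    """Score and information of the Cox partial likelihood (Breslow ties)."""
    order = sorted(range(len(g)), key=lambda i: -time[i])
    s0 = s1 = s2 = score = info = 0.0
    k, n = 0, len(order)
    while k < n:
        j = k
        # Add all subjects tied at this time: sums now cover {time >= current}.
        while j < n and time[order[j]] == time[order[k]]:
            gj = g[order[j]]
            r = math.exp(b * gj)
            s0, s1, s2 = s0 + r, s1 + r * gj, s2 + r * gj * gj
            j += 1
        for m in range(k, j):
            i = order[m]
            if event[i]:
                mean = s1 / s0
                score += g[i] - mean
                info += s2 / s0 - mean * mean
        k = j
    return score, info


def cox_fit(g, time, event, max_iter=50, tol=1e-10):
    """Newton-Raphson Cox fit for one covariate; returns (log HR, se, p)."""
    b = 0.0
    for _ in range(max_iter):
        u, info = cox_score_info(b, g, time, event)
        step = u / info
        b += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(cox_score_info(b, g, time, event)[1])
    return b, se, wald_p(b / se)


def two_step_gwas(genotypes, time, event, p_threshold=5e-8):
    """Step 1: logistic screen of every SNP. Step 2: Cox refit of the hits."""
    hits = [snp for snp, g in genotypes.items()
            if logistic_fit(g, event)[2] < p_threshold]
    return {snp: cox_fit(genotypes[snp], time, event) for snp in hits}


def simulate_cohort(n=3000, n_null=20, maf=0.3, hr=2.0, base_rate=0.01,
                    follow_up=10.0, seed=1):
    """Toy cohort: one causal SNP, n_null null SNPs, exponential event times."""
    rng = random.Random(seed)

    def genotype():  # additive coding: number of minor alleles (0, 1, 2)
        return int(rng.random() < maf) + int(rng.random() < maf)

    genotypes = {"snp_causal": [genotype() for _ in range(n)]}
    for k in range(n_null):
        genotypes[f"snp_null_{k}"] = [genotype() for _ in range(n)]
    time, event = [], []
    for gi in genotypes["snp_causal"]:
        t = rng.expovariate(base_rate * hr ** gi)  # hazard = base_rate * hr**g
        time.append(min(t, follow_up))             # administrative censoring
        event.append(int(t <= follow_up))
    return genotypes, time, event


if __name__ == "__main__":
    genotypes, time, event = simulate_cohort()
    for snp, (b, se, p) in two_step_gwas(genotypes, time, event).items():
        print(f"{snp}: HR = {math.exp(b):.2f}, SE(log HR) = {se:.3f}, P = {p:.1e}")
```

The cheap logistic fit runs on every SNP, while the slower Cox fit runs only on the few SNPs that pass the screen. The threshold of 5e-8 is the conventional genome-wide significance level and is used here purely for illustration.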