Methods for interaction analyses using family‐based case‐control data: conditional logistic regression versus generalized estimating equations

A complex web of gene‐gene and gene‐environment interactions likely underlies late‐onset disease development. We compared conditional logistic regression (CLR) and generalized estimating equations (GEE) in modeling such interactions in pedigrees with missing parents. Using the simulation of linkage and association (SIMLA) program, disease genes, an environmental risk factor, gene‐gene interaction, and gene‐environment interaction were generated in family‐based data sets. Four scenarios for the relationship between the marker and disease loci were examined: linkage and association, linkage without association, association without linkage, and absence of both linkage and association. Models for CLR and GEE (with exchangeable and independence correlation matrices) were built, and type I error, power, average odds ratio (OR), standard deviation, and 95% confidence intervals were estimated. CLR and GEE were valid tests of association in the presence of linkage, but type I error was inflated for association without linkage, particularly with GEE. CLR generated OR estimates with lower bias but often more variability than those from GEE. Further, GEE was more powerful than CLR in detecting main and interactive effects. Although GEE had similar power with either matrix, the independence matrix resulted in lower type I error and less biased OR estimation than the exchangeable matrix. Our findings support the use of GEE to maximize power to detect gene‐gene and gene‐environment interactions, but call for caution when applying it where association without linkage is possible (e.g., population stratification) and when interpreting its OR estimates. Genet. Epidemiol. 2007. © 2007 Wiley‐Liss, Inc.
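The summary measures named above (type I error, power, average OR, standard deviation, and 95% confidence intervals) can be illustrated with a minimal sketch of how per-replicate simulation results might be aggregated. This is not the authors' code: the function name `summarize_replicates`, the 0.05 significance threshold, the normal-approximation interval around the mean OR, and the toy input values are all assumptions made for illustration. The rejection rate is read as type I error when the replicates were simulated under the null and as power when they were simulated under an alternative.

```python
import math
import statistics


def summarize_replicates(p_values, odds_ratios, alpha=0.05):
    """Summarize per-replicate test results from a simulation study.

    Hypothetical helper: p_values and odds_ratios hold one entry per
    simulated data set, produced by whichever model (CLR or GEE) was fit.
    """
    if len(p_values) != len(odds_ratios) or len(p_values) < 2:
        raise ValueError("need matching lists with at least two replicates")

    n = len(p_values)
    # Fraction of replicates rejecting the null at level alpha:
    # type I error under a null scenario, power under an alternative.
    rejection_rate = sum(p < alpha for p in p_values) / n

    mean_or = statistics.mean(odds_ratios)
    sd_or = statistics.stdev(odds_ratios)  # sample SD across replicates

    # Assumption: a normal-approximation 95% interval for the mean OR.
    half_width = 1.96 * sd_or / math.sqrt(n)

    return {
        "rejection_rate": rejection_rate,
        "mean_or": mean_or,
        "sd_or": sd_or,
        "ci_95": (mean_or - half_width, mean_or + half_width),
    }


# Toy values invented for illustration only; they are not study results.
toy_p = [0.01, 0.20, 0.03, 0.60]
toy_or = [1.8, 1.1, 2.0, 0.9]
summary = summarize_replicates(toy_p, toy_or)
print(summary)
```

Running the same summary separately for CLR, GEE with an exchangeable matrix, and GEE with an independence matrix would give the kind of side-by-side comparison the abstract describes, with the fitting step itself left to a statistics package.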
