A comparison of strategies for analyzing dichotomous outcomes in genome‐wide association studies with general pedigrees

Genome‐wide association studies (GWAS) have frequently been conducted on general or isolated populations with related individuals. However, there is no consensus on which strategy is most appropriate for analyzing dichotomous phenotypes in general pedigrees. Using simulation studies, we compared several strategies: generalized estimating equations (GEE) with various working correlation structures, the generalized linear mixed model (GLMM), and a variance component strategy (denoted LMEBIN) that treats dichotomous outcomes as continuous. We paid special attention to their performance with rare variants, rare diseases, and small sample sizes. In our simulations, when the sample size is not small, only GEE and LMEBIN maintain the nominal type I error rate in most cases, with exceptions for GEE with very rare diseases and genetic variants. GEE and LMEBIN have similar statistical power and slightly outperform GLMM when the prevalence is low. In terms of computational efficiency, GEE with the sandwich variance estimator outperforms GLMM and LMEBIN. We apply the strategies to a GWAS of gout in the Framingham Heart Study. Based on our results, for GWAS of dichotomous outcomes with general pedigrees we recommend GEE ind‐san for common variants and GEE ind‐fij or LMEBIN for rare variants. Genet. Epidemiol. 35:650‐657, 2011. © 2011 Wiley Periodicals, Inc.
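
To make the recommended common-variant strategy concrete, the following is a minimal sketch of a single-SNP logistic GEE with an independence working correlation and a sandwich variance clustered on family. It assumes that "ind‐san" denotes exactly this combination, which is our reading of the abbreviation and is not spelled out in the abstract. The function name `gee_ind_san`, its arguments, and the toy data are hypothetical. The model has only an intercept and one genotype term, with no covariates, so it illustrates the idea and does not reproduce the software used in the study.

```python
import math
from collections import defaultdict


def gee_ind_san(y, g, family_id, max_iter=50, tol=1e-10):
    """Logistic GEE, independence working correlation, family-clustered sandwich SE.

    y: 0/1 outcomes; g: genotype dosages; family_id: pedigree label per person.
    Returns (beta_g, robust_se, z) for the genotype coefficient.
    Assumes the genotype varies in the sample (otherwise the fit is singular).
    """
    b0 = b1 = 0.0
    # With an independence working correlation, the point estimate is the
    # ordinary logistic regression estimate, found here by Newton-Raphson.
    for _ in range(max_iter):
        s0 = s1 = a00 = a01 = a11 = 0.0
        for yi, gi in zip(y, g):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * gi)))
            w = mu * (1.0 - mu)
            s0 += yi - mu
            s1 += (yi - mu) * gi
            a00 += w
            a01 += w * gi
            a11 += w * gi * gi
        det = a00 * a11 - a01 * a01
        d0 = (a11 * s0 - a01 * s1) / det
        d1 = (a00 * s1 - a01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break

    # "Bread" (information matrix) and per-family score sums at the estimate.
    a00 = a01 = a11 = 0.0
    score = defaultdict(lambda: [0.0, 0.0])
    for yi, gi, fi in zip(y, g, family_id):
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * gi)))
        w = mu * (1.0 - mu)
        a00 += w
        a01 += w * gi
        a11 += w * gi * gi
        score[fi][0] += yi - mu
        score[fi][1] += (yi - mu) * gi
    det = a00 * a11 - a01 * a01

    # "Meat": sum over families of the outer product of the family score.
    m00 = sum(u[0] * u[0] for u in score.values())
    m01 = sum(u[0] * u[1] for u in score.values())
    m11 = sum(u[1] * u[1] for u in score.values())

    # Genotype row of the inverse bread, then the sandwich variance entry.
    i10 = -a01 / det
    i11 = a00 / det
    var_b1 = i10 * i10 * m00 + 2.0 * i10 * i11 * m01 + i11 * i11 * m11
    se = math.sqrt(var_b1)
    return b1, se, b1 / se


# Hypothetical toy data: 20 people in 10 two-person families.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
g = [0] * 10 + [1] * 10
fam = [i // 2 for i in range(20)]
beta, se, z = gee_ind_san(y, g, fam)
print(f"beta={beta:.3f} robust_se={se:.3f} z={z:.2f}")
```

Because the working correlation is independence, relatedness affects only the standard error, not the point estimate. This keeps the per-SNP cost close to that of an ordinary logistic regression, consistent with the computational advantage noted in the abstract.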
