GEE‐Based SNP Set Association Test for Continuous and Discrete Traits in Family‐Based Association Studies

Family‐based genetic association studies of related individuals provide opportunities to detect genetic variants and complement studies of unrelated individuals. Most statistical methods for common variants in family association studies are single‐marker based, testing one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., the SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEE)‐based kernel association test, a variance component based testing method, to test jointly for the association between a phenotype and multiple variants in an SNP set using family samples. The proposed approach allows for both continuous and discrete traits, and the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null hypothesis and develop analytical methods to calculate the P‐values. We also propose an efficient resampling method to correct for small sample size bias in family studies. The proposed method easily incorporates covariates and SNP‐SNP interactions. Simulation studies show that the proposed method properly controls type I error rates under both random and ascertained sampling schemes in family studies. We also demonstrate through simulation that, for detecting an SNP‐set effect, our approach has higher power than the single‐marker based minimum P‐value GEE test over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
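As an illustration only, the general idea of a score‐type SNP‐set test with a family‐clustered empirical covariance can be sketched as follows. This is a minimal sketch under stated assumptions, not the statistic or P‐value procedure of the paper, whose exact form is not given in this abstract: it assumes a continuous trait with an identity link, a working independence correlation, a linear kernel with unit SNP weights, and a Monte Carlo draw from the null mixture of chi‐squares in place of an analytical P‐value calculation. The function name `gee_kernel_test` and all of its arguments are hypothetical.

```python
import numpy as np

def gee_kernel_test(y, X, G, fam, n_draws=20000, seed=0):
    """Hypothetical sketch: score-type SNP-set test with family-clustered covariance.

    y: (n,) trait; X: (n, q) covariates incl. intercept; G: (n, p) genotypes;
    fam: (n,) family labels. Returns (Q, Monte Carlo p-value).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    G = np.asarray(G, dtype=float)
    fam = np.asarray(fam)

    # Null model with covariates only (identity link, working independence = OLS).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Project covariates out of the genotypes so that the score contributions
    # reflect the fact that beta was estimated.
    coef, *_ = np.linalg.lstsq(X, G, rcond=None)
    G_adj = G - X @ coef

    # One score contribution per family: U_i = G_i^T r_i (a p-vector).
    U = np.array([G_adj[fam == f].T @ resid[fam == f] for f in np.unique(fam)])
    score = U.sum(axis=0)

    # Linear-kernel statistic: squared length of the total score vector.
    Q = float(score @ score)

    # Empirical covariance of the total score, built from family-level terms,
    # so within-family correlation is absorbed without modelling it.
    Sigma = U.T @ U
    lam = np.clip(np.linalg.eigvalsh(Sigma), 0.0, None)

    # Under the null, Q is approximately sum_j lam_j * chi2_1; approximate the
    # tail probability by Monte Carlo (a stand-in for an analytical method).
    rng = np.random.default_rng(seed)
    null_draws = rng.chisquare(1, size=(n_draws, lam.size)) @ lam
    p_value = (1 + np.sum(null_draws >= Q)) / (1 + n_draws)
    return Q, float(p_value)
```

Summing the score within each family before forming the covariance is what lets relatedness be handled empirically in this sketch. With few families this estimate can be unstable, which is the small‐sample setting that a resampling correction is meant to address.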
