Univariate/Multivariate Genome-Wide Association Scans Using Data from Families and Unrelated Samples

As genome-wide association studies (GWAS) become more common, two approaches, among others, can improve statistical power to identify genes with subtle to moderate effects on human diseases. The first is to increase sample size, which can be achieved by combining unrelated and familial subjects. The second is to jointly analyze multiple correlated traits. In this study, by extending generalized estimating equations (GEEs), we propose a simple approach for performing univariate or multivariate association tests on combined data from unrelated subjects and nuclear families. In particular, we correct for population stratification by integrating principal component analysis and transmission disequilibrium test strategies. The proposed method allows for multiple siblings as well as missing parental information. Simulation studies show that, by analyzing the combined data while correcting for population stratification, the proposed test has improved power compared with two popular methods, EIGENSTRAT and FBAT. In addition, joint analysis of bivariate traits has improved power over univariate analysis when pleiotropic effects are present. Application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility and applicability of the proposed method.
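The idea of using principal components to correct for population stratification can be illustrated with a minimal sketch. This is not the proposed GEE-based test: it handles unrelated subjects only, uses ordinary least squares, and ignores family structure and the transmission disequilibrium component. The function names `simulate_stratified` and `pc_adjusted_assoc`, the allele frequencies, and the sample sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_stratified(n_per_pop=500, n_snps=200, seed=0):
    """Simulate two subpopulations (illustrative assumption, not study data).

    Every SNP has allele frequency 0.2 in one population and 0.8 in the
    other, and the trait mean differs by population. No SNP is causal,
    so any SNP-trait association is due to stratification alone.
    """
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    freq = np.where(pop[:, None] == 0, 0.2, 0.8)   # shape (n, 1)
    p = np.repeat(freq, n_snps, axis=1)            # shape (n, n_snps)
    genotypes = rng.binomial(2, p).astype(float)   # allele counts 0/1/2
    trait = 2.0 * pop + rng.normal(size=pop.size)
    return genotypes, trait


def pc_adjusted_assoc(genotypes, trait, test_snp, n_pcs=2):
    """Regress the trait on one SNP plus the top principal components.

    Returns the SNP effect estimate and its z-statistic. With n_pcs=0
    this reduces to an unadjusted linear regression.
    """
    # Standardize each SNP column, dropping monomorphic SNPs.
    centered = genotypes - genotypes.mean(axis=0)
    sd = centered.std(axis=0)
    keep = sd > 0
    standardized = centered[:, keep] / sd[keep]

    # Left singular vectors give per-individual principal component scores.
    u, _, _ = np.linalg.svd(standardized, full_matrices=False)
    pcs = u[:, :n_pcs]

    # Design matrix: intercept, test SNP, then the PC covariates.
    design = np.column_stack([np.ones(len(trait)), test_snp, pcs])
    beta, *_ = np.linalg.lstsq(design, trait, rcond=None)

    # Ordinary least-squares standard error for the SNP coefficient.
    resid = trait - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = beta[1] / np.sqrt(cov[1, 1])
    return beta[1], z
```

Calling `pc_adjusted_assoc(G, y, G[:, 0], n_pcs=0)` and then with `n_pcs=2` on data from `simulate_stratified()` contrasts the unadjusted and adjusted tests for a null SNP: under these simulation settings the unadjusted statistic is strongly inflated by the population difference, while the adjusted one should be close to its null distribution.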
