Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data

There is an increasing need to develop and apply powerful statistical tests to detect associations between multiple traits and a single locus, as arise in neuroimaging genetics and other studies. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI), in addition to genome-wide single nucleotide polymorphisms (SNPs), thousands of neuroimaging and neuropsychological phenotypes have been collected as intermediate phenotypes for Alzheimer's disease. Although some classic methods such as MANOVA and more recently proposed methods may be applied, each has its own limitations. For example, MANOVA cannot be applied to binary and other discrete traits. In addition, the relationships among these methods are not well understood. Importantly, since these tests are not data adaptive, their power depends on the unknown association patterns among the multiple traits and between the traits and the locus, so they may or may not be powerful in a given application. In this paper we propose a class of data-adaptive weights and the corresponding weighted tests in the general framework of generalized estimation equations (GEE). A highly adaptive test is proposed to select the most powerful one from this class of weighted tests so that it can maintain high power across a wide range of situations. Our proposed tests are applicable to various types of traits, with or without covariates. We also show analytically the relationships among some existing tests and our proposed tests, indicating that many existing tests are special cases of our proposed tests. Extensive simulation studies were conducted to compare the power properties of various existing methods and our new methods. Finally, we applied the methods to an ADNI dataset to illustrate their performance. We conclude by recommending the GEE-based Score test and our proposed adaptive test for their high and complementary performance.
