USAT: A Unified Score‐Based Association Test for Multiple Phenotype‐Genotype Analysis

Genome‐wide association studies (GWASs) for complex diseases often collect data on multiple correlated endo‐phenotypes. Multivariate analysis of these correlated phenotypes can improve the power to detect genetic variants. Multivariate analysis of variance (MANOVA) can perform such association analysis at a GWAS level, but the behavior of MANOVA under different trait models has not been carefully investigated. In this paper, we show that MANOVA is generally very powerful for detecting association, but that there are situations, such as when a genetic variant is associated with all the traits, where MANOVA may not have any detection power. In these situations, methods based on marginal models perform much better than multivariate methods. We investigate the behavior of MANOVA, both theoretically and using simulations, and derive the conditions under which MANOVA loses power. Based on our findings, we propose a unified score‐based test statistic, USAT, that can perform better than MANOVA in such situations and nearly as well as MANOVA elsewhere. Our proposed test reports an approximate asymptotic P‐value for association and is computationally very efficient to implement at a GWAS level. Through extensive simulations, we have studied the performance of USAT, MANOVA, and other existing approaches and demonstrated the advantage of using USAT to detect association between a genetic variant and multivariate phenotypes. We applied USAT to data on three correlated traits collected on 5,816 Caucasian individuals from the Atherosclerosis Risk in Communities (ARIC) Study (The ARIC Investigators [1989]) and detected some interesting associations.
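The two families of tests contrasted above can be illustrated with a minimal sketch. This is not the USAT statistic, whose construction the abstract does not spell out. It only shows, for a single variant, a MANOVA test based on Wilks' lambda next to a marginal approach, here assumed to be the Bonferroni-adjusted minimum of per-trait regression P-values. The function names `manova_pvalue` and `min_marginal_pvalue` are hypothetical, and the model assumes an intercept and an additive genotype code with no other covariates.

```python
import numpy as np
from scipy import stats


def manova_pvalue(Y, g):
    """Wilks' lambda test that genotype g affects none of the K traits in Y.

    Y is an (n, K) trait matrix and g a length-n genotype vector.
    With a single genotype predictor the F transformation of Wilks'
    lambda is exact: F ~ F(K, n - K - 1) under the null.
    """
    Y = np.asarray(Y, dtype=float)
    g = np.asarray(g, dtype=float)
    n, k = Y.shape
    Yc = Y - Y.mean(axis=0)           # centering absorbs the intercept
    gc = g - g.mean()
    beta = gc @ Yc / (gc @ gc)        # per-trait least-squares slopes
    resid = Yc - np.outer(gc, beta)
    E = resid.T @ resid               # residual SSCP, full model
    T = Yc.T @ Yc                     # residual SSCP, null model
    log_lam = np.linalg.slogdet(E)[1] - np.linalg.slogdet(T)[1]
    lam = np.exp(log_lam)             # Wilks' lambda = det(E) / det(T)
    f_stat = (1.0 - lam) / lam * (n - k - 1) / k
    return stats.f.sf(f_stat, k, n - k - 1)


def min_marginal_pvalue(Y, g):
    """Bonferroni-adjusted minimum of the K single-trait regression P-values."""
    Y = np.asarray(Y, dtype=float)
    g = np.asarray(g, dtype=float)
    pvals = [stats.linregress(g, Y[:, j]).pvalue for j in range(Y.shape[1])]
    return min(1.0, min(pvals) * len(pvals))
```

Both functions return one P-value per variant, so they can be applied variant by variant across a genotype matrix and compared under different simulated trait models.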

[1] Charles F. Bearden, et al. A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk, 2013, Cell.

[2] Manuel A. R. Ferreira, et al. A gene-based test of association using canonical correlation analysis, 2012, Bioinform.

[3] P. O’Reilly, et al. MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS, 2012, PloS one.

[4] L. Kiemeney, et al. A Comparison of Multivariate Genome-Wide Association Methods, 2014, PloS one.

[5] Huan Liu, et al. A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables, 2009, Comput. Stat. Data Anal.

[6] Keith E. Muller, et al. Practical methods for computing power in testing the multivariate general linear hypothesis, 1984.

[7] Claude Bouchard, et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance, 2012, Nature Genetics.

[8] William G. Iacono, et al. A Rapid Gene-Based Genome-Wide Association Test with Multivariate Traits, 2013, Human Heredity.

[9] M. Kendall. Statistical Methods for Research Workers, 1937, Nature.

[10] Yusuke Nakamura, et al. A genome-wide association study in the Japanese population identifies susceptibility loci for type 2 diabetes at UBE2E2 and C2CD4A-C2CD4B, 2010, Nature Genetics.

[11] Conor V. Dolan, et al. TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies, 2013, PLoS genetics.

[12] Manuel A. R. Ferreira, et al. A multivariate test of association, 2009.

[13] Alex Doney, et al. Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge, 2010, Nature Genetics.

[14] Tanya M. Teslovich, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes, 2012, Nature Genetics.

[15] Anja Vogler, et al. An Introduction to Multivariate Statistical Analysis, 2004.

[16] Qiong Yang, et al. Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies, 2012, Journal of probability and statistics.

[17] Yan V. Sun, et al. A Bivariate Genome-Wide Approach to Metabolic Syndrome, 2011, Diabetes.

[18] Jin-Ting Zhang. Approximate and Asymptotic Distributions of Chi-Squared–Type Mixtures With Applications, 2005.

[19] Peter Kraft, et al. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies, 2014, American journal of human genetics.

[20] Hong-Wen Deng, et al. Univariate/Multivariate Genome-Wide Association Scans Using Data from Families and Unrelated Samples, 2009, PloS one.

[21] M. Stephens, et al. Genome-wide Efficient Mixed Model Analysis for Association Studies, 2012, Nature Genetics.

[22] Wei Pan, et al. Asymptotic tests of association with multiple SNPs in linkage disequilibrium, 2009, Genetic epidemiology.

[23] R Core Team, et al. R: A language and environment for statistical computing, 2014.

[24] F. Cambien, et al. Genetics of Venous Thrombosis: Insights from a New Genome Wide Association Study, 2011, PloS one.

[25] Manuel A. R. Ferreira, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, 2007, American journal of human genetics.

[26] The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives, 1989.

[27] M. Stephens. A Unified Framework for Association Analysis with Multiple Related Phenotypes, 2013, PloS one.

[28] R. Wing, et al. Human Cardiovascular Disease IBC Chip-Wide Association with Weight Loss and Weight Regain in the Look AHEAD Trial, 2013, Human Heredity.

[29] Xihong Lin, et al. Optimal tests for rare variant effects in sequencing association studies, 2012, Biostatistics.

[30] Fabian J Theis, et al. Novel genetic associations with serum level metabolites identified by phenotype set enrichment analyses, 2014, Human molecular genetics.

[31] Paulo Mazzafera, et al. An Arabidopsis Mitochondrial Uncoupling Protein Confers Tolerance to Drought and Salt Stress in Transgenic Tobacco Plants, 2011, PloS one.

[32] Tanya M. Teslovich, et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways, 2012, Nature Genetics.

[33] P. Elliott, et al. Genetic and Functional Assessment of the Role of the rs13431652-A and rs573225-A Alleles in the G6PC2 Promoter That Are Strongly Associated With Elevated Fasting Glucose Levels, 2010, Diabetes.

[34] Christian Gieger, et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk, 2010, Nature Genetics.

[35] A. Folsom, et al. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators, 1989, American journal of epidemiology.

[36] P. Visscher, et al. GCTA: a tool for genome-wide complex trait analysis, 2011, American journal of human genetics.

[37] Fei Gao, et al. An integrated epigenomic analysis for type 2 diabetes susceptibility loci in monozygotic twins, 2014, Nature Communications.

[38] T. Hansen, et al. The diabetogenic VPS13C/C2CD4A/C2CD4B rs7172432 variant impairs glucose-stimulated insulin response in 5,722 non-diabetic Danish individuals, 2011, Diabetologia.

[39] L. Almasy, et al. Genetic susceptibility to thrombosis and its relationship to physiological risk factors: the GAIT study. Genetic Analysis of Idiopathic Thrombophilia, 2000, American journal of human genetics.

[40] Arnab Maity, et al. Multivariate Phenotype Association Analysis by Marker‐Set Kernel Machine Regression, 2012, Genetic epidemiology.

[41] Bjarni J. Vilhjálmsson, et al. A mixed-model approach for genome-wide association studies of correlated traits in structured populations, 2012, Nature Genetics.