MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS

The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one phenotype at a time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden from single-phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. Power increases both for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data, we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides a 37% increase in discovery.
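The ordinal-regression test described above can be sketched as follows. This is one plausible formulation, not a statement of the exact model: it assumes a proportional-odds link and a likelihood-ratio test, neither of which the abstract specifies, and the symbols (G, y, alpha, beta, K, Lambda) are illustrative notation introduced here.

```latex
% Assumption: proportional-odds ordinal regression with the genotype as outcome.
% G_i \in \{0,1,2\} is the allele count of individual i at a SNP;
% y_{ik} is the value of phenotype k (k = 1..K) for individual i.
\[
\operatorname{logit} \Pr(G_i \le g \mid y_i)
  = \alpha_g - \sum_{k=1}^{K} \beta_k \, y_{ik},
  \qquad g \in \{0, 1\}
\]
% Joint test of all phenotypes at the SNP (assumed likelihood-ratio test):
\[
H_0 : \beta_1 = \dots = \beta_K = 0,
\qquad
\Lambda = 2\left(\ell_{\text{full}} - \ell_{\text{null}}\right)
  \;\sim\; \chi^2_K \ \text{asymptotically under } H_0
\]
```

Under this reading, the fitted coefficients define the linear combination of phenotypes most associated with the genotype, and no distributional assumption is placed on the phenotypes themselves.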
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
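For reference, the Friedewald formula estimates LDL cholesterol as a fixed linear combination of total cholesterol, HDL cholesterol, and triglycerides. It is the kind of combination the sentence above refers to. The sketch below is a minimal illustration. The function name, argument names, and unit handling are hypothetical and are not part of the MultiPhen software. The formula is conventionally not applied when triglycerides exceed about 400 mg/dL.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, unit="mg/dL"):
    """Estimate LDL cholesterol with the Friedewald formula.

    LDL = TC - HDL - TG / d, where d = 5 for mg/dL and 2.2 for mmol/L.
    All three inputs must be expressed in the same unit.
    """
    divisors = {"mg/dL": 5.0, "mmol/L": 2.2}
    if unit not in divisors:
        raise ValueError(f"unsupported unit: {unit!r}")
    return total_chol - hdl - triglycerides / divisors[unit]


# The same formula written as weights on (TC, HDL, TG) in mg/dL,
# which makes the linear-combination structure explicit.
FRIEDEWALD_WEIGHTS_MG_DL = (1.0, -1.0, -0.2)

if __name__ == "__main__":
    print(friedewald_ldl(200.0, 50.0, 150.0))  # 120.0
```

In mg/dL the weights on total cholesterol, HDL, and triglycerides are 1, -1, and -0.2. A fitted combination whose coefficients are roughly proportional to these would be consistent with the correspondence the abstract describes.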
