PSEA: Phenotype Set Enrichment Analysis—A New Method for Analysis of Multiple Phenotypes

Most genome‐wide association studies (GWAS) are restricted to one phenotype, even when multiple related or unrelated phenotypes are available. However, an integrated analysis of multiple phenotypes can provide insight into their shared genetic basis and may improve the power of association studies. We present a new method, called “phenotype set enrichment analysis” (PSEA), which applies ideas from gene set enrichment analysis to the investigation of phenotype sets. PSEA combines the statistics of univariate phenotype analyses and assesses significance by permutation. It allows not only the analysis of predefined phenotype sets but also the identification of new ones. In addition to the setting where phenotypes and genotypes are available for each person, the method was adapted to the analysis of GWAS summary statistics. PSEA was applied to data from the population‐based cohort KORA F4 (N = 1,814) using iron‐related and blood count traits. PSEA confirmed associations previously found in large meta‐analyses of these traits, which supports its reliability. Many of these associations were not detectable by single‐phenotype GWAS in KORA F4. The results therefore suggest that PSEA can be more powerful than a single‐phenotype GWAS for identifying associations with multiple phenotypes. PSEA is a valuable method for the analysis of multiple phenotypes and can help to understand phenotype networks. Its flexible design enables both the use of prior knowledge and the generation of new knowledge about connections among multiple phenotypes. A software program for PSEA based on GWAS results is available upon request.
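The idea of combining univariate statistics and testing by permutation can be illustrated with a minimal sketch. The abstract does not specify the set statistic or the permutation scheme, so everything below is an assumption made for illustration: the per-phenotype statistic is taken to be the squared Pearson correlation with the genotype, the set statistic is their sum, and genotypes are shuffled across individuals so that the correlation among phenotypes is preserved. The function names (`pearson_r`, `set_statistic`, `permutation_p_value`) are hypothetical and do not refer to the authors' software.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def set_statistic(genotype, phenotype_set):
    """Assumed set statistic: sum of squared univariate correlations."""
    return sum(pearson_r(genotype, p) ** 2 for p in phenotype_set)


def permutation_p_value(genotype, phenotype_set, n_perm=999, seed=0):
    """Empirical p-value for one variant against one phenotype set.

    Genotypes are shuffled across individuals; phenotype rows stay intact,
    so the dependence among the phenotypes is kept under the null.
    """
    rng = random.Random(seed)
    observed = set_statistic(genotype, phenotype_set)
    shuffled = list(genotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(shuffled, phenotype_set) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (simulated, not from KORA F4): 60 individuals, two phenotypes
# that both depend on the allele count.
toy_rng = random.Random(1)
toy_genotype = [0, 1, 2] * 20
toy_set = [
    [g + toy_rng.gauss(0, 1) for g in toy_genotype],
    [0.5 * g + toy_rng.gauss(0, 1) for g in toy_genotype],
]
print(permutation_p_value(toy_genotype, toy_set, n_perm=199, seed=2))
```

Shuffling genotypes rather than individual phenotypes is one common design choice for multi-phenotype permutation tests, because it leaves the joint distribution of the phenotypes unchanged; whether PSEA uses this exact scheme is not stated in the abstract.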
