Variable set enrichment analysis in genome-wide association studies

Complex diseases such as hypertension are inherently multifactorial and involve many factors with mild-to-minute effect sizes. A genome-wide association study (GWAS) typically tests hundreds of thousands of single-nucleotide polymorphisms (SNPs) and offers an opportunity to evaluate the aggregated effects of many genetic variants whose individual effects are too small to detect. Gene-set enrichment analysis (GSEA) is a pathway-based approach that tests for such aggregated effects among genes that are linked by biological function. A key step in GSEA is the summary statistic (gene score) used to measure the overall relevance of a gene based on all SNPs tested in that gene. Existing GSEA methods use maximum statistics, which are sensitive to gene size and linkage disequilibrium. We propose variable set enrichment analysis (VSEA) and study new gene score methods that are less dependent on gene size. The new method treats groups of variables (SNPs or other variants) as the base units for summarizing gene scores and relies less on the gene definition itself. The power of VSEA is analyzed by simulation studies that model various scenarios of complex multilocus interactions. Results show that the new gene scores generally performed better, some substantially so, than the existing GSEA extension to GWAS. The new methods are implemented in an R package and, when applied to a real GWAS data set, demonstrated their practical utility in a GWAS setting.
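To illustrate why a maximum-statistic gene score depends on gene size, the following minimal Python sketch simulates SNP-level test statistics under the null hypothesis and compares the average maximum for a small and a large gene. It is an illustration only, not the VSEA method or the R package: the function names, the gene sizes (5 and 50 SNPs), the number of replicates, and the assumption that SNPs are independent (no linkage disequilibrium) with 1-degree-of-freedom chi-square statistics are all hypothetical choices made for this example.

```python
import random


def max_gene_score(snp_stats):
    """Gene score defined as the largest SNP-level statistic in the gene."""
    return max(snp_stats)


def mean_null_max_score(n_snps, n_reps=2000, seed=0):
    """Average max-statistic gene score under the null for a gene with n_snps SNPs.

    Assumption (illustrative only): SNPs are independent and each null
    statistic is a squared standard normal, i.e. chi-square with 1 df.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_snps)]
        total += max_gene_score(stats)
    return total / n_reps


# Hypothetical gene sizes chosen only to contrast a small and a large gene.
small_gene_mean = mean_null_max_score(n_snps=5)
large_gene_mean = mean_null_max_score(n_snps=50)

print(f"mean null max score,  5 SNPs: {small_gene_mean:.2f}")
print(f"mean null max score, 50 SNPs: {large_gene_mean:.2f}")
```

Even with no true association, the larger gene tends to receive the higher score under this setup, simply because it has more SNPs from which to draw a maximum. This is the kind of size dependence that a gene score based on groups of variables aims to reduce.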
