MAGMA: Generalized Gene-Set Analysis of GWAS Data

By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis exist, they generally suffer from a number of issues: statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. It also uses a regression structure, which allows generalization to the analysis of continuous gene properties and to the simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than the other tools in both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was considerably faster.
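To make the idea of a regression-structured gene-set analysis concrete, the following is a minimal sketch, not MAGMA's implementation. It assumes each gene already has an association Z-score from a gene-level analysis, and regresses those Z-scores on a 0/1 set-membership indicator. The function name `gene_set_regression`, the simulated data, and the effect size are all hypothetical, and the sketch ignores covariates and any correlation between genes that a real analysis would need to handle.

```python
import math
import random


def gene_set_regression(z_scores, in_set):
    """Ordinary least squares of gene Z-scores on a set indicator.

    z_scores: one association Z-score per gene (assumed precomputed).
    in_set:   1 if the gene belongs to the gene set, else 0.
    Returns (slope, t_statistic); the slope is the difference in mean
    Z-score between genes inside and outside the set.
    """
    n = len(z_scores)
    mean_x = sum(in_set) / n
    mean_z = sum(z_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in in_set)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(in_set, z_scores))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    # Residual sum of squares and the standard error of the slope.
    rss = sum((z - intercept - slope * x) ** 2 for x, z in zip(in_set, z_scores))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical data: 500 genes, the first 50 form the gene set and
# have their Z-scores shifted upward by an arbitrary 0.5.
random.seed(0)
n_genes = 500
in_set = [1 if i < 50 else 0 for i in range(n_genes)]
z_scores = [random.gauss(0.5 if s else 0.0, 1.0) for s in in_set]

slope, t_stat = gene_set_regression(z_scores, in_set)
print(f"slope = {slope:.3f}, t = {t_stat:.3f}")
```

Because the predictor is just a column in a regression, the same function accepts a continuous gene property in place of `in_set`, which is the kind of generalization the abstract describes. Testing several gene sets simultaneously would require a multiple-regression version with one column per set.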
