A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure, the Summary-based Adaptive Rank Truncated Product (sARTP), for conducting gene and pathway meta-analysis using only SNP-level summary statistics in combination with genotype correlations estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis of type 2 diabetes (T2D) with sARTP by integrating SNP-level summary statistics from two large studies comprising 19,809 T2D cases and 111,181 controls of European ancestry. Among 4,713 candidate pathways, from which genes in the neighborhoods of 170 GWAS-established T2D loci had been excluded, we detected 43 globally significant T2D pathways (Bonferroni-corrected p-values < 0.05). These included the insulin signaling pathway and the T2D pathway defined by KEGG, as well as pathways defined by specific gene expression patterns in pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 East Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 of the 43 pathways identified in European populations remained significant in East Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP that can analyze pathways with thousands of genes and tens of thousands of SNPs.
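To make the idea of an adaptive rank truncated product on summary statistics concrete, the following is a minimal Python sketch for a single SNP set. It is not the sARTP R package or its algorithm as implemented by the authors. It assumes the inputs are SNP-level z-scores and a genotype correlation (LD) matrix, draws null z-scores from a multivariate normal with that correlation, and calibrates the minimum p-value over a few truncation points. The function name `artp_summary`, its parameters, and the toy inputs are all hypothetical.

```python
import math
import numpy as np

# Vectorised complementary error function, used for two-sided normal p-values.
_erfc = np.vectorize(math.erfc)


def artp_summary(z, ld, trunc_points=(1, 2, 3), n_sim=2000, seed=0):
    """Adaptive rank truncated product p-value for one SNP set.

    z  : SNP-level z-scores (assumed standard normal under the null).
    ld : SNP-by-SNP genotype correlation matrix, e.g. from a reference panel.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    m = z.size
    ks = [k for k in trunc_points if 1 <= k <= m]

    # Null z-score vectors share the LD structure of the observed SNPs.
    null_z = rng.multivariate_normal(np.zeros(m), ld, size=n_sim)
    all_z = np.vstack([z, null_z])  # row 0 holds the observed data

    # Two-sided SNP p-values, floored to avoid log(0).
    p = np.maximum(_erfc(np.abs(all_z) / math.sqrt(2.0)), 1e-300)
    neg_log_p = -np.log(p)

    # Rank truncated product at K: sum of -log(p) over the K smallest p-values.
    top_first = -np.sort(-neg_log_p, axis=1)
    rtp = np.cumsum(top_first, axis=1)[:, [k - 1 for k in ks]]

    # Empirical p-value of every row (observed and null) at each truncation point.
    n_rows = rtp.shape[0]
    p_k = np.empty_like(rtp)
    for j in range(rtp.shape[1]):
        col_sorted = np.sort(rtp[:, j])
        n_ge = n_rows - np.searchsorted(col_sorted, rtp[:, j], side="left")
        p_k[:, j] = n_ge / n_rows

    # Adaptive step: calibrate the minimum p-value against the same null draws.
    min_p = p_k.min(axis=1)
    return float(np.mean(min_p <= min_p[0]))


# Toy inputs (made up for illustration): 4 SNPs with AR(1)-style LD.
idx = np.arange(4)
ld = 0.5 ** np.abs(np.subtract.outer(idx, idx))
p_signal = artp_summary([5.0, 0.3, -0.2, 4.5], ld)
p_null = artp_summary([0.1, -0.2, 0.05, 0.1], ld)
```

Because the observed row is ranked together with the simulated rows, the returned p-value can never fall below 1/(n_sim + 1). Reaching a Bonferroni threshold of 0.05/4,713 (about 1.06e-5) would therefore need far more than the 2,000 null draws used here.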
