HisCoM-GGI: Hierarchical structural component analysis of gene-gene interactions

Although genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with common diseases, these findings do not fully explain the "missing heritability". Detecting gene-gene interactions (GGI) is one possible avenue for addressing the missing heritability problem. Many statistical approaches have been proposed to detect GGI, but most focus primarily on SNP-to-SNP interactions. Gene-based GGI analysis has several advantages, such as reducing the burden of multiple-testing correction and increasing power by aggregating multiple causal signals across SNPs in specific genes, yet only a few such methods are available. In this study, we propose a new statistical approach for gene-based GGI analysis, "Hierarchical structural CoMponent analysis of Gene-Gene Interactions" (HisCoM-GGI). HisCoM-GGI is based on generalized structured component analysis and can account for hierarchical structural relationships between genes and SNPs. For a pair of genes, HisCoM-GGI first summarizes all possible pairwise SNP-SNP interactions into a latent variable, on which it then performs GGI analysis. HisCoM-GGI can evaluate both gene-level and SNP-level interactions. In simulation studies, HisCoM-GGI demonstrated higher statistical power than existing gene-based GGI methods. We also applied HisCoM-GGI to a GWAS of a Korean population to identify GGI associated with body mass index. In that analysis, HisCoM-GGI identified 14 potential GGI, two of which, (NCOR2 × SPOCK1) and (LINGO2 × ZNF385D), were replicated in independent datasets. We conclude that HisCoM-GGI may be a valuable tool for identifying GGI that contribute to missing heritability, allowing us to better understand the genetic mechanisms of complex traits.
An implementation of HisCoM-GGI can be downloaded from the website ( http://statgen.snu.ac.kr/software/hiscom-ggi ).
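The idea of collapsing all pairwise SNP-SNP interactions for a gene pair into one latent variable can be illustrated with a minimal sketch. This is not the HisCoM-GGI implementation: the function names (`pairwise_interactions`, `latent_interaction`), the 0/1/2 genotype coding, and the `ridge` parameter are assumptions made here for illustration, and a single ridge-penalized least-squares fit stands in for the estimation procedure of generalized structured component analysis.

```python
import numpy as np

def pairwise_interactions(gene_a, gene_b):
    """Return an (n_samples, p*q) matrix holding every SNP-by-SNP product.

    gene_a: (n_samples, p) genotype matrix for the first gene (assumed 0/1/2 coding).
    gene_b: (n_samples, q) genotype matrix for the second gene.
    """
    n, p = gene_a.shape
    _, q = gene_b.shape
    # Broadcasting (n, p, 1) * (n, 1, q) gives (n, p, q); flatten the SNP-pair axes.
    return (gene_a[:, :, None] * gene_b[:, None, :]).reshape(n, p * q)

def latent_interaction(gene_a, gene_b, y, ridge=1.0):
    """Collapse all SNP-SNP products into one weighted composite (illustrative only).

    Weights come from a ridge-penalized least-squares fit to the phenotype y.
    This is a simple stand-in, not the estimator used by HisCoM-GGI.
    """
    x = pairwise_interactions(gene_a, gene_b).astype(float)
    x = x - x.mean(axis=0)            # center each interaction term
    y_centered = y - y.mean()         # center the phenotype
    penalty = ridge * np.eye(x.shape[1])
    weights = np.linalg.solve(x.T @ x + penalty, x.T @ y_centered)
    latent = x @ weights              # gene-level interaction score per sample
    return weights, latent

# Toy data: 50 samples, 3 SNPs in one gene and 4 in the other (hypothetical values).
rng = np.random.default_rng(0)
gene_a = rng.integers(0, 3, size=(50, 3))
gene_b = rng.integers(0, 3, size=(50, 4))
phenotype = rng.normal(size=50)
weights, latent = latent_interaction(gene_a, gene_b, phenotype)
```

In this sketch the individual entries of `weights` play the role of SNP-level interaction contributions, while `latent` is the single gene-level term that a downstream test would examine.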

[1]  S. Soong,et al.  Variable selection in logistic regression for detecting SNP–SNP interactions: the rheumatoid arthritis example , 2008, European Journal of Human Genetics.

[2]  E. Tai,et al.  Genome-wide association studies in East Asians identify new loci for waist-hip ratio and waist circumference , 2016, Scientific Reports.

[3]  B S Weir,et al.  Truncated product method for combining P‐values , 2002, Genetic epidemiology.

[4]  M. L. Calle,et al.  Improving strategies for detecting genetic patterns of disease susceptibility in association studies , 2008, Statistics in medicine.

[5]  Taesung Park,et al.  A large-scale genome-wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits , 2009, Nature Genetics.

[6]  S. Horvath,et al.  Variations in DNA elucidate molecular networks that cause disease , 2008, Nature.

[7]  Taesung Park,et al.  Log-linear model-based multifactor dimensionality reduction method to detect gene-gene interactions , 2007, Bioinform..

[8]  Jason H. Moore,et al.  Missing heritability and strategies for finding the underlying causes of complex disease , 2010, Nature Reviews Genetics.

[9]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[10]  Min-Seok Kwon,et al.  A Modified Entropy-Based Approach for Identifying Gene-Gene Interactions in Case-Control Study , 2013, PloS one.

[11]  Ku Chee Seng,et al.  The success of the genome-wide association approach: a brief story of a long struggle , 2008, European Journal of Human Genetics.

[12]  Sungkyoung Choi,et al.  Pathway-based approach using hierarchical components of collapsed rare variants , 2016, Bioinform..

[13]  Mark Daly,et al.  Haploview: analysis and visualization of LD and haplotype maps , 2005, Bioinform..

[14]  Seungyeoun Lee,et al.  Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method , 2016, Genomics & informatics.

[15]  Taesung Park,et al.  Identification of multiple gene-gene interactions for ordinal phenotypes , 2013, BMC Medical Genomics.

[16]  P. Phillips Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems , 2008, Nature Reviews Genetics.

[17]  Qiang Yang,et al.  BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies , 2010, American journal of human genetics.

[18]  P. Visscher,et al.  A versatile gene-based test for genome-wide association studies. , 2010, American journal of human genetics.

[19]  Mario Cortina-Borja,et al.  Discovery by the Epistasis Project of an epistatic interaction between the GSTM3 gene and the HHEX/IDE/KIF11 locus in the risk of Alzheimer's disease , 2013, Neurobiology of Aging.

[20]  H. Lehrach,et al.  A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome , 2005, Cell.

[21]  Taesung Park,et al.  A novel fuzzy set based multifactor dimensionality reduction method for detecting gene-gene interaction , 2016, Comput. Biol. Chem..

[22]  Bjarni V. Halldórsson,et al.  Many sequence variants affecting diversity of adult human height , 2008, Nature Genetics.

[23]  Heungsun Hwang,et al.  Generalized Structured Component Analysis with Latent Interactions , 2010 .

[24]  Taesung Park,et al.  Odds ratio based multifactor-dimensionality reduction method for detecting gene – gene interactions , 2006 .

[25]  Chris S. Haley,et al.  Detecting epistasis in human complex traits , 2014, Nature Reviews Genetics.

[26]  Y. Takane,et al.  Generalized structured component analysis , 2004 .

[27]  Taesung Park,et al.  An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions , 2017, BMC Genomics.

[28]  Kai Wang,et al.  Pathway-based approaches for analysis of genomewide association studies. , 2007, American journal of human genetics.

[29]  T. Maruyama,et al.  A low-frequency GLIS3 variant associated with resistance to Japanese type 1 diabetes. , 2013, Biochemical and Biophysical Research Communications - BBRC.

[30]  A. Drago,et al.  Insight gained from genome-wide interaction and enrichment analysis on weight gain during citalopram treatment , 2017, Neuroscience Letters.

[31]  Andrew G. Clark,et al.  Gene-Based Testing of Interactions in Association Studies of Quantitative Traits , 2013, PLoS genetics.

[32]  Jing Li,et al.  Detecting gene-gene interactions using a permutation-based random forest method , 2016, BioData Mining.

[33]  Wei Lu,et al.  Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians , 2011, Nature Genetics.

[34]  Jun Zhu,et al.  A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence. , 2007, American journal of human genetics.

[35]  D. Nyholt A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. , 2004, American journal of human genetics.

[36]  Heungsun Hwang,et al.  An extended redundancy analysis and its applications to two practical examples , 2005, Comput. Stat. Data Anal..

[37]  Peggy Hall,et al.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations , 2013, Nucleic Acids Res..

[38]  Momiao Xiong,et al.  A Novel Statistic for Genome-Wide Interaction Analysis , 2010, PLoS genetics.

[39]  M. McCarthy,et al.  Genome-wide association studies for complex traits: consensus, uncertainty and challenges , 2008, Nature Reviews Genetics.

[40]  M. A. Kelly,et al.  Effects of 16 Genetic Variants on Fasting Glucose and Type 2 Diabetes in South Asians: ADCY5 and GLIS3 Variants May Predispose to Type 2 Diabetes , 2011, PloS one.

[41]  L. Letenneur,et al.  A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis , 2013, BMC Medical Genetics.

[42]  Runze Li,et al.  A FAST ALGORITHM FOR DETECTING GENE-GENE INTERACTIONS IN GENOME-WIDE ASSOCIATION STUDIES. , 2014, The annals of applied statistics.

[43]  Xinwei Deng,et al.  Estimation in high-dimensional linear models with deterministic design matrices , 2012, 1206.0847.

[44]  Ingo Ruczinski,et al.  Detection of SNP‐SNP interactions in trios of parents with schizophrenic children , 2010, Genetic epidemiology.

[45]  K. Lunetta,et al.  Correction for multiple testing in a gene region , 2013, European Journal of Human Genetics.

[46]  Yan V. Sun,et al.  Identification of epistatic effects using a protein-protein interaction database. , 2010, Human molecular genetics.

[47]  P. Shannon,et al.  Cytoscape: a software environment for integrated models of biomolecular interaction networks. , 2003, Genome research.

[48]  R. Elston,et al.  Identification of gene‐gene interactions in the presence of missing data using the multifactor dimensionality reduction method , 2009, Genetic epidemiology.

[49]  Xingguang Luo,et al.  Genetic variants in the CPNE5 gene are associated with alcohol dependence and obesity in Caucasian populations. , 2015, Journal of psychiatric research.

[50]  Heungsun Hwang,et al.  Nonlinear Generalized Structured Component Analysis , 2010 .

[51]  Jason H. Moore,et al.  Renin-angiotensin system gene polymorphisms and coronary artery disease in a large angiographic cohort: detection of high order gene-gene interaction. , 2007, Atherosclerosis.

[52]  S. Baron-Cohen,et al.  Genetic variation in GABRB3 is associated with Asperger syndrome and multiple endophenotypes relevant to autism , 2013, Molecular Autism.

[53]  J. H. Moore,et al.  Multifactor-dimensionality reduction shows a two-locus interaction associated with Type 2 diabetes mellitus , 2004, Diabetologia.

[54]  Eden R Martin,et al.  A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms , 2008, Genetic epidemiology.

[55]  Y. Takane,et al.  MULTILEVEL GENERALIZED STRUCTURED COMPONENT ANALYSIS , 2007 .

[56]  Claude Bouchard,et al.  Genome-wide physical activity interactions in adiposity. A meta-analysis of 200,452 adults , 2017 .

[57]  Jun S. Liu,et al.  Bayesian inference of epistatic interactions in case-control studies , 2007, Nature Genetics.

[58]  J. H. Moore,et al.  Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. , 2001, American journal of human genetics.

[59]  A. Dewan,et al.  Exhaustive Genome-Wide Search for SNP-SNP Interactions Across 10 Human Diseases , 2016, G3: Genes, Genomes, Genetics.

[60]  Seungyeoun Lee,et al.  A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions , 2016, Bioinform..

[61]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[62]  J. Cheverud,et al.  A simple correction for multiple comparisons in interval mapping genome scans , 2001, Heredity.

[63]  Yao-Hwei Fang,et al.  SVM‐Based Generalized Multifactor Dimensionality Reduction Approaches for Detecting Gene‐Gene Interactions in Family Studies , 2012, Genetic epidemiology.

[64]  B. Browning,et al.  Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. , 2007, American journal of human genetics.

[65]  Joseph E. Powell,et al.  Detection and replication of epistasis influencing transcription in humans , 2014, Nature.

[66]  R. Loos,et al.  Variants in GLIS3 and CRY2 Are Associated with Type 2 Diabetes and Impaired Fasting Glucose in Chinese Hans , 2011, PloS one.

[67]  Taesung Park,et al.  IGENT: efficient entropy based algorithm for genome-wide gene-gene interaction analysis , 2014, BMC Medical Genomics.

[68]  P. Donnelly,et al.  Genome-wide strategies for detecting multiple loci that influence complex diseases , 2005, Nature Genetics.

[69]  Heungsun Hwang,et al.  Regularized Generalized Structured Component Analysis , 2009 .

[70]  Judy H. Cho,et al.  Finding the missing heritability of complex diseases , 2009, Nature.

[71]  J. Li,et al.  Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix , 2005, Heredity.

[72]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[73]  Taesung Park,et al.  Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions , 2015, Human Heredity.

[74]  TaeHyung Kim,et al.  JBASE: Joint Bayesian Analysis of Subphenotypes and Epistasis , 2015, Bioinform..

[75]  J. Hein,et al.  Using biological networks to search for interacting loci in genome-wide association studies , 2009, European Journal of Human Genetics.

[76]  Association of the LINGO2-related SNP rs10968576 with body mass in a cohort of elderly Swedes , 2015, Molecular Genetics and Genomics.

[77]  N. Galwey,et al.  A new measure of the effective number of tests, a practical tool for comparing families of non‐independent significance tests , 2009, Genetic epidemiology.

[78]  Yijun Zuo,et al.  A powerful truncated tail strength method for testing multiple null hypotheses in one dataset. , 2011, Journal of theoretical biology.

[79]  Taesung Park,et al.  Multivariate generalized multifactor dimensionality reduction to detect gene-gene interactions , 2013, BMC Systems Biology.

[80]  Qiang Yang,et al.  Predictive rule inference for epistatic interaction detection in genome-wide association studies , 2010, Bioinform..

[81]  Johnny S. H. Kwan,et al.  GATES: a rapid and powerful gene-based association test using extended Simes procedure. , 2011, American journal of human genetics.

[82]  Shyh-Huei Chen,et al.  A support vector machine approach for detecting gene‐gene interaction , 2008, Genetic epidemiology.

[83]  Li Wang,et al.  Evaluation of random forests performance for genome-wide association studies in the presence of interaction effects , 2009, BMC proceedings.

[84]  Seungyeoun Lee,et al.  Gene–gene interaction analysis for the survival phenotype based on the Cox model , 2012, Bioinform..

[85]  Manuel A. R. Ferreira,et al.  PLINK: a tool set for whole-genome association and population-based linkage analyses. , 2007, American journal of human genetics.