Complex systems analysis of bladder cancer susceptibility reveals a role for decarboxylase activity in two genome-wide association studies

Background

Bladder cancer is a common disease with a complex etiology that is likely due to many different genetic and environmental factors. The goal of this study was to embrace this complexity using a bioinformatics analysis pipeline that uses machine learning to measure synergistic interactions between single nucleotide polymorphisms (SNPs) in two genome-wide association studies (GWAS) and then assesses their enrichment within functional groups defined by Gene Ontology. The significance of the results was evaluated using permutation testing, and those results that replicated between the two GWAS data sets were reported.

Results

In the first step of our bioinformatics pipeline, we estimated the pairwise synergistic effects of SNPs on bladder cancer risk in both GWAS data sets using Multifactor Dimensionality Reduction (MDR), a machine learning method designed specifically for this purpose. Statistical significance was assessed using a 1000-fold permutation test, and each SNP was assigned a p-value based on its strongest pairwise association. Each SNP was then mapped to one or more genes using a window of 500 kb upstream and downstream of each gene boundary; this window was chosen to capture as many regulatory variants as possible. Using Exploratory Visual Analysis (EVA), we then carried out a gene set enrichment analysis at the gene level to identify genes with an overabundance of significant SNPs relative to the size of their mapped regions. Each gene was assigned to its biological functional groups as defined by Gene Ontology (GO), and we next used EVA to evaluate the overabundance of significant genes in these functional groups. One GO category, carboxy-lyase activity (GO:0016831), was significant in the analyses of both GWAS data sets.
Interestingly, only the gamma-glutamyl carboxylase (GGCX) gene from this GO group was significant in both the detection and replication data, highlighting the complexity of pathway-level effects on risk. GGCX is expressed in the bladder but has not previously been associated with bladder cancer in univariate GWAS. However, there is some experimental evidence that carboxy-lyase activity might play a role in cancer and that genes in this pathway should be explored as drug targets. This study provides a genetic basis for that observation.

Conclusions

Our machine learning analysis of genetic associations in two bladder cancer GWAS identified numerous associations with pairs of SNPs. Gene set enrichment analysis found aggregation of risk-associated SNPs in genes and of significant genes in GO functional groups. This study supports a role for decarboxylase protein complexes in bladder cancer susceptibility. Previous research has implicated decarboxylases in bladder cancer etiology; however, the genes that we found to be significant in the detection and replication data are not known to directly influence bladder cancer, which suggests some novel hypotheses. This study highlights the need for a complex systems approach to the genetic and genomic analysis of common diseases such as cancer.
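
The first step of the Results pipeline scores SNP pairs with MDR. The Python sketch below illustrates the core MDR idea for a single pair under stated assumptions:

- Genotypes are coded 0/1/2 and case status is coded 1/0.
- A two-locus genotype cell is labeled high risk when its case:control ratio meets or exceeds the overall ratio.
- The resulting rule is scored by balanced accuracy on the training data.

The function name `mdr_balanced_accuracy` and the toy data are hypothetical. The cross-validation that full MDR implementations use is deliberately omitted, so this is an illustration and not the software used in the study.

```python
from collections import Counter
from typing import Sequence, Tuple


def mdr_balanced_accuracy(
    snp_a: Sequence[int], snp_b: Sequence[int], status: Sequence[int]
) -> float:
    """Training balanced accuracy of a two-SNP MDR model (hypothetical helper).

    Genotypes are coded 0/1/2; status is 1 for a case and 0 for a control.
    Cross-validation, used by full MDR, is omitted for brevity.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    # Count cases and controls in each of the (up to) 9 two-locus genotype cells.
    cases: Counter = Counter()
    controls: Counter = Counter()
    for a, b, y in zip(snp_a, snp_b, status):
        cell: Tuple[int, int] = (a, b)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    # A cell is "high risk" when its case:control ratio meets or exceeds the
    # overall case:control ratio; cross-multiplying avoids dividing by zero
    # for cells that contain no controls.
    cells = set(cases) | set(controls)
    high_risk = {c for c in cells if cases[c] * n_controls >= controls[c] * n_cases}

    # Predict "case" for everyone in a high-risk cell, then score that rule.
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(controls[c] for c in cells - high_risk)
    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    return (sensitivity + specificity) / 2


# Toy interaction: case status is the parity of the two genotypes, a pattern
# that the two-locus genotype cells capture exactly.
genotype_pairs = [(a, b) for a in range(3) for b in range(3)] * 10
snp_a = [a for a, _ in genotype_pairs]
snp_b = [b for _, b in genotype_pairs]
status = [(a + b) % 2 for a, b in genotype_pairs]
print(mdr_balanced_accuracy(snp_a, snp_b, status))  # 1.0
```

Balanced accuracy is used here instead of plain accuracy so that an unequal number of cases and controls does not inflate the score.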
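
Significance was assessed with a 1000-fold permutation test, and each SNP received a p-value based on its strongest pairwise association. The abstract does not give the exact procedure. This sketch therefore makes two assumptions:

- Case/control labels are shuffled separately for each pair.
- A SNP's p-value is the smallest permutation p-value among all pairs that contain it.

The names `PairScore`, `permutation_p_value`, `snp_p_values` and the toy `parity_agreement` statistic are hypothetical. Any statistic where larger means stronger association, such as an MDR balanced accuracy, can be passed as `score`.

```python
import random
from itertools import combinations
from typing import Callable, Dict, Sequence

# A pairwise statistic where larger means stronger association
# (for example, the balanced accuracy of a two-SNP MDR model).
PairScore = Callable[[Sequence[int], Sequence[int], Sequence[int]], float]


def permutation_p_value(
    score: PairScore,
    snp_a: Sequence[int],
    snp_b: Sequence[int],
    status: Sequence[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for one SNP pair from shuffled case/control labels."""
    rng = random.Random(seed)
    observed = score(snp_a, snp_b, status)
    labels = list(status)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any genotype-phenotype link
        if score(snp_a, snp_b, labels) >= observed:
            as_extreme += 1
    # Counting the observed labeling as one permutation keeps p above zero.
    return (as_extreme + 1) / (n_perm + 1)


def snp_p_values(
    score: PairScore,
    genotypes: Dict[str, Sequence[int]],
    status: Sequence[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> Dict[str, float]:
    """Assign each SNP the smallest p-value among the pairs that include it."""
    best = {snp: 1.0 for snp in genotypes}
    for s1, s2 in combinations(sorted(genotypes), 2):
        p = permutation_p_value(score, genotypes[s1], genotypes[s2], status, n_perm, seed)
        best[s1] = min(best[s1], p)
        best[s2] = min(best[s2], p)
    return best


def parity_agreement(a: Sequence[int], b: Sequence[int], y: Sequence[int]) -> float:
    """Hypothetical toy statistic: how often (a + b) % 2 matches case status."""
    return sum((ai + bi) % 2 == yi for ai, bi, yi in zip(a, b, y)) / len(y)


# Toy data: status is fully determined by the rs1 x rs2 genotype combination.
data_rng = random.Random(1)
genotypes = {snp: [data_rng.randrange(3) for _ in range(90)] for snp in ("rs1", "rs2", "rs3")}
status = [(a + b) % 2 for a, b in zip(genotypes["rs1"], genotypes["rs2"])]
print(snp_p_values(parity_agreement, genotypes, status))
```

Because every pair reuses the same seed, every pair sees the same sequence of label permutations. This is a design choice of the sketch and is not stated in the abstract. With 1000 permutations, the smallest attainable p-value is 1/1001.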
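
The SNP-to-gene mapping uses a window of 500 kb on either side of each gene boundary, so one SNP can map to several genes. The sketch below assumes each gene boundary is given as a chromosome plus lower and upper coordinates. The `Gene` record, the helper functions, and the gene and SNP identifiers and coordinates in the example are made up for illustration. Because the window is symmetric, strand does not affect the result.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

WINDOW_BP = 500_000  # 500 kb on each side of the gene boundaries


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # lower genomic coordinate of the gene boundary
    end: int    # upper genomic coordinate of the gene boundary


def map_snps_to_genes(
    snps: Dict[str, Tuple[str, int]], genes: List[Gene], window: int = WINDOW_BP
) -> Dict[str, List[str]]:
    """Return, for each gene, the SNPs lying within `window` bp of its boundaries.

    A SNP near several genes is mapped to all of them.
    """
    mapped: Dict[str, List[str]] = {gene.name: [] for gene in genes}
    for snp_id, (chrom, pos) in snps.items():
        for gene in genes:
            if chrom == gene.chrom and gene.start - window <= pos <= gene.end + window:
                mapped[gene.name].append(snp_id)
    return mapped


def mapped_region_length(gene: Gene, window: int = WINDOW_BP) -> int:
    """Length in bp of the gene plus flanking windows (clipped at position 1 only)."""
    return gene.end + window - max(1, gene.start - window) + 1


# Hypothetical genes and SNP positions.
genes = [Gene("GENE_A", "chr1", 1_000_000, 1_010_000),
         Gene("GENE_B", "chr1", 1_200_000, 1_300_000)]
snps = {"rs1": ("chr1", 600_000), "rs2": ("chr1", 1_600_000),
        "rs3": ("chr2", 1_005_000), "rs4": ("chr1", 1_100_000)}
print(map_snps_to_genes(snps, genes))
# {'GENE_A': ['rs1', 'rs4'], 'GENE_B': ['rs2', 'rs4']}
```

`mapped_region_length` gives the region size that the gene-level enrichment step compares against. As a simplification, it does not clip at the chromosome end.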
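
Both EVA steps ask whether a group holds more significant items than chance would predict. At the gene level, the items are significant SNPs within a gene's mapped region. At the GO level, they are significant genes within a GO category. The abstract does not give EVA's test statistic, so this sketch makes three assumptions:

- The test is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test.
- At the gene level, the number of SNPs mapped to a gene stands in for region size.
- The result is only a per-group statistic; the abstract states that significance was evaluated by permutation testing.

The function names and toy gene sets are hypothetical.

```python
from math import comb
from typing import Set


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    This is the one-sided Fisher's exact test for over-representation.
    """
    top = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), top + 1)
    )
    # Exact integer arithmetic until this single division.
    return tail / comb(population, draws)


def enrichment_p_value(members: Set[str], significant: Set[str], universe: Set[str]) -> float:
    """Test whether a group's members contain more significant items than chance.

    Gene level: items are SNPs and `members` are the SNPs mapped to one gene.
    GO level: items are genes and `members` are the genes annotated to one term.
    """
    members = members & universe
    significant = significant & universe
    overlap = len(members & significant)
    return hypergeom_upper_tail(overlap, len(universe), len(significant), len(members))


# Hypothetical toy example: 10 tested genes, 3 significant, and a GO term
# whose 3 annotated genes are exactly the significant ones.
universe = {f"GENE{i}" for i in range(1, 11)}
significant = {"GENE1", "GENE2", "GENE3"}
go_term_genes = {"GENE1", "GENE2", "GENE3"}
print(enrichment_p_value(go_term_genes, significant, universe))  # 1/120, about 0.00833
```

The sketch keeps everything as exact integers until the final division. This avoids the precision loss that multiplying many small floating-point probabilities would cause when categories are large.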