Machine Learning–Based Gene Prioritization Identifies Novel Candidate Risk Genes for Inflammatory Bowel Disease

Background: The inflammatory bowel diseases (IBDs) are chronic inflammatory disorders associated with genetic, immunologic, and environmental factors. Although hundreds of genes are implicated in IBD etiology, additional genes likely play a role in the disease process. We developed a machine learning–based gene prioritization method to identify novel IBD-risk genes.

Methods: Known IBD genes were collected from genome-wide association studies and annotated with expression and pathway information. Using these genes, a model was trained to identify IBD-risk genes. A comprehensive list of 16,390 genes was then scored and classified.

Results: Immune and inflammatory responses, as well as pathways such as cell adhesion, cytokine–cytokine receptor interaction, and sulfur metabolism, were identified as related to IBD. Scores predicted for IBD genes were significantly higher than those for non-IBD genes (P < 10⁻²⁰). There was a significant association between the score and having an IBD publication (P < 10⁻²⁰). Overall, 347 genes had a high prediction score (>0.8). A literature review of these high-scoring genes, excluding those used to train the model, identified 67 genes without any publication concerning IBD. These genes represent novel candidate IBD-risk genes that can be targeted in future studies.

Conclusions: Our method successfully differentiated IBD-risk genes from non-IBD genes by using information from expression data and a multitude of gene annotations. Crucial features were defined, and we were able to detect novel candidate risk genes for IBD. These findings may help detect new IBD-risk genes and improve the understanding of IBD pathogenesis.
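The train, score, and threshold workflow described in the Methods and Results can be illustrated with a minimal sketch. The abstract does not state which classifier or features were used, so everything below is an assumption made for illustration: the logistic regression model, the two synthetic features standing in for expression and pathway annotations, and the gene names (`KNOWN*`, `BACKGROUND*`, `GENE*`) are all hypothetical. Only the 0.8 score cutoff and the exclusion of training genes from the candidate list come from the text.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(model, x):
    """Return a score in [0, 1] for one feature vector."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent (illustrative stand-in model)."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict((weights, bias), xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


random.seed(0)


def synthetic_features(is_risk):
    """Hypothetical 2-feature annotation; risk genes are shifted upward."""
    center = 1.5 if is_risk else -1.5
    return [random.gauss(center, 1.0) for _ in range(2)]


# Labeled training genes (synthetic placeholders, not real GWAS genes).
training = {f"KNOWN{i}": (synthetic_features(True), 1) for i in range(40)}
training.update({f"BACKGROUND{i}": (synthetic_features(False), 0) for i in range(40)})

# Unlabeled genes to be scored (synthetic; every fifth one mimics a risk gene).
unlabeled = {f"GENE{i}": synthetic_features(i % 5 == 0) for i in range(100)}

model = train_logistic(
    [features for features, _ in training.values()],
    [label for _, label in training.values()],
)

# Score every gene, then keep high scorers that were not used for training.
scores = {name: predict(model, features) for name, (features, _) in training.items()}
scores.update({name: predict(model, features) for name, features in unlabeled.items()})

THRESHOLD = 0.8  # cutoff quoted in the abstract
candidates = sorted(
    name for name, s in scores.items() if s > THRESHOLD and name not in training
)
print(f"{len(candidates)} candidate genes above {THRESHOLD}")
```

In this sketch the candidate list plays the role of the high-scoring genes that were subsequently checked against the literature; a real analysis would replace the synthetic features with curated annotations and evaluate the model on held-out genes.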
