Improved integrative framework combining association data with gene expression features to prioritize Crohn's disease genes.

Genome-wide association studies in Crohn's disease (CD) have identified 140 genome-wide significant loci. However, identifying the genes that drive these association signals remains challenging. Furthermore, genome-wide significance thresholds limit false positives at the cost of reduced sensitivity. In this study, we explored gene features contributing to CD pathogenesis, including gene-based association data from CD and autoimmune (AI) diseases, as well as gene expression features (eQTLs, epigenetic markers of expression and intestinal gene expression data). We developed an integrative model based on a CD reference gene set. This integrative approach outperformed gene-based association signals alone in identifying CD-related genes, as judged by four criteria: statistical validation, gene ontology enrichment, differential expression between M1 and M2 macrophages, and validation against a reference set of genes causing monogenic forms of inflammatory bowel disease. Apart from gene-level CD association P-values, association with AI diseases was the strongest predictor, highlighting generalized mechanisms of inflammation, particularly the interferon-γ pathway. Within the 140 high-confidence CD regions, 598 of 1328 genes had low prioritization scores, marking them as unlikely to contribute to CD pathogenesis. In some regions, multiple genes had comparably high integrative model scores. This is particularly evident in regions with extensive linkage disequilibrium, such as the IBD5 locus. Our analyses provide a standardized reference for prioritizing potential CD-related genes in regions with either highly significant or nominally significant gene-level association P-values. Our integrative model may be particularly valuable in prioritizing rare, potentially private, missense variants for which genome-wide evidence for association may be unattainable.
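
The comparison described above, an integrative score versus gene-level association alone, can be illustrated with a minimal sketch. The abstract does not state the model type, so logistic regression is an assumption here, as are the three feature names, the simulated data, and every function name; none of this reproduces the study's actual method or results.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b):
    """Per-gene prioritization score in [0, 1]."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate_genes(n=300, seed=0):
    """Hypothetical data: columns are CD association, AI association, eQTL signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        is_ref = 1 if rng.random() < 0.3 else 0  # membership in a reference gene set
        X.append([
            rng.gauss(1.0 * is_ref, 1.0),  # e.g. transformed CD gene-level P-value
            rng.gauss(0.8 * is_ref, 1.0),  # e.g. transformed AI gene-level P-value
            rng.gauss(0.3 * is_ref, 1.0),  # e.g. eQTL-derived feature
        ])
        y.append(is_ref)
    return X, y


if __name__ == "__main__":
    X, y = simulate_genes()
    w, b = fit_logistic(X, y)
    integrative = predict(X, w, b)
    association_only = [row[0] for row in X]
    print("AUC, CD association alone:", round(auc(association_only, y), 3))
    print("AUC, integrative score:   ", round(auc(integrative, y), 3))
```

Both AUC values in this sketch are computed on the same genes used for fitting, so they are optimistic; a real comparison would score held-out genes or use cross-validation.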
