Constructing tissue-specific transcriptional regulatory networks via a Markov random field

Background: Recent advances in sequencing technologies have enabled parallel assays of chromatin accessibility and gene expression for major human cell lines. These assays offer an opportunity to decode the phenotypic consequences of genetic variation by constructing predictive gene regulatory network models. However, a computational method that systematically integrates chromatin accessibility information with gene expression data to recover complex regulatory relationships between genes in a tissue-specific manner is still lacking.

Results: We propose a Markov random field (MRF) model for constructing tissue-specific transcriptional regulatory networks via integrative analysis of DNase-seq and RNA-seq data. Our method, named CSNets (cell-line specific regulatory networks), first infers regulatory networks for individual cell lines using chromatin accessibility information, and then fine-tunes these networks using the MRF based on pairwise similarity between cell lines derived from gene expression data. Using this method, we constructed regulatory networks specific to 110 human cell lines and 13 major tissues from ENCODE data. We demonstrated the high quality of these networks via comprehensive statistical analysis based on ChIP-seq profiles, functional annotations, taxonomic analysis, and literature surveys. We further applied these networks to analyze GWAS data of Crohn’s disease and prostate cancer. Results were either consistent with the literature or provided biological insights into regulatory mechanisms of these two complex diseases. The website of CSNets is freely available at http://bioinfo.au.tsinghua.edu.cn/jianglab/CSNETS/.

Conclusions: CSNets demonstrates the power of joint analysis of epigenomic and transcriptomic data for the accurate construction of gene regulatory networks. Our work provides not only a useful resource of regulatory networks to the community, but also valuable experience in method development for multi-omics data integration.
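
The two-step idea described in the Results (initial per-cell-line networks from chromatin accessibility, then MRF-based fine-tuning guided by expression similarity between cell lines) can be illustrated with a minimal sketch. The abstract does not specify the potentials or the inference algorithm, so everything below is an assumption for illustration: a single candidate edge is given a binary label per cell line, the unary term is the log-odds of a hypothetical accessibility-derived probability, the pairwise term rewards agreement between cell lines in proportion to a hypothetical expression similarity, and iterated conditional modes (ICM) is used as a simple stand-in for whatever inference CSNets actually performs. The names `icm_refine_edge`, `prior`, `similarity`, and `coupling` are not from the paper.

```python
import math


def icm_refine_edge(prior, similarity, coupling=1.0, max_sweeps=50):
    """Refine the presence of ONE regulatory edge across cell lines with ICM.

    prior[i]         -- assumed initial probability that the edge exists in
                        cell line i (e.g. derived from chromatin accessibility)
    similarity[i][j] -- assumed symmetric, non-negative similarity between
                        cell lines i and j (e.g. derived from gene expression)
    coupling         -- hypothetical weight of the pairwise (smoothing) term
    Returns a list of 0/1 labels, one per cell line.
    """
    n = len(prior)
    eps = 1e-9  # guards against log(0) for priors of exactly 0 or 1
    # Unary term: log-odds of the accessibility-based prior.
    unary = [math.log((p + eps) / (1.0 - p + eps)) for p in prior]
    # Start from the thresholded prior, i.e. the un-refined networks.
    labels = [1 if p > 0.5 else 0 for p in prior]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            # Similar cell lines vote for their current label:
            # +w if the neighbour has the edge, -w if it does not.
            vote = sum(
                similarity[i][j] * (1 if labels[j] == 1 else -1)
                for j in range(n)
                if j != i
            )
            # Conditional log-odds of label 1 given all other labels.
            new_label = 1 if unary[i] + coupling * vote > 0 else 0
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:  # local optimum reached
            break
    return labels


# Toy input: cell lines 0-2 are mutually similar, cell line 3 is unrelated.
prior = [0.9, 0.8, 0.45, 0.1]
similarity = [
    [0.0, 0.9, 0.9, 0.05],
    [0.9, 0.0, 0.9, 0.05],
    [0.9, 0.9, 0.0, 0.05],
    [0.05, 0.05, 0.05, 0.0],
]
print(icm_refine_edge(prior, similarity))  # → [1, 1, 1, 0]
```

In this toy input the weakly supported edge in cell line 2 is switched on because its two close neighbours carry the edge, while the dissimilar cell line 3 keeps its own label. Setting `coupling=0` disables the smoothing and returns the thresholded prior unchanged.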
