Integrative analysis of epigenetics data identifies gene-specific regulatory elements

Understanding the complexity of transcriptional regulation is a major goal of computational biology. Because experimentally linking regulatory sites to genes is challenging, computational methods that use epigenomics data have been proposed to create tissue-specific regulatory maps. However, we showed that these approaches are not well suited to account for variation in the regulatory landscape between cell types. To overcome these drawbacks, we developed a new method called STITCHIT, which identifies putative regulatory sites and links them to genes. Within STITCHIT, we consider the chromatin accessibility signal of all samples jointly to identify regions whose signal variation is related to the expression of a distinct gene. STITCHIT outperforms previous approaches in various validation experiments and was used with a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding regulatory regions. We believe that our work paves the way for a more refined understanding of transcriptional regulation at the gene level.
