High-resolution TADs reveal DNA sequences underlying genome organization in flies

Eukaryotic chromatin is partitioned into domains called TADs that are broadly conserved between species and virtually identical among cell types within the same species. Previous studies in mammals have shown that the DNA-binding protein CTCF and cohesin contribute to a fraction of TAD boundaries. Apart from this, the molecular mechanisms governing this partitioning remain poorly understood. Using our new software, HiCExplorer, we annotated high-resolution (570 bp) TAD boundaries in flies and identified eight DNA motifs enriched at boundaries. Known insulator proteins bind five of these motifs, while the remaining three motifs are novel. We find that boundaries lie either at core promoters of active genes or at non-promoter regions of inactive chromatin, and that these two groups are characterized by different sets of DNA motifs. Most boundaries are present at divergent promoters of constitutively expressed genes, and gene expression tends to be coordinated within TADs. In contrast to mammals, the CTCF motif is present at only 2% of boundaries in flies. We demonstrate that boundaries can be accurately predicted using only the motif sequences, along with open chromatin, suggesting that DNA sequence encodes the 3D genome architecture in flies. Finally, we present an interactive online database to access and explore the spatial organization of fly, mouse and human genomes, available at http://chorogenome.ie-freiburg.mpg.de.
