MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions

Genome-wide proximity-ligation-based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as an optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component is the derivation of an appropriate background model for contacts in a random chain, obtained by numerically solving a set of matrix equations. The background model preserves both the observed coverage of each genomic bin and the distance dependence of the contact frequency between any pair of bins in the empirical map. In addition, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the prefix “Mr”, which stands for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domains are marked by boundary signatures in chromatin marks and transcription factors (TFs) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of highly occupied target (HOT) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions.
To further explore the interplay between TADs and epigenetic marks, we examine how somatic mutations are distributed across boundaries (as tumor mutational burden is known to be coupled to chromatin structure), finding a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.

Author Summary

The accommodation of roughly 2 m of DNA in the nuclei of mammalian cells results in an intricate structure, in which topologically associating domains (TADs), formed by densely interacting genomic regions, emerge as a fundamental structural unit. Identification of TADs is essential for understanding the role of 3D genome organization in gene regulation. When the chromosomal contact map is viewed as a network, TADs correspond to its densely connected regions. Motivated by this mapping, we propose a novel method, MrTADFinder, to identify TADs based on the concept of modularity in network science. Using MrTADFinder, we identify domains at various resolutions and further explore the interplay between domains and other chromatin features, such as transcription factor binding and histone modifications, at different resolutions. Overall, MrTADFinder provides a new computational framework to investigate the multiple length scales embedded in the organization of the genome.
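
The modularity idea summarized above can be illustrated with a minimal sketch. It scores a partition of the chromosome into contiguous domains by summing, within each domain, the observed contacts minus a resolution parameter times the expected contacts. Two parts are assumptions made for illustration and are not MrTADFinder's actual procedure: the background here is a simple distance-only average (it does not preserve per-bin coverage, unlike the model described above), and the optimum is found by exact dynamic programming over contiguous segments. The names `distance_expected`, `call_domains`, and `gamma` are hypothetical.

```python
import numpy as np

def distance_expected(W):
    """Stand-in background (assumption): mean contact at each genomic distance."""
    n = W.shape[0]
    E = np.zeros_like(W, dtype=float)
    for d in range(n):
        mean_d = np.diagonal(W, d).mean()
        idx = np.arange(n - d)
        E[idx, idx + d] = mean_d
        E[idx + d, idx] = mean_d
    return E

def call_domains(W, gamma=1.0):
    """Return contiguous domains [(start, end), ...] (end exclusive) that
    maximize the sum over domains of (W - gamma * E) inside each domain."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    B = W - gamma * distance_expected(W)

    # 2D prefix sums so the sum of any square block B[a:b, a:b] costs O(1).
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = B.cumsum(axis=0).cumsum(axis=1)

    def block(a, b):
        return P[b, b] - P[a, b] - P[b, a] + P[a, a]

    # best[b] = best score for partitioning bins 0..b-1; prev[b] = start of last domain.
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for b in range(1, n + 1):
        for a in range(b):
            score = best[a] + block(a, b)
            if score > best[b]:
                best[b], prev[b] = score, a

    # Backtrack from the last bin to recover the domain boundaries.
    domains = []
    b = n
    while b > 0:
        a = int(prev[b])
        domains.append((a, b))
        b = a
    return domains[::-1]

# Toy contact map: two dense 4-bin blocks on a weak uniform background.
W = np.ones((8, 8))
W[:4, :4] = 10
W[4:, 4:] = 10
print(call_domains(W, gamma=1.0))
```

In this sketch, raising `gamma` penalizes expected contacts more heavily and favors smaller domains, while lowering it merges bins into larger ones; the usual constant normalization of modularity is omitted because it does not change which partition is optimal.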
