Graph embedding and unsupervised learning predict genomic sub-compartments from HiC chromatin interaction data

Chromatin interaction studies can reveal how the genome is organized into spatially confined sub-compartments in the nucleus. However, accurately identifying sub-compartments from chromatin interaction data remains a challenge in computational biology. Here, we present Sub-Compartment Identifier (SCI), an algorithm that uses graph embedding followed by unsupervised learning to predict sub-compartments from Hi-C chromatin interaction data. We find that SCI sub-compartment predictions are superior to hidden Markov model (HMM) sub-compartment predictions in both network topological centrality and clustering performance. Moreover, using orthogonal Chromatin Interaction Analysis by in-situ Paired-End Tag Sequencing (ChIA-PET) data, we confirm that SCI sub-compartment prediction outperforms HMM. We show that SCI-predicted sub-compartments have distinct epigenetic marks, transcriptional activities, and transcription factor enrichment. Finally, we present a deep neural network that predicts sub-compartments from epigenome, replication timing, and sequence data. This neural network produces more accurate predictions when SCI-determined sub-compartments are used as training labels.
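The general idea of "graph embedding followed by unsupervised learning" can be illustrated with a small sketch. This is not the SCI implementation: it assumes a spectral embedding of the contact graph as a stand-in for whichever embedding method SCI uses, a plain k-means step as the clustering method, and a synthetic contact matrix with two planted compartments in place of real Hi-C data. The function names `spectral_embedding` and `kmeans` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def spectral_embedding(contacts, dim):
    """Embed each genomic bin as a `dim`-dimensional vector.

    Assumption: a spectral embedding of the degree-normalized contact
    graph, used here only as a simple stand-in embedding method.
    """
    a = np.asarray(contacts, dtype=float)
    a = (a + a.T) / 2.0              # enforce symmetry
    np.fill_diagonal(a, 0.0)         # ignore self-contacts
    deg = a.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    norm = a * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the top `dim`.
    _, vecs = np.linalg.eigh(norm)
    emb = vecs[:, -dim:]
    lengths = np.linalg.norm(emb, axis=1, keepdims=True)
    lengths[lengths == 0] = 1.0
    return emb / lengths             # row-normalize the embedding


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(d.argmax())])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic data (an assumption, not real Hi-C): 40 bins whose planted
# compartment alternates every 5 bins; bins in the same compartment
# contact each other about ten times more often than bins in different ones.
rng = np.random.default_rng(0)
n_bins = 40
truth = (np.arange(n_bins) // 5) % 2
rate = np.where(truth[:, None] == truth[None, :], 20.0, 2.0)
upper = np.triu(rng.poisson(rate), 1)
contacts = upper + upper.T

embedding = spectral_embedding(contacts, dim=2)
labels = kmeans(embedding, k=2)

# Cluster ids are arbitrary, so score agreement up to label swapping.
agreement = max((labels == truth).mean(), (labels != truth).mean())
print(f"agreement with planted compartments: {agreement:.2f}")
```

In this toy setting the embedding places bins with similar contact profiles close together, so the clustering step can separate them without any labels. Real Hi-C matrices are far larger, sparser, and noisier, which is where the choice of embedding method and the number of clusters start to matter.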
