TADKB: Family classification and a knowledge base of topologically associating domains

Background: Topologically associating domains (TADs) are considered the structural and functional units of the genome. However, the literature lacks an integrated resource from which researchers can obtain family classifications and detailed information about TADs.

Results: We built an online knowledge base, TADKB, that integrates knowledge about TADs in eleven cell types of human and mouse. For each TAD, TADKB provides the predicted three-dimensional (3D) structures of chromosomes and TADs, together with detailed annotations of the protein-coding genes and long non-coding RNAs (lncRNAs) located in each TAD. In addition to the 3D chromosomal structures inferred from population Hi-C, TADKB integrates the single-cell haplotype-resolved chromosomal 3D structures of 17 GM12878 cells. A user can submit a gene or lncRNA ID or sequence to search for the TAD(s) that contain the query gene or lncRNA. We also classified TADs into families. To do so, we used the TM-scores between reconstructed 3D structures of TADs as structural similarities and the Pearson’s correlation coefficients between the fold enrichments of chromatin states as functional similarities. All TADs in one cell type were clustered twice, once on structural similarities and once on functional similarities, using the spectral clustering algorithm with various predefined numbers of clusters. We compared the TADs shared between structural and functional clusters and found that most of the TADs in the functional clusters with depleted chromatin states fall into one or two structural clusters. This novel finding indicates a connection between the 3D structures of TADs and their DNA functions in terms of chromatin states.

Conclusion: TADKB is available at http://dna.cs.miami.edu/TADKB/.
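As a rough illustration of the functional-similarity and cluster-comparison steps described in the abstract, the following minimal Python sketch computes Pearson's correlation coefficients between per-TAD chromatin-state fold-enrichment vectors and then counts how many TADs each pair of functional and structural clusters shares. It is a sketch under stated assumptions, not the TADKB implementation: the TAD identifiers, enrichment values, and cluster labels are made up for illustration, the function names are hypothetical, and the spectral clustering step itself is not shown (the labels are assumed to come from an upstream clustering run).

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (norm_x * norm_y)


def similarity_matrix(profiles):
    """Pairwise Pearson similarities between TAD fold-enrichment profiles.

    `profiles` maps a TAD identifier to its vector of fold enrichments,
    one value per chromatin state. Returns the sorted identifiers and a
    square matrix (list of lists) in the same order.
    """
    ids = sorted(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in ids] for a in ids]
    return ids, matrix


def cluster_overlap(functional, structural):
    """Count TADs shared by each (functional cluster, structural cluster) pair."""
    return Counter(
        (functional[tad], structural[tad]) for tad in functional if tad in structural
    )


# Hypothetical toy data: three TADs, three chromatin states each.
profiles = {
    "TAD_1": [2.0, 0.5, 0.1],
    "TAD_2": [4.0, 1.0, 0.2],
    "TAD_3": [0.1, 0.5, 2.0],
}
ids, sim = similarity_matrix(profiles)

# Hypothetical cluster labels, assumed to come from two separate clustering runs.
functional = {"TAD_1": 0, "TAD_2": 0, "TAD_3": 1}
structural = {"TAD_1": 2, "TAD_2": 2, "TAD_3": 0}
overlap = cluster_overlap(functional, structural)

for (f_label, s_label), count in sorted(overlap.items()):
    print(f"functional {f_label} / structural {s_label}: {count} TAD(s)")
```

A functional cluster whose TADs concentrate in one or two structural clusters would appear in this table as one or two large counts for that functional label, which is the kind of pattern the abstract reports.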
