Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation

Topological data analysis (TDA) is a mathematically well-founded set of methods for deriving robust information about the structure and topology of data, and it has been applied successfully in several biological contexts. Derived primarily from algebraic topology, TDA rigorously identifies persistent features in complex data, which makes it well suited to characterizing the key features of three-dimensional chromosome structure. Chromosome structure has a significant influence on many diverse genomic processes and has recently been shown to relate to cellular differentiation. While many methods exist to study specific substructures of chromosomes, a global view of all geometric features of chromosomes is still lacking. By applying TDA to chromosome structure through differentiation across three cell lines, we provide insight into principles of chromosome folding and looping. We identify persistent connected components and one-dimensional topological features of chromosomes and characterize them across cell types and stages of differentiation. Availability: Scripts to reproduce the results from this study can be found at https://github.com/Kingsford-Group/hictda
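
To make "persistent connected components" concrete, the following is a minimal sketch, not the pipeline used in this study. It assumes the contact data have already been converted into a symmetric distance matrix between genomic bins (the conversion itself is not shown and would be a modeling choice). The function name `h0_persistence` and the toy matrix are hypothetical. Each bin starts as its own component at scale 0, and a component "dies" at the distance at which it merges into another; long-lived bars correspond to groups of bins that stay separated over a wide range of scales. One-dimensional features (loops) require a full persistent homology library and are not covered here.

```python
import math


def h0_persistence(dist):
    """Return (birth, death) bars for connected components.

    `dist` is a symmetric distance matrix given as a list of lists.
    Components merge in order of increasing pairwise distance, which
    corresponds to 0-dimensional persistence of a Vietoris-Rips filtration.
    """
    n = len(dist)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, smallest first.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at scale d, so one of them dies here.
            parent[ri] = rj
            bars.append((0.0, d))
    # One component survives at every scale.
    bars.append((0.0, math.inf))
    return bars


# Hypothetical toy distances between four bins: {0, 1} and {2, 3} are close pairs.
toy = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
bars = h0_persistence(toy)
```

In this toy example the two close pairs merge early (at 0.1 and 0.2), while the two resulting groups only join at 0.7, so the bar ending at 0.7 is the persistent one.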
