Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells

High-throughput sequencing produces volumes of data too large to analyze effectively without data reduction. Here we present a clustering algorithm for genome-wide open chromatin data that uses a new data reduction method. The method represents each genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. With a systematically optimized set of peaks, the algorithm lets us quantitatively evaluate differences between hematopoietic cell samples and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
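The binary encoding and Hamming distance step can be illustrated with a minimal sketch. It assumes that each position in the string corresponds to one peak in a shared reference set, and that a sample gets a 1 at that position when any of its own peaks overlaps the reference peak. The abstract does not specify this encoding, so the overlap rule, the function names (`binarize`, `hamming`, `pairwise_hamming`), the coordinates, and the sample labels are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def binarize(reference_peaks, sample_peaks):
    """Encode a sample as a list of 0/1 values, one per reference peak.

    Assumption: a bit is 1 when any sample peak overlaps the reference
    peak on the same chromosome (half-open intervals [start, end)).
    """
    bits = []
    for chrom, start, end in reference_peaks:
        hit = any(
            c == chrom and s < end and e > start
            for c, s, e in sample_peaks
        )
        bits.append(1 if hit else 0)
    return bits


def hamming(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(vectors):
    """Return {(name_a, name_b): distance} for every pair of samples."""
    return {
        (m, n): hamming(vectors[m], vectors[n])
        for m, n in combinations(sorted(vectors), 2)
    }


# Toy data: (chromosome, start, end). All values are made up.
reference = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr2", 900, 1000),
]
samples = {
    "HSC": [("chr1", 120, 180), ("chr2", 60, 100)],
    "MPP": [("chr1", 110, 190), ("chr2", 70, 140), ("chr2", 950, 990)],
    "Mono": [("chr1", 520, 580), ("chr2", 910, 960)],
}

vectors = {name: binarize(reference, peaks) for name, peaks in samples.items()}
distances = pairwise_hamming(vectors)

for pair, d in distances.items():
    print(pair, d)
```

In this toy example the two progenitor-like samples differ at a single reference peak, while the third sample differs from both at three or four peaks. A distance table of this kind could then be passed to a standard hierarchical clustering routine to group samples.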
