Generative modeling of multi-mapping reads with mHi-C advances analysis of Hi-C studies

Current Hi-C analysis approaches cannot account for reads that align to multiple locations, and hence underestimate biological signal from repetitive regions of genomes. We developed and validated mHi-C, a multi-read mapping strategy that probabilistically allocates Hi-C multi-reads. On benchmarks, mHi-C outperformed both analyses restricted to uni-reads and heuristic approaches aimed at rescuing multi-reads. Specifically, mHi-C increased the sequencing depth by an average of 20%, resulting in higher reproducibility of contact matrices and of detected interactions across biological replicates. The impact of multi-reads on the detection of significant interactions is influenced marginally by three factors: the relative contribution of multi-reads to the sequencing depth compared to uni-reads, the cis-to-trans ratio of contacts, and the broad data quality as reflected by the proportion of mappable reads in each dataset. Computational experiments highlighted that, in Hi-C studies with short read lengths, multi-reads rescued by mHi-C can emulate the effect of longer reads. mHi-C also revealed biologically supported bona fide promoter-enhancer interactions and topologically associating domains involving repetitive genomic regions, thereby unlocking a previously masked portion of the genome for conformation capture studies.
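To make the idea of probabilistic allocation concrete, the following is a minimal toy sketch, not the mHi-C model itself. It assumes each multi-read pair comes with a list of candidate contacts (pairs of genomic bins) and uses a simple EM-style update in which a candidate's weight is the uni-read count plus the expected multi-read count already assigned to it. The function name `allocate_multi_reads`, its parameters, and the bin labels are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def allocate_multi_reads(uni_counts, multi_reads, n_iter=50):
    """Toy EM-style allocation of multi-reads to candidate contacts.

    uni_counts  : dict mapping a contact (bin_a, bin_b) to its uni-read count.
    multi_reads : list of candidate lists; each inner list holds the distinct
                  contacts one multi-read pair could originate from.
    Returns one dict per multi-read mapping each candidate to a probability.
    This is an illustrative assumption, not the model described in the paper.
    """
    # Start from a uniform allocation over each read's candidates.
    post = [{c: 1.0 / len(cands) for c in cands} for cands in multi_reads]

    for _ in range(n_iter):
        # Expected count per contact: uni-reads plus fractional multi-reads.
        weight = defaultdict(float)
        for contact, n in uni_counts.items():
            weight[contact] += n
        for p in post:
            for contact, w in p.items():
                weight[contact] += w

        # Re-allocate each multi-read in proportion to the current weights.
        new_post = []
        for cands in multi_reads:
            total = sum(weight[c] for c in cands)
            new_post.append({c: weight[c] / total for c in cands})
        post = new_post

    return post


# Hypothetical example: one multi-read with two candidate contacts,
# only one of which is supported by uni-reads.
uni = {("binA", "binB"): 10}
reads = [[("binA", "binB"), ("binA", "binC")]]
allocation = allocate_multi_reads(uni, reads)
```

In this sketch the multi-read is drawn toward the contact already supported by uni-reads, which is the intuition behind rescuing multi-reads rather than discarding them or assigning them heuristically.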
