Generative Modeling of Multi-mapping Reads with mHi-C Advances Analysis of High Throughput Genome-wide Conformation Capture Studies

Current Hi-C analysis approaches cannot account for reads that align to multiple locations and hence underestimate biological signal from repetitive regions of genomes. We developed mHi-C, a multi-read mapping strategy that probabilistically allocates Hi-C multi-reads. On benchmarks, mHi-C outperformed both analyses restricted to uni-reads and heuristic approaches aimed at rescuing multi-reads. Specifically, mHi-C increased the sequencing depth by an average of 20%, leading to higher reproducibility of contact matrices and a larger number of significant contacts across biological replicates. mHi-C also revealed biologically supported bona fide promoter-enhancer interactions and topologically associating domains involving repetitive genomic regions, thereby unlocking a previously masked portion of the genome for conformation capture studies.
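
To make the idea of probabilistic allocation concrete, the following is a minimal sketch of a generic EM-style scheme, not the mHi-C model itself. It assumes that each multi-read comes with a list of candidate bin pairs and that uni-read counts per bin pair are available. The function name `allocate_multireads`, its parameters, and the toy bin labels are all hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def allocate_multireads(uni_counts, multi_candidates, n_iter=50, pseudo=1e-6):
    """Toy EM-style allocation of multi-reads (illustrative assumption,
    not the mHi-C implementation).

    uni_counts: dict mapping a bin pair to its uni-read count.
    multi_candidates: list of candidate bin-pair lists, one per multi-read.
    Returns (posteriors, abundance): per-read allocation probabilities and
    the resulting expected contact counts.
    """
    # Initialise contact abundances from uni-reads only.
    abundance = defaultdict(float, uni_counts)
    posteriors = []
    for _ in range(n_iter):
        posteriors = []
        expected = defaultdict(float)
        # E-step: split each multi-read across its candidates in
        # proportion to the current abundance of each candidate.
        for cands in multi_candidates:
            weights = [abundance[c] + pseudo for c in cands]
            total = sum(weights)
            probs = [w / total for w in weights]
            posteriors.append(dict(zip(cands, probs)))
            for c, p in zip(cands, probs):
                expected[c] += p
        # M-step: abundance = uni-read counts + expected multi-read counts.
        abundance = defaultdict(float, uni_counts)
        for c, e in expected.items():
            abundance[c] += e
    return posteriors, dict(abundance)


if __name__ == "__main__":
    uni = {("A", "B"): 9, ("A", "C"): 1}
    multi = [[("A", "B"), ("A", "C")]]
    post, abund = allocate_multireads(uni, multi)
    print(post[0])
    print(abund)
```

In this toy setting, a multi-read whose candidates differ in uni-read support is assigned mostly to the better-supported contact, while each read still contributes a total weight of one to the contact counts.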
