HiCSR: a Hi-C super-resolution framework for producing highly realistic contact maps

Motivation: Hi-C data has enabled the genome-wide study of chromatin folding and architecture, and has led to important discoveries in the structure and function of chromatin conformation. High-resolution data plays a particularly important role here, as many chromatin substructures, such as Topologically Associating Domains (TADs) and chromatin loops, cannot be adequately studied with low-resolution contact maps. However, the high sequencing costs associated with generating high-resolution Hi-C data have become an experimental barrier. Data-driven machine learning models, which allow low-resolution Hi-C data to be computationally enhanced, offer a promising avenue to address this challenge.

Results: By carefully examining the properties of Hi-C maps and integrating various recent advances in deep learning, we developed a Hi-C Super-Resolution (HiCSR) framework capable of accurately recovering the fine details, textures, and substructures found in high-resolution contact maps. This was achieved using a novel loss function tailored to the Hi-C enhancement problem, which combines three terms: an adversarial loss from a Generative Adversarial Network (GAN), a feature reconstruction loss derived from the latent representation of a denoising autoencoder, and a pixel-wise loss. The resulting framework not only generates enhanced Hi-C maps that are more visually similar to the original high-resolution maps, it also outperforms existing approaches, including HiCPlus, HiCNN, hicGAN and DeepHiC, on a suite of reproducibility metrics produced by members of the ENCODE Consortium. Finally, we demonstrate that HiCSR is capable of enhancing Hi-C data across sequencing depths, cell types, and species, recovering biologically significant contact domain boundaries.

Availability: We make our implementation available for download at: https://github.com/PSI-Lab/HiCSR

Contact: ljlee@psi.toronto.edu

Supplementary information: Available online.
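To make the three-term loss described in the Results concrete, the following is a minimal sketch of how such a generator objective could be assembled. It is not the HiCSR implementation: the weights `w_adv`, `w_feat` and `w_pix`, the L1 form of the pixel-wise term, the squared-error form of the feature term, and the non-saturating form of the adversarial term are all assumptions made for illustration. The feature vectors stand in for the latent representation that a denoising autoencoder's encoder would produce, and `disc_prob_fake` stands in for a discriminator's output on the enhanced map.

```python
import math


def pixel_loss(pred, target):
    """Mean absolute error between two equally sized 2D maps (L1 form is an assumption)."""
    n = sum(len(row) for row in pred)
    total = sum(abs(p - t)
                for pred_row, target_row in zip(pred, target)
                for p, t in zip(pred_row, target_row))
    return total / n


def feature_loss(pred_feats, target_feats):
    """Mean squared error between latent feature vectors.

    In practice these vectors would come from the encoder of a pretrained
    denoising autoencoder; here they are passed in directly (hypothetical inputs).
    """
    return sum((p - t) ** 2 for p, t in zip(pred_feats, target_feats)) / len(pred_feats)


def adversarial_loss(disc_prob_fake):
    """Non-saturating generator loss -log D(G(x)) (this form is an assumption)."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(disc_prob_fake, eps))


def generator_loss(pred, target, pred_feats, target_feats, disc_prob_fake,
                   w_adv=1.0, w_feat=1.0, w_pix=1.0):
    """Weighted sum of the three terms; the weights are illustrative placeholders."""
    return (w_adv * adversarial_loss(disc_prob_fake)
            + w_feat * feature_loss(pred_feats, target_feats)
            + w_pix * pixel_loss(pred, target))


# Toy usage: a 2x2 "enhanced" map versus its high-resolution target.
enhanced = [[1.0, 2.0], [3.0, 4.0]]
high_res = [[1.0, 2.0], [3.0, 6.0]]
loss = generator_loss(enhanced, high_res,
                      pred_feats=[0.0, 1.0], target_feats=[0.0, 3.0],
                      disc_prob_fake=1.0)
```

Separating the terms this way makes it easy to reweight or ablate each one; the relative weights would need to be tuned, and the abstract does not state the values used.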
