HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient

Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. We present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features. In particular, we introduce a novel similarity measure, the stratum-adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. SCC not only provides a statistically sound and reliable evaluation of reproducibility but can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and in depicting the interrelationships of cell lineages. It is straightforward to interpret and easy to compute, making it well suited to standardized, interpretable, automatable, and scalable quality control. Our approach is implemented in the freely available R package HiCRep.
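
The idea of adjusting a correlation for distance dependence can be illustrated with a minimal Python sketch. It is not the HiCRep package's implementation, and the function name `stratum_adjusted_correlation` and the parameter `max_offset` are hypothetical. The sketch assumes that each stratum is one diagonal of the contact matrix (all bin pairs at the same genomic distance), that a Pearson correlation is computed within each stratum, and that strata are combined by a weighted average whose weights are the stratum size times the product of the two standard deviations. That weighting is a simplifying assumption made for illustration, and any smoothing or other preprocessing is omitted.

```python
import numpy as np


def stratum_adjusted_correlation(a, b, max_offset):
    """Illustrative stratum-weighted correlation of two square contact matrices.

    Hypothetical helper, not the HiCRep API. Stratum k is the k-th diagonal,
    i.e. all bin pairs separated by k bins.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("a and b must be square matrices of the same shape")

    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(max_offset + 1):
        x = np.diagonal(a, offset=k)
        y = np.diagonal(b, offset=k)
        n = x.size
        if n < 2:
            continue  # too few bin pairs to estimate a correlation
        sx, sy = x.std(), y.std()
        if sx == 0.0 or sy == 0.0:
            continue  # correlation is undefined for a constant stratum
        # Pearson correlation within this stratum.
        rho = np.mean((x - x.mean()) * (y - y.mean())) / (sx * sy)
        # Assumed weight: stratum size times the product of standard deviations.
        weight = n * sx * sy
        weighted_sum += weight * rho
        weight_total += weight

    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```

Called as `stratum_adjusted_correlation(m1, m2, max_offset=10)` on two binned contact matrices of equal size, the function returns a value between -1 and 1. Because each stratum is compared only with the matching stratum of the other matrix, the strong decay of contact frequency with genomic distance, which is shared by all Hi-C maps, does not inflate the score.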
