GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data

Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a Bioconductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
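
The idea of a binomial test per interaction can be illustrated with a minimal Python sketch. It rests on an assumption that the abstract does not spell out: under random ligation, the probability that a read pair joins two loci is taken to be proportional to the product of their relative coverages. The function names `binom_sf` and `interaction_pvalues` are hypothetical and are not the API of the GOTHiC package, and the sketch omits any multiple-testing correction.

```python
import math
from collections import defaultdict


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    tail = sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1)
    )
    return min(tail, 1.0)


def interaction_pvalues(counts):
    """Map each locus pair to a one-sided binomial p-value.

    counts: dict {(locus_a, locus_b): observed read pairs}, with a != b.
    """
    n_pairs = sum(counts.values())

    # Coverage of a locus = number of read ends that fall on it.
    coverage = defaultdict(int)
    for (a, b), k in counts.items():
        coverage[a] += k
        coverage[b] += k
    total_ends = 2 * n_pairs

    pvals = {}
    for (a, b), k in counts.items():
        # ASSUMED null model (random ligation): the two ends land independently,
        # each with probability equal to the relative coverage of its locus.
        # The factor 2 accounts for the two possible orders of the ends.
        p_null = 2.0 * (coverage[a] / total_ends) * (coverage[b] / total_ends)
        pvals[(a, b)] = binom_sf(k, n_pairs, p_null)
    return pvals


if __name__ == "__main__":
    toy = {("A", "B"): 8, ("A", "C"): 1, ("B", "C"): 1}
    for pair, pval in sorted(interaction_pvalues(toy).items()):
        print(pair, f"{pval:.4g}")
```

Because the expected probability is built from the observed coverage of each locus, any bias that inflates or deflates coverage is absorbed into the null, whatever its origin. The direct pmf sum is only practical for toy counts; a real dataset would need a numerically stable tail computation.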
