Impact of data resolution on three-dimensional structure inference methods

BackgroundAssays that are capable of detecting genome-wide chromatin interactions have produced massive amount of data and led to great understanding of the chromosomal three-dimensional (3D) structure. As technology becomes more sophisticated, higher-and-higher resolution data are being produced, going from the initial 1 Megabases (Mb) resolution to the current 10 Kilobases (Kb) or even 1 Kb resolution. The availability of genome-wide interaction data necessitates development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. Most of the methods were proposed for analyzing data at low resolution (1 Mb). Their behaviors are thus unknown for higher resolution data. For such data, one of the key features is the high proportion of “0” contact counts among all available data, in other words, the excess of zeros.ResultsTo address the issue of excess of zeros, in this paper, we propose a truncated Random effect EXpression (tREX) method that can handle data at various resolutions. We then assess the performance of tREX and a number of leading existing methods for recovering the underlying chromatin 3D structure. This was accomplished by creating in-silico data to mimic multiple levels of resolution and submit the methods to a “stress test”. Finally, we applied tREX and the comparison methods to a Hi-C dataset for which FISH measurements are available to evaluate estimation accuracy.ConclusionThe proposed tREX method achieves consistently good performance in all 30 simulated settings considered. It is not only robust to resolution level and underlying parameters, but also insensitive to model misspecification. This conclusion is based on observations made in terms of 3D structure estimation accuracy and preservation of topologically associated domains. Application of the methods to the human lymphoblastoid cell line data on chromosomes 14 and 22 further substantiates the superior performance of tREX: the constructed 3D structure from tREX is consistent with the FISH measurements, and the corresponding distances predicted by tREX have higher correlation with the FISH measurements than any of the comparison methods.SoftwareAn open-source R-package is available at http://www.stat.osu.edu/~statgen/Software/tRex.

[1]  Hideki Tanizawa,et al.  Mapping of long-range associations throughout the fission yeast genome reveals global genome organization linked to transcriptional regulation , 2010, Nucleic acids research.

[2]  Mathieu Blanchette,et al.  Chromatin conformation signatures of cellular differentiation , 2009, Genome Biology.

[3]  Z. Yakhini,et al.  Spatial localization of co-regulated genes exceeds genomic gene clustering in the Saccharomyces cerevisiae genome , 2013, Nucleic acids research.

[4]  Mathieu Blanchette,et al.  Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling , 2011, BMC Bioinformatics.

[5]  A. Lesne,et al.  3D genome reconstruction from chromosomal contacts , 2014, Nature Methods.

[6]  Ming Hu,et al.  Bayesian Inference of Spatial Organizations of Chromosomes , 2013, PLoS Comput. Biol..

[7]  William Stafford Noble,et al.  A statistical approach for inferring the 3D structure of the genome , 2014, Bioinform..

[8]  Kim-Chuan Toh,et al.  Inference of Spatial Organizations of Chromosomes Using Semi-definite Embedding Approach and Hi-C Data , 2013, RECOMB.

[9]  Shili Lin,et al.  Statistical Inference on Three-Dimensional Structure of Genome by Truncated Poisson Architecture Model , 2015 .

[10]  David B. Dunson,et al.  Bayesian data analysis, third edition , 2013 .

[11]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[12]  A. Tanay,et al.  Single cell Hi-C reveals cell-to-cell variability in chromosome structure , 2013, Nature.

[13]  E. Liu,et al.  An Oestrogen Receptor α-bound Human Chromatin Interactome , 2009, Nature.

[14]  Ming Hu,et al.  HiCNorm: removing biases in Hi-C data via Poisson regression , 2012, Bioinform..

[15]  Jesse R. Dixon,et al.  Topological Domains in Mammalian Genomes Identified by Analysis of Chromatin Interactions , 2012, Nature.

[16]  J. Lawrence,et al.  The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules , 2011, Nature Structural &Molecular Biology.

[17]  Neva C. Durand,et al.  A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping , 2014, Cell.

[18]  I. Amit,et al.  Comprehensive mapping of long range interactions reveals folding principles of the human genome , 2011 .

[19]  Reza Kalhor,et al.  Genome architectures revealed by tethered chromosome conformation capture and population-based modeling , 2011, Nature Biotechnology.