Combining a wavelet change point and the Bayes factor for analysing chromosomal interaction data.

Over the past few decades, great effort has gone into understanding cellular function at the cytoplasmic level. There is now growing interest in understanding the relationship between function and structure at the nuclear, chromosomal and sub-chromosomal levels. Chromosomal interaction data, now becoming available at unprecedented resolution and scale, open the way to addressing this challenge. Consequently, there is a growing need for new methods and tools that transform these data into knowledge and insight. Here, we develop all the steps required for the analysis of chromosomal interaction (Hi-C) data. The result is a methodology that combines a wavelet change point with the Bayes factor for the correction, segmentation and comparison of Hi-C data. We further developed chromoR, an R package that implements the methods presented here. The chromoR package gives researchers a means to analyse chromosomal interaction data using statistical bioinformatics, offering a new and comprehensive solution to this task.
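To give a rough feel for the segmentation idea, the sketch below locates a single change point in a vector of Poisson-like interaction counts using an unbalanced Haar contrast. It is an illustration under stated assumptions and not the chromoR implementation: the function names are hypothetical, the counts are made up, and the Anscombe transform is swapped in as a simple variance-stabilising step.

```python
import math

def anscombe(counts):
    """Variance-stabilise Poisson counts with 2*sqrt(x + 3/8) (an assumed stand-in)."""
    return [2.0 * math.sqrt(c + 0.375) for c in counts]

def best_change_point(counts):
    """Return (k, score) for the split counts[:k] | counts[k:] that maximises
    the absolute unbalanced Haar contrast of the stabilised data."""
    y = anscombe(counts)
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(y)
    left = 0.0
    best_k, best_score = 1, -1.0
    for k in range(1, n):
        left += y[k - 1]
        right = total - left
        # Scaled difference between the left and right segment means.
        coef = math.sqrt(k * (n - k) / n) * (left / k - right / (n - k))
        if abs(coef) > best_score:
            best_k, best_score = k, abs(coef)
    return best_k, best_score

# Illustrative (made-up) counts with a jump after the sixth bin.
counts = [3, 2, 4, 3, 2, 3, 15, 14, 16, 13, 15, 14]
k, score = best_change_point(counts)
print(k, round(score, 2))
```

In this sketch a split would be accepted only if the score exceeds a noise threshold such as sqrt(2 log n), which is reasonable because the stabilised values have roughly unit variance. Applying the search recursively to each resulting segment would give a multi-segment partition.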
