MMDiff: quantitative testing for shape changes in ChIP-Seq data sets

Background: Cell-specific gene expression is controlled by epigenetic modifications and transcription factor binding. While genome-wide maps of these protein-DNA interactions have become widely available, quantitative comparison of the resulting ChIP-Seq data sets remains challenging. Current approaches to detecting differentially bound or modified regions are mainly borrowed from RNA-Seq data analysis; they therefore focus on the total count of fragments mapped to a region and ignore any information encoded in the shape of the peaks.

Results: Here, we present MMDiff, a robust, broadly applicable method for detecting differences between sequence count data sets. Because it quantifies shape changes in signal profiles, it overcomes challenges imposed by the highly structured nature of the data and the paucity of replicates. We first use a simulated data set to compare the performance of MMDiff with that of four alternative methods, and demonstrate that MMDiff excels when peak profiles change between samples. We next use MMDiff to re-analyse a recent data set of the histone modification H3K4me3, elucidating the establishment of this prominent epigenomic marker. Our empirical analysis shows that the method yields reproducible results across experiments and is able to detect functionally important changes in histone modifications. To explore the broader applicability of MMDiff, we apply it to two ENCODE data sets: one investigating the histone modification H3K27ac and one measuring the genome-wide binding of the transcription factor CTCF. In both cases, MMDiff proves to be complementary to count-based methods. In addition, we show that MMDiff is capable of directly detecting changes of homotypic binding events at neighbouring binding sites. MMDiff is readily available as a Bioconductor package.

Conclusions: Our results demonstrate that higher-order features of ChIP-Seq peaks carry relevant information that is often complementary to total counts, and hence are important in assessing differential histone modifications and transcription factor binding. We have developed a new computational method, MMDiff, that is capable of exploring these features and therefore closes an existing gap in the analysis of ChIP-Seq data sets.
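
The contrast between count-based and shape-based comparison can be illustrated with a small simulation. The abstract does not spell out the statistic MMDiff uses, so the sketch below assumes, purely for illustration, a kernel two-sample statistic (maximum mean discrepancy) with a Gaussian kernel. The function names, the bandwidth `sigma`, and the simulated peaks are all hypothetical, and none of this is the API of the Bioconductor package.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two genomic positions."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=50.0):
    """Biased estimate of the squared maximum mean discrepancy between
    two samples of read positions (assumed statistic, see lead-in)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def simulate_peak(centres, n_reads, spread, rng):
    """Draw read positions from an equal mixture of Gaussians at `centres`."""
    return [rng.gauss(rng.choice(centres), spread) for _ in range(n_reads)]


rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Two replicates of a single peak, and one region with two neighbouring
# summits. All three regions contain the same number of reads.
single_a = simulate_peak([500], 200, 40, rng)
single_b = simulate_peak([500], 200, 40, rng)
double = simulate_peak([400, 600], 200, 40, rng)

same_shape = mmd_squared(single_a, single_b)
diff_shape = mmd_squared(single_a, double)

# A purely count-based comparison sees no difference between the regions.
print("equal read counts:", len(single_a) == len(double))
# The shape statistic separates the unimodal peak from the bimodal one.
print("shape statistic larger for changed profile:", same_shape < diff_shape)
```

In this toy setting all three regions have identical read counts, so a count-based test has nothing to detect, while the shape statistic is clearly larger when a single summit is compared with two neighbouring summits than when two replicates of the same peak are compared.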
