normR: Regime enrichment calling for ChIP-seq data

ChIP-seq probes the genome-wide localization of DNA-associated proteins. To mitigate technical biases, ChIP-seq read densities are normalized to read densities obtained from a control. Our statistical framework “normR” achieves a sensitive normalization by accounting for the effect of putative protein-bound regions on the overall read statistics. Here, we demonstrate normR’s suitability in three studies: (i) calling enrichment for data with a high (H3K4me3) and a low (H3K36me3) signal-to-noise ratio; (ii) identifying two previously undescribed H3K27me3 and H3K9me3 heterochromatic regimes of broad and peak enrichment; and (iii) calling differential H3K4me3 or H3K27me3 enrichment between HepG2 hepatocarcinoma cells and primary human hepatocytes. normR is available at http://bioconductor.org/packages/normr
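The abstract does not spell out the model, so the following is only an illustrative sketch of why enriched regions matter for normalization, not normR's implementation or API. It assumes that per-bin ChIP and control counts can be described by a two-component binomial mixture (background and enrichment) fitted by expectation-maximization. The function `fit_two_regimes`, the bin counts, and all variable names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def fit_two_regimes(chip, ctrl, iters=100):
    """EM fit of a two-component binomial mixture (hypothetical helper).

    Each bin with n = chip + ctrl reads is modelled as
    chip ~ Binomial(n, theta[j]), with j = 0 for background
    and j = 1 for enrichment.
    """
    n = [s + r for s, r in zip(chip, ctrl)]
    pooled = sum(chip) / sum(n)
    # Start the two components below and above the pooled ChIP fraction.
    theta = [0.5 * pooled, min(1.5 * pooled, 0.99)]
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for every bin.
        post = []
        for k, m in zip(chip, n):
            logp = [math.log(weight[j]) + log_binom_pmf(k, m, theta[j])
                    for j in (0, 1)]
            top = max(logp)
            lik = [math.exp(v - top) for v in logp]
            total = sum(lik)
            post.append([v / total for v in lik])
        # M-step: re-estimate mixture weights and ChIP fractions.
        for j in (0, 1):
            resp = sum(p[j] for p in post)
            weight[j] = min(max(resp / len(chip), 1e-12), 1.0)
            hits = sum(p[j] * k for p, k in zip(post, chip))
            trials = sum(p[j] * m for p, m in zip(post, n))
            if trials > 0:
                theta[j] = min(max(hits / trials, 1e-9), 1.0 - 1e-9)
    return theta, weight


# Made-up counts: 900 background bins and 100 enriched bins.
chip = [5] * 900 + [30] * 100
ctrl = [10] * 1000

theta, weight = fit_two_regimes(chip, ctrl)

# Scaling by total read counts is pulled upward by the enriched bins.
naive_ratio = sum(chip) / sum(ctrl)
# Expected ChIP reads per control read in the background component only.
background_ratio = theta[0] / (1.0 - theta[0])
```

In this toy data set the background bins carry half as many ChIP reads as control reads, yet the ratio of total read counts is larger because the enriched bins contribute to it. Estimating the ratio from the background component alone avoids that inflation, which is the effect the abstract refers to.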
