A clustering approach for identification of enriched domains from histone modification ChIP-Seq data

MOTIVATION: Chromatin states are the key to gene regulation and cell identity. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) is increasingly being used to map epigenetic states across genomes of diverse species. Chromatin modification profiles are frequently noisy and diffuse, spanning regions ranging from several nucleosomes to large domains of multiple genes. Much of the early work on identifying ChIP-enriched regions in ChIP-Seq data has focused on localized regions, such as transcription factor binding sites. Bioinformatic tools to identify diffuse domains of ChIP-enriched regions have been lacking.

RESULTS: Based on the biological observation that histone modifications tend to cluster to form domains, we present a method that identifies spatial clusters of signals unlikely to appear by chance. This method pools together enrichment information from neighboring nucleosomes to increase sensitivity and specificity. Using genomic-scale analysis, as well as the examination of loci with validated epigenetic states, we demonstrate that this method outperforms existing methods in the identification of ChIP-enriched signals for histone modification profiles. We demonstrate the application of this unbiased method to important issues in ChIP-Seq data analysis, such as data normalization for quantitative comparison of levels of epigenetic modifications across cell types and growth conditions.

AVAILABILITY: http://home.gwu.edu/~wpeng/Software.htm

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
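The clustering idea in the RESULTS paragraph (pooling signal from neighboring positions into spatial clusters that are unlikely under chance) can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size windows, the Poisson background with mean `lam`, the per-window cutoff `p_threshold`, the gap tolerance `max_gap`, and the function names `poisson_sf` and `find_islands` are all assumptions made for illustration, and the island-level significance scoring a full method would need is omitted.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed from the CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def find_islands(counts, lam, p_threshold=0.05, max_gap=1):
    """Cluster enriched windows into islands.

    counts      -- read counts per consecutive genomic window (hypothetical input)
    lam         -- assumed background mean reads per window
    p_threshold -- a window is "eligible" if P(X >= count) <= p_threshold
    max_gap     -- eligible windows separated by at most this many
                   ineligible windows are merged into one island

    Returns a list of (first_window, last_window, total_reads) tuples.
    """
    eligible = [poisson_sf(c, lam) <= p_threshold for c in counts]
    islands = []
    start = end = None
    for i, ok in enumerate(eligible):
        if not ok:
            continue
        if start is None:
            start = end = i                 # open the first island
        elif i - end - 1 <= max_gap:
            end = i                         # bridge a short gap
        else:
            islands.append((start, end, sum(counts[start:end + 1])))
            start = end = i                 # gap too long: start a new island
    if start is not None:
        islands.append((start, end, sum(counts[start:end + 1])))
    return islands


if __name__ == "__main__":
    window_counts = [0, 1, 6, 0, 7, 5, 0, 0, 0, 8, 1]
    print(find_islands(window_counts, lam=1.0, p_threshold=0.01, max_gap=1))
```

With these toy counts, windows 2, 4 and 5 are bridged across the single empty window 3 into one island, while window 9 forms its own island, so the script prints `[(2, 5, 18), (9, 9, 8)]`. Allowing small gaps is what lets a diffuse domain be reported as one region rather than as several isolated peaks.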
