HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data

Background

Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task because protein-bound DNA is difficult to characterize on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data it generates.

Results

Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach that allows for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage.

Conclusions

Using biologically relevant data collections, we found that, compared with other currently available ChIP-Seq analysis approaches, HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to control sequences. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. The additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
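To make the idea of HMM-based peak finding concrete, the following is a minimal sketch of a two-state HMM (background vs. enriched) decoded with the Viterbi algorithm over binned read counts. It is not the HPeak implementation: the Poisson emission distributions, the parameter values, the 25 bp bin size, and the function names `viterbi_two_state` and `enriched_regions` are all assumptions chosen for illustration, and the abstract's weighting scheme on read coverage is not modeled here.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam_bg=1.0, lam_enriched=10.0,
                      p_stay=0.95, p_start_enriched=0.05):
    """Most likely state path (0 = background, 1 = enriched).

    All parameter values are illustrative assumptions, not HPeak defaults.
    """
    if not counts:
        return []
    lams = (lam_bg, lam_enriched)
    log_start = (math.log(1 - p_start_enriched), math.log(p_start_enriched))
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]

    # Initialise with the first bin.
    score = [log_start[s] + poisson_logpmf(counts[0], lams[s]) for s in (0, 1)]
    back = []  # back[t][s] = best previous state for state s at bin t + 1

    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            best_prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + poisson_logpmf(k, lams[s]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def enriched_regions(path, bin_size=25):
    """Merge consecutive enriched bins into (start, end) coordinates."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        regions.append((start * bin_size, len(path) * bin_size))
    return regions


# Hypothetical read counts per bin, with an enriched stretch in the middle.
counts = [0, 1, 0, 2, 1, 12, 9, 11, 8, 10, 1, 0, 1, 0, 0]
path = viterbi_two_state(counts)
print(enriched_regions(path))
```

The transition probability `p_stay` discourages frequent state switches, so isolated noisy bins tend not to be called as separate regions; this is the general property that distinguishes an HMM from a per-bin threshold.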
