BroadPeak: a novel algorithm for identifying broad peaks in diffuse ChIP-seq datasets

SUMMARY Although some histone modification chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) signals form sharp peaks at narrow, specific genomic locations, others are distributed diffusely along chromosomes, and their large contiguous regions of enrichment are better modeled as broad peaks. Here, we present BroadPeak, an algorithm for identifying such broad peaks in diffuse ChIP-seq datasets. We show that BroadPeak runs in linear time and requires only two parameters, and we validate its performance on real and simulated histone modification ChIP-seq datasets. The peaks BroadPeak calls are highly coincident with both the underlying ChIP-seq tag count distributions and relevant biological features, such as the gene bodies of actively transcribed genes. On simulated datasets, where the broad peaks are known, BroadPeak shows superior overall recall and precision.

AVAILABILITY The source code and documentation are available at http://jordan.biology.gatech.edu/page/software/broadpeak/.
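As an illustration of how recall and precision against known broad peaks might be computed, the following is a minimal sketch that scores called peaks by base-pair overlap with a set of known peaks on a single chromosome. The summary does not state how the evaluation was actually defined, so the base-pair overlap definition, the half-open `(start, end)` interval format, and the function names `merge`, `overlap_bp` and `precision_recall` are all assumptions made for this example.

```python
def merge(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bp(a, b):
    """Count bases shared by two sorted, disjoint interval lists in one linear sweep."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def precision_recall(called, known):
    """Base-pair precision and recall of called peaks against known peaks.

    Assumed definition (not taken from the paper):
      precision = shared bases / bases in called peaks
      recall    = shared bases / bases in known peaks
    """
    called, known = merge(called), merge(known)
    shared = overlap_bp(called, known)
    called_bp = sum(end - start for start, end in called)
    known_bp = sum(end - start for start, end in known)
    precision = shared / called_bp if called_bp else 0.0
    recall = shared / known_bp if known_bp else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    called_peaks = [(100, 500), (900, 1200)]
    known_peaks = [(200, 600), (1000, 1100)]
    print(precision_recall(called_peaks, known_peaks))
```

A base-pair definition is used here because broad peaks vary widely in length, so counting overlapping peaks rather than overlapping bases would weight a short and a long domain equally. Other definitions, such as peak-level overlap with a minimum overlap fraction, are equally plausible.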
