A signal processing approach for enriched region detection in RNA polymerase II ChIP-seq data

BackgroundRNA polymerase II (PolII) is essential in gene transcription and ChIP-seq experiments have been used to study PolII binding patterns over the entire genome. However, since PolII enriched regions in the genome can be very long, existing peak finding algorithms for ChIP-seq data are not adequate for identifying such long regions.MethodsHere we propose an enriched region detection method for ChIP-seq data to identify long enriched regions by combining a signal denoising algorithm with a false discovery rate (FDR) approach. The binned ChIP-seq data for PolII are first processed using a non-local means (NL-means) algorithm for purposes of denoising. Then, a FDR approach is developed to determine the threshold for marking enriched regions in the binned histogram.ResultsWe first test our method using a public PolII ChIP-seq dataset and compare our results with published results obtained using the published algorithm HPeak. Our results show a high consistency with the published results (80-100%). Then, we apply our proposed method on PolII ChIP-seq data generated in our own study on the effects of hormone on the breast cancer cell line MCF7. The results demonstrate that our method can effectively identify long enriched regions in ChIP-seq datasets. Specifically, pertaining to MCF7 control samples we identified 5,911 segments with length of at least 4 Kbp (maximum 233,000 bp); and in MCF7 treated with E2 samples, we identified 6,200 such segments (maximum 325,000 bp).ConclusionsWe demonstrated the effectiveness of this method in studying binding patterns of PolII in cancer cells which enables further deep analysis in transcription regulation and epigenetics. Our method complements existing peak detection algorithms for ChIP-seq experiments.

[1]  John T. Lis,et al.  Defining mechanisms that regulate RNA polymerase II transcription in vivo , 2009, Nature.

[2]  Patrick Bouthemy,et al.  Patch-Based Nonlocal Functional for Denoising Fluorescence Microscopy Image Sequences , 2010, IEEE Transactions on Medical Imaging.

[3]  A. Barski,et al.  Genomic location analysis by ChIP‐Seq , 2009, Journal of cellular biochemistry.

[4]  Zhaohui S. Qin,et al.  HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data , 2010, BMC Bioinformatics.

[5]  Raymond K. Auerbach,et al.  PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls , 2009, Nature Biotechnology.

[6]  Manolis Kellis,et al.  RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo , 2007, Nature Genetics.

[7]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[8]  Dimitri Van De Ville,et al.  Nonlocal Means With Dimensionality Reduction and SURE-Based Parameter Selection , 2011, IEEE Transactions on Image Processing.

[9]  Clifford A. Meyer,et al.  Model-based Analysis of ChIP-Seq (MACS) , 2008, Genome Biology.

[10]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Clifford A. Meyer,et al.  Nucleosome Dynamics Define Transcriptional Enhancers , 2010, Nature Genetics.

[12]  Patrick Bouthemy,et al.  Space-Time Adaptation for Patch-Based Image Sequence Restoration , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Christopher B. Burge,et al.  c-Myc Regulates Transcriptional Pause Release , 2010, Cell.

[14]  S. Batzoglou,et al.  Genome-Wide Analysis of Transcription Factor Binding Sites Based on ChIP-Seq Data , 2008, Nature Methods.

[15]  Bertram Ludäscher,et al.  Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data , 2009, Nucleic acids research.

[16]  Jean-Michel Morel,et al.  A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Chen Zeng,et al.  A clustering approach for identification of enriched domains from histone modification ChIP-Seq data , 2009, Bioinform..

[18]  F. J. Anscombe,et al.  THE TRANSFORMATION OF POISSON, BINOMIAL AND NEGATIVE-BINOMIAL DATA , 1948 .

[19]  Xiaoqin Xu,et al.  Genome-wide Analysis of Transcription Factor E2F1 Mutant Proteins Reveals That N- and C-terminal Protein Interaction Domains Do Not Participate in Targeting E2F1 to the Human Genome , 2011, The Journal of Biological Chemistry.

[20]  Dustin E. Schones,et al.  High-Resolution Profiling of Histone Methylations in the Human Genome , 2007, Cell.

[21]  Zhaohui S. Qin,et al.  An integrated network of androgen receptor, polycomb, and TMPRSS2-ERG gene fusions in prostate cancer progression. , 2010, Cancer cell.