Comparative study on ChIP-seq data: normalization and binding pattern characterization

MOTIVATION Antibody-based chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is a relatively new method for studying the genome-wide binding patterns of specific proteins. ChIP-seq allows scientists to obtain more comprehensive results in less time. Here, we present a non-linear normalization algorithm and a mixture modeling method for comparing ChIP-seq data from multiple samples and for characterizing genes by their RNA polymerase II (Pol II) binding patterns.

RESULTS We apply a two-step non-linear normalization method based on locally weighted regression (LOESS) to compare ChIP-seq data across multiple samples, and we model the between-sample difference with an Exponential-Normal(K) mixture model. The fitted model is used to identify genes associated with differential binding sites based on the local false discovery rate (fdr). The binding profiles of these genes are then standardized and hierarchically clustered to characterize their Pol II binding patterns. As a case study, we apply this procedure to compare the tamoxifen-sensitive breast cancer cell line MCF7 with a tamoxifen-resistant (OHT) cell line. We find enriched regions that are associated with cancer (P < 0.0001). Our findings also suggest that cell-cycle and gene-expression control pathways may be dysregulated in the tamoxifen-resistant cells. These results show that the non-linear normalization method can be used to analyze ChIP-seq data across multiple samples.

AVAILABILITY Data are available at http://www.bmi.osu.edu/~khuang/Data/ChIP/RNAPII/.
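A minimal sketch of LOESS-based normalization, assuming the familiar MA-plot formulation from microarray analysis: for each gene, M is the log2 ratio of the two samples' read counts and A is their mean log2 count; a LOESS curve of M against A captures intensity-dependent bias, and the residuals serve as normalized differences. The function names, the pseudocount and the `frac` span are illustrative assumptions. The abstract does not detail the two steps of the authors' procedure, so this shows only a single LOESS pass (local linear fit with tricube weights, without robustness iterations).

```python
import math


def loess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights (single pass, no robustness steps).

    Returns the fitted value at each x[i] using the ceil(frac * n) nearest points.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest neighbour of x0.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [max(0.0, 1.0 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        # Weighted least squares for y = a + b * x; x0 itself always has weight 1.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def ma_loess_normalize(counts_a, counts_b, frac=0.5, pseudo=1.0):
    """Return per-gene log2 ratios (A vs B) with the LOESS trend in M removed."""
    m = [math.log2((a + pseudo) / (b + pseudo)) for a, b in zip(counts_a, counts_b)]
    avg = [0.5 * math.log2((a + pseudo) * (b + pseudo)) for a, b in zip(counts_a, counts_b)]
    trend = loess_fit(avg, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

Because the trend is estimated from all genes at once, this style of normalization assumes that most genes are not differentially bound between the two samples; a global scaling difference (for example, one library sequenced three times deeper) is removed entirely.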
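To make the local fdr step concrete, the sketch below assumes one plausible reading of the Exponential-Normal(K) model: K normal components describe genes without differential binding, and two exponential components (one on the positive side, one mirrored on the negative side) describe gains and losses of Pol II binding. The local fdr of a normalized difference d is then the posterior probability of the null, fdr(d) = π0 f0(d) / f(d), where π0 f0 is the weighted sum of the normal components and f is the full mixture density. All parameter values and function names are hypothetical, and fitting the mixture (for example by EM) is not shown.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def exp_pdf(x, rate, sign):
    """Exponential density on the side of zero given by sign (+1 gain, -1 loss)."""
    t = sign * x
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def local_fdr(x, normals, gain, loss):
    """normals: [(weight, mu, sigma), ...]; gain, loss: (weight, rate)."""
    null = sum(w * normal_pdf(x, mu, s) for w, mu, s in normals)
    alt = gain[0] * exp_pdf(x, gain[1], +1) + loss[0] * exp_pdf(x, loss[1], -1)
    total = null + alt
    # Both terms underflow only far in the tails, where the exponential part dominates.
    return null / total if total > 0 else 0.0


def flag_differential(diffs, normals, gain, loss, threshold=0.05):
    """Indices of genes whose local fdr is below threshold."""
    return [i for i, d in enumerate(diffs) if local_fdr(d, normals, gain, loss) < threshold]


# Hypothetical fitted parameters: K = 1 null component; weights sum to 1.
NORMALS = [(0.9, 0.0, 1.0)]
GAIN = (0.05, 0.5)
LOSS = (0.05, 0.5)

print(flag_differential([0.1, 6.0, -6.0, 0.5], NORMALS, GAIN, LOSS))  # [1, 2]
```

Small differences near zero keep an fdr close to 1 because the normal (null) components dominate there, while large gains or losses fall into the exponential tails and receive an fdr near 0. The 0.05 cutoff is an illustrative choice, not a value taken from the study.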
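The clustering step can be sketched as follows. The abstract says only that the selected genes are standardized and hierarchically clustered, so the specific choices here are assumptions: each gene's Pol II profile (for example, normalized read counts in bins along the gene; this binning is hypothetical) is z-scored, distances are Euclidean, and clusters are merged by average linkage until a chosen number remains. A pure-Python version is shown to stay self-contained; for real data sets, `scipy.cluster.hierarchy.linkage` with `method='average'` computes the same linkage far more efficiently.

```python
import math


def zscore(profile):
    """Standardize one gene's binding profile to mean 0 and SD 1 (flat profiles map to 0)."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in profile]


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of z-scored profiles; returns lists of row indices."""
    data = [zscore(p) for p in profiles]
    n = len(data)
    dist = {(i, j): euclid(data[i], data[j]) for i in range(n) for j in range(i + 1, n)}

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(dist[(min(a, b), max(a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


profiles = [
    [1, 2, 3, 4],      # rising
    [2, 4, 6, 8],      # rising, larger scale
    [10, 20, 30, 40],  # rising, larger still
    [4, 3, 2, 1],      # falling
    [8, 6, 4, 2],      # falling
]
print(average_linkage(profiles, 2))  # [[0, 1, 2], [3, 4]]
```

Standardizing first means genes are grouped by the shape of their binding profile rather than by overall read depth, which is why the three rising profiles end up together despite a tenfold difference in scale.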
