Improved ChIP-chip analysis by a mixture model approach

Background
Microarray analysis of immunoprecipitated chromatin (ChIP-chip) has evolved from a novel technique to a standard approach for the systematic study of protein-DNA interactions. In ChIP-chip, sites of protein-DNA interaction are identified by signals from the hybridization of selected DNA to tiled oligomers and are graphically represented as peaks. Most existing methods were designed to identify relatively sparse peaks and assume that replicates are available.

Results
We propose a data normalization method and a statistical method for peak identification from ChIP-chip data, both based on a mixture model approach. In contrast to many existing methods, including others that also employ mixture models, our method is more flexible: it imposes less restrictive assumptions and allows a relatively large proportion of peak regions. In addition, our method does not require experimental replicates and is computationally efficient. We compared the performance of our method with that of several representative existing methods on three datasets, including a spike-in dataset. These comparisons demonstrate that our approach is more robust and has comparable or higher power than the other methods, especially when peak regions are abundant.

Conclusion
Our data normalization and peak detection methods improve the detection of peak regions in ChIP-chip data.
