A Flexible and Powerful Bayesian Hierarchical Model for ChIP–Chip Experiments

Chromatin-immunoprecipitation microarrays (ChIP-chip), which enable researchers to identify the regions of a genome that are bound by specific DNA-binding proteins, present new challenges for statistical analysis because of the large number of probes, the high noise-to-signal ratio, and the spatial dependence between probes. We propose a method called BAC (Bayesian analysis of ChIP-chip) to detect transcription factor bound regions; it incorporates the dependence between probes while making few assumptions about the bound regions (e.g., their length). BAC is robust to probe outliers because it places an exchangeable prior on the variances, which allows different probes to have different variances while still shrinking extreme empirical variances. Parameters are estimated using Markov chain Monte Carlo, and inference is based on the joint posterior distribution of the parameters. Bound regions are detected using posterior probabilities computed from the joint posterior distribution of neighboring probes. We show that these posterior probabilities are well calibrated and can be used to estimate the false discovery rate. The method is illustrated using two publicly available ChIP-chip data sets containing 18 experimentally validated regions. We compare our method with four commonly used baseline techniques: the Wilcoxon rank-sum test, TileMap, HGMM, and MAT. BAC and HGMM performed best at detecting validated regions; however, HGMM appears to be much more sensitive to probe outliers than BAC. In addition, we present a simulation study showing that BAC is more powerful than the other four techniques under various simulation scenarios while remaining robust to model misspecification.
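To make the link between calibrated posterior probabilities and the false discovery rate concrete, the sketch below uses one commonly used Bayesian FDR estimator: if probe or region i has posterior probability p_i of being bound and every item with p_i at or above a threshold is called bound, then the expected proportion of false calls is the average of 1 - p_i over the called set. This is an assumption about the general form of the estimate, not necessarily the exact procedure used in BAC. The input probabilities are hypothetical stand-ins for values that would come from the MCMC output, and the function names `bayesian_fdr` and `largest_call_set` are illustrative only.

```python
from typing import List, Optional, Tuple


def bayesian_fdr(post_probs: List[float], threshold: float) -> Tuple[int, float]:
    """Estimate the FDR of calling bound every item with posterior probability >= threshold.

    Assumes post_probs are well-calibrated posterior probabilities of being bound
    (hypothetical inputs; in practice they would be computed from MCMC samples).
    Returns (number of calls, estimated FDR).
    """
    called = [p for p in post_probs if p >= threshold]
    if not called:
        return 0, 0.0
    # Each called item is a false discovery with posterior probability 1 - p,
    # so the expected number of false calls is the sum of (1 - p) over the calls.
    expected_false = sum(1.0 - p for p in called)
    return len(called), expected_false / len(called)


def largest_call_set(post_probs: List[float], alpha: float) -> Tuple[Optional[float], int]:
    """Return (threshold, n_calls) giving the most calls whose estimated FDR is <= alpha.

    Only observed probabilities are tried as thresholds, so tied values are
    always called or rejected together. Returns (None, 0) if no threshold works.
    """
    best: Tuple[Optional[float], int] = (None, 0)
    for t in sorted(set(post_probs)):
        n_calls, fdr = bayesian_fdr(post_probs, t)
        if fdr <= alpha and n_calls > best[1]:
            best = (t, n_calls)
    return best


if __name__ == "__main__":
    # Hypothetical posterior probabilities for five regions.
    probs = [0.99, 0.95, 0.9, 0.6, 0.2]
    print(bayesian_fdr(probs, 0.9))        # 3 calls, estimated FDR = 0.16 / 3
    print(largest_call_set(probs, 0.06))   # (0.9, 3)
```

Lowering the threshold adds items with smaller p_i, which can only raise the average of 1 - p_i, so the estimated FDR increases monotonically as more regions are called. This is why a threshold can be read off for any target FDR level, provided the posterior probabilities are well calibrated.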
