Spiked Dirichlet Process Prior for Bayesian Multiple Hypothesis Testing in Random Effects Models.

We propose a Bayesian method for multiple hypothesis testing in random effects models that uses Dirichlet process (DP) priors for a nonparametric treatment of the random effects distribution. We consider a general model formulation that accommodates a variety of multiple treatment conditions. A key feature of our method is the use of a product of spiked distributions, i.e., mixtures of a point mass and a continuous distribution, as the centering distribution for the DP prior. Adopting these spiked centering priors readily accommodates sharp null hypotheses and allows for the estimation of the posterior probabilities of such hypotheses. Dirichlet process mixture models naturally borrow information across objects through model-based clustering, while inference on single hypotheses averages over clustering uncertainty. We demonstrate via a simulation study that our method yields increased sensitivity in multiple hypothesis testing and produces a lower proportion of false discoveries than competing methods. While our modeling framework is general, here we present an application to gene expression data from microarray experiments. In this application, the modeling framework allows simultaneous inference on the parameters governing differential expression and on the clustering of genes. We use experimental data on the transcriptional response to oxidative stress in mouse heart muscle and compare the results from our procedure with those of existing nonparametric Bayesian methods that provide only a ranking of the genes by their evidence for differential expression.
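
To make the idea of a spiked centering distribution concrete, the following is a minimal sketch of drawing random effects from a DP prior whose centering distribution mixes a point mass at zero with a continuous component. It is an illustration under stated assumptions, not the authors' sampler: it simulates from the prior only (no data, no posterior inference), uses a single scalar effect per object rather than a product of spiked distributions, and takes the continuous part to be a normal distribution. The names `sample_spiked_base`, `polya_urn_draws`, `pi0`, `tau`, and `alpha` are hypothetical.

```python
import numpy as np

def sample_spiked_base(rng, pi0=0.5, tau=1.0):
    """Draw from G0 = pi0 * delta_0 + (1 - pi0) * N(0, tau^2).

    The normal continuous component is an assumption made for illustration.
    """
    if rng.random() < pi0:
        return 0.0  # the spike: an effect that is exactly zero (sharp null)
    return rng.normal(0.0, tau)

def polya_urn_draws(n, alpha, rng, pi0=0.5, tau=1.0):
    """Draw n random effects from DP(alpha, G0) using the Polya urn scheme."""
    values = []  # value attached to each cluster
    counts = []  # number of objects in each cluster
    labels = np.empty(n, dtype=int)
    for i in range(n):
        # Join existing cluster k with probability counts[k] / (alpha + i),
        # or open a new cluster with probability alpha / (alpha + i).
        probs = np.array(counts + [alpha], dtype=float) / (alpha + i)
        k = rng.choice(len(probs), p=probs)
        if k == len(values):
            values.append(sample_spiked_base(rng, pi0, tau))
            counts.append(1)
        else:
            counts[k] += 1
        labels[i] = k
    effects = np.array(values)[labels]
    return effects, labels

# Usage: objects in the same cluster share an effect, and some clusters
# sit exactly at zero because of the spike in the centering distribution.
rng = np.random.default_rng(0)
effects, labels = polya_urn_draws(200, alpha=1.0, rng=rng)
print("number of clusters:", labels.max() + 1)
print("fraction of effects exactly zero:", np.mean(effects == 0.0))
```

Because the centering distribution places positive mass on zero, a draw can equal zero exactly, which is what lets a sharp null hypothesis receive nonzero probability. In this sketch, distinct clusters that both land on the spike share the value zero and so act as a single atom.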
