Multiple Hypothesis Testing by Clustering Treatment Effects

Multiple hypothesis testing and clustering have been the subject of extensive research in high-dimensional inference, yet these problems have usually been treated separately. By defining true clusters in terms of shared parameter values, we can improve the sensitivity of individual tests, because more data bearing on the same parameter values are available. We develop and evaluate a hybrid methodology that uses clustering information to increase testing sensitivity and accommodates uncertainty in the true clustering. To investigate the potential efficacy of the hybrid approach, we first study a stylized example in which each object is evaluated with a standard z score but different objects are connected by shared parameter values. We show that testing power increases when the clustering is estimated sufficiently well. We next develop a model-based analysis using a conjugate Dirichlet process mixture model. The method is general, but for specificity we focus on microarray gene expression data, to which both clustering and multiple testing methods are actively applied. Clusters provide the means for sharing information among genes, and the hybrid methodology averages over uncertainty in these clusters through Markov chain sampling. Simulations show that the hybrid method performs substantially better than other methods when clustering is heavy or moderate and performs well even under weak clustering. The proposed method is illustrated on microarray data from a study of the effects of aging on gene expression in heart tissue.
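The intuition behind the stylized z-score example can be sketched numerically. The snippet below is only an illustration under simplifying assumptions that are not taken from the paper: the cluster membership is treated as known, the z scores within a cluster are independent with unit variance and a common mean `mu`, and the pooled statistic is the sum of the scores divided by the square root of the cluster size. The function name `pooled_power` and the chosen values of `mu` and cluster size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def pooled_power(mu, cluster_size, alpha=0.05):
    """Power of a two-sided z test on the pooled score of one cluster.

    Assumes (for illustration only) that the cluster is known and that its
    members have independent N(mu, 1) z scores, so the pooled score
    sum(z) / sqrt(cluster_size) is N(sqrt(cluster_size) * mu, 1).
    """
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = sqrt(cluster_size) * mu            # mean of the pooled score
    upper = 1 - STD_NORMAL.cdf(crit - shift)   # reject in the upper tail
    lower = STD_NORMAL.cdf(-crit - shift)      # reject in the lower tail
    return upper + lower


if __name__ == "__main__":
    # Power for a single object versus clusters of 4 and 16 objects.
    for m in (1, 4, 16):
        print(m, round(pooled_power(1.0, m), 3))
```

With a cluster of size one this reduces to the ordinary single-object z test, and under the null (`mu = 0`) the rejection rate stays at `alpha` for any cluster size. The sketch ignores uncertainty in the clustering, which the hybrid methodology is designed to accommodate.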
