Bayesian nonparametric multiple testing

Multiple testing, or multiplicity, problems often require testing several means under the assumption that few hypotheses will be rejected, as motivated by the need to analyze DNA microarray data. The goal is to keep the combined rate of false discoveries and non-discoveries as small as possible. A discrete approximation to a Polya tree prior, centered at the usual Gaussian distribution and enjoying fast conjugate updating, is proposed. The technique and its advantages are demonstrated through extensive simulation and data analysis, accompanied by a Java web application. The numerical studies indicate that the procedure achieves promising false discovery rates and estimates of key values in the mixture model, at very reasonable computational speed.
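To make the idea of conjugate updating for a Gaussian-centered Polya tree concrete, the following is a minimal sketch of a textbook finite Polya tree, not the discrete approximation proposed in the paper. The function name `polya_tree_posterior`, the depth, and the Beta(c j^2, c j^2) branching prior at level j are illustrative assumptions. Partition intervals are taken at standard normal quantiles, so the prior mean of the random distribution is N(0, 1), and the posterior update only requires counting observations per interval.

```python
from statistics import NormalDist


def polya_tree_posterior(data, depth=4, c=1.0):
    """Posterior mean probabilities of the 2**depth finest intervals.

    Assumptions (illustrative, not taken from the paper):
    - the tree is centered at N(0, 1): level-j intervals are cut at the
      k / 2**j quantiles of the standard normal;
    - each branching probability at level j has a Beta(c*j**2, c*j**2) prior.
    """
    base = NormalDist(0.0, 1.0)

    # counts[j][k] = number of observations in interval k at level j.
    counts = [[0] * (2 ** j) for j in range(depth + 1)]
    for x in data:
        u = base.cdf(x)  # map the observation to (0, 1)
        for j in range(1, depth + 1):
            k = min(int(u * 2 ** j), 2 ** j - 1)  # guard against u == 1.0
            counts[j][k] += 1

    # Conjugate Beta update: the posterior mean of each branching
    # probability is (a + n_child) / (2a + n_child + n_sibling).
    leaf_probs = []
    for leaf in range(2 ** depth):
        p = 1.0
        for j in range(1, depth + 1):
            k = leaf >> (depth - j)  # ancestor of this leaf at level j
            sibling = k ^ 1
            a = c * j ** 2
            p *= (a + counts[j][k]) / (2 * a + counts[j][k] + counts[j][sibling])
        leaf_probs.append(p)
    return leaf_probs


if __name__ == "__main__":
    # Hypothetical test statistics: mostly null-like, a few large values.
    z = [-0.3, 0.1, 0.4, -1.1, 0.8, 2.9, 3.2]
    for k, p in enumerate(polya_tree_posterior(z, depth=3)):
        print(k, round(p, 4))
```

With no data every finest interval keeps its prior mass of 1 / 2**depth. Observations shift mass toward the intervals that contain them, and the amount of shrinkage back to the Gaussian center is governed by `c`.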
