Bayes multiple decision functions.

This paper deals with the problem of simultaneously making many (M) binary decisions based on one realization of a random data matrix X. M is typically large, and X will usually have M rows, each associated with one of the M decisions to be made, although the data in each row may be low dimensional. Such problems arise in many practical areas: in the biological and medical sciences, where the available data set comes from microarrays or other high-throughput technology and the goal is to decide which of many genes are relevant to some phenotype of interest; in the engineering and reliability sciences; in astronomy; in education; and in business. A Bayesian decision-theoretic approach to this problem is implemented, with the overall loss function being a cost-weighted linear combination of Type I and Type II loss functions. The class of loss functions considered allows the false discovery rate (FDR), false nondiscovery rate (FNR), and missed discovery rate (MDR) to be used in assessing the quality of the decisions. Through this Bayesian paradigm, the Bayes multiple decision function (BMDF) is derived and an efficient algorithm to obtain the optimal Bayes action is described. In contrast to many works in the literature, where the rows of the matrix X are assumed to be stochastically independent, we allow a dependent data structure, with the associations obtained through a class of frailty-induced Archimedean copulas. In particular, non-Gaussian dependent data structures, which are typical of failure-time data, can be accommodated. The Bayes optimal action is computed numerically using sequential Monte Carlo techniques. The theory developed could also be extended to multiple hypothesis testing, multiple classification and prediction, and high-dimensional variable selection.
The proposed procedure is illustrated for the simple versus simple hypotheses setting and for the composite hypotheses setting through simulation studies. The procedure is also applied to a subset of a microarray data set from a colon cancer study.
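
As a rough illustration of how a Bayes action can be found under a cost-weighted combination of FDR-type and FNR-type losses, the following minimal sketch searches over the number of discoveries. It is not the algorithm of the paper. It assumes that the marginal posterior probabilities that each alternative is true are already available (for example, from a Monte Carlo approximation), and that the loss is the false discovery proportion plus a cost times the false nondiscovery proportion. The function name `bayes_action` and the parameter `cost` are hypothetical.

```python
def bayes_action(post_probs, cost=1.0):
    """Minimize posterior expected FDP + cost * FNP over binary decision vectors.

    post_probs[i] is an assumed posterior probability that the i-th
    alternative is true. Returns (decisions, expected_loss), where
    decisions[i] == 1 denotes a discovery.
    """
    m = len(post_probs)
    # For a fixed number k of discoveries, both loss terms are minimized by
    # selecting the k largest posterior probabilities, so only k is searched.
    order = sorted(range(m), key=lambda i: post_probs[i], reverse=True)
    sorted_p = [post_probs[i] for i in order]
    total_p = sum(sorted_p)

    best_k, best_loss = 0, None
    cum_p = 0.0  # sum of the k largest posterior probabilities
    for k in range(m + 1):
        if k > 0:
            cum_p += sorted_p[k - 1]
        exp_false_disc = k - cum_p    # expected number of false discoveries
        exp_missed = total_p - cum_p  # expected number of missed discoveries
        fdp = exp_false_disc / k if k > 0 else 0.0
        fnp = exp_missed / (m - k) if k < m else 0.0
        loss = fdp + cost * fnp
        if best_loss is None or loss < best_loss:
            best_k, best_loss = k, loss

    decisions = [0] * m
    for i in order[:best_k]:
        decisions[i] = 1
    return decisions, best_loss


if __name__ == "__main__":
    decisions, loss = bayes_action([0.95, 0.05, 0.9, 0.1], cost=1.0)
    print(decisions, loss)
```

Because the expected loss for a fixed decision vector depends only on the marginal posterior probabilities, the search costs one sort plus a linear scan, even when the rows of X are dependent; the dependence enters only through how those probabilities are computed.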
