Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences

In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate, for the first time, its behaviour from the frequentist point of view. Given a spike-and-slab prior on the high-dimensional sparse unknown parameter, one can easily compute the posterior probability that each coordinate comes from the spike. These probabilities are the well-known local-fdr values [25], also called $\ell$-values. The spike-and-slab weight parameter is calibrated in an empirical Bayes fashion, using marginal maximum likelihood. The multiple testing procedure under study, called here the cumulative $\ell$-value procedure, ranks coordinates according to their empirical $\ell$-values and chooses the threshold so that the cumulative sum of the ranked $\ell$-values does not exceed a user-specified level $t$. We validate the use of this method from the multiple testing perspective: for alternatives of appropriately large signal strength, the false discovery rate (FDR) of the procedure is shown to converge to the target level $t$, while its false negative rate (FNR) tends to 0. We complement this study by providing convergence rates for the method. Additionally, we prove that the q-value multiple testing procedure [40, 14] shares similar convergence rates in this model.

MSC2020 subject classifications: Primary 62G20; secondary 62G07, 62G15.
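The abstract combines three computational ingredients: $\ell$-values under a spike-and-slab prior, an empirical Bayes weight fitted by marginal maximum likelihood, and two thresholding rules (cumulative $\ell$-value and q-value). The Python sketch below puts them together under assumptions that are ours, not the paper's: unit-variance Gaussian noise, a Gaussian slab $N(0, \tau^2)$ with $\tau = 3$ (chosen only for its closed-form marginal; heavier-tailed slabs are common in this literature), a weight search over $[1/n, 1]$, and a cumulative rule that compares the *average* of the $k$ smallest $\ell$-values with $t$. The helper names `fit_weight`, `ell_values`, `cumulative_ell_procedure` and `q_values` are hypothetical.

```python
import math

import numpy as np

# Hypothetical modelling choices for this sketch (not taken from the paper):
#   * X_i = theta_i + eps_i with eps_i ~ N(0, 1);
#   * prior theta_i ~ (1 - w) * delta_0 + w * N(0, TAU**2); the Gaussian slab is
#     used only because its marginal N(0, 1 + TAU**2) has a closed form;
#   * w is searched in [1/n, 1] during marginal maximum likelihood.
TAU = 3.0


def log_density_ratio(x):
    """log(g(x) / phi(x)) with g the N(0, 1 + TAU^2) slab marginal density."""
    s2 = 1.0 + TAU**2
    return -0.5 * math.log(s2) + 0.5 * x**2 * (TAU**2 / s2)


def ell_values(x, w):
    """l-values: posterior probabilities P(theta_i = 0 | X_i) under weight w."""
    q = np.exp(-log_density_ratio(x))  # phi/g, bounded above by sqrt(1 + TAU^2)
    return (1.0 - w) * q / ((1.0 - w) * q + w)


def fit_weight(x, iters=100):
    """Marginal maximum likelihood for w by bisection on the score.

    The marginal log-likelihood sum_i log((1 - w) phi(X_i) + w g(X_i)) is
    concave in w, so its derivative is nonincreasing and bisection applies."""
    q = np.exp(-log_density_ratio(x))

    def score(w):
        # d/dw log((1 - w) phi + w g), divided through by g for stability
        return np.sum((1.0 - q) / ((1.0 - w) * q + w))

    lo, hi = 1.0 / len(x), 1.0
    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cumulative_ell_procedure(ell, t):
    """Reject the k hypotheses with the smallest l-values, k being the largest
    index for which the average of the k smallest l-values is <= t."""
    order = np.argsort(ell, kind="stable")
    running_mean = np.cumsum(ell[order]) / np.arange(1, len(ell) + 1)
    below = np.nonzero(running_mean <= t)[0]
    k = int(below[-1]) + 1 if below.size else 0
    reject = np.zeros(len(ell), dtype=bool)
    reject[order[:k]] = True
    return reject


def q_values(x, w):
    """q_i = P(theta_i = 0 | |X| >= |x_i|) under the fitted model (two-sided tails).

    No underflow guard: extremely large |x_i| would give 0/0."""
    s = math.sqrt(1.0 + TAU**2)
    null_tail = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in x])  # 2*Phi_bar
    slab_tail = np.array([math.erfc(abs(v) / (s * math.sqrt(2.0))) for v in x])
    return (1.0 - w) * null_tail / ((1.0 - w) * null_tail + w * slab_tail)


# Hypothetical toy data: 10 strong signals at 8.0 followed by 190 null-like values.
x = np.concatenate([np.full(10, 8.0), np.linspace(-1.5, 1.5, 190)])
w_hat = fit_weight(x)
ell = ell_values(x, w_hat)
rej_cl = cumulative_ell_procedure(ell, t=0.1)
rej_q = q_values(x, w_hat) <= 0.1
print(f"w_hat={w_hat:.3f}  cumulative-l rejections={rej_cl.sum()}  q-value rejections={rej_q.sum()}")
```

On this deterministic toy input both rules reject all ten signals. The cumulative rule may additionally admit a null-like coordinate, since it only requires the average $\ell$-value of the rejected set (a posterior estimate of the false discovery proportion) to stay at or below $t$; the q-value rule instead thresholds each coordinate separately.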

[1] M. Stephens, et al. Solving the Empirical Bayes Normal Means Problem with Correlated Noise, 2018, arXiv:1812.07488.

[2] David Gerard, et al. Empirical Bayes shrinkage and false discovery rate estimation, allowing for unwanted variation, 2017, Biostatistics.

[3] Etienne Roquain, et al. New FDR bounds for discrete and heterogeneous tests, 2018.

[4] Guillermo Durand. Adaptive $p$-value weighting with power optimality, 2017, Electronic Journal of Statistics.

[5] William Stafford Noble, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, 2007, Nature.

[6] Xiongzhi Chen, et al. Multiple testing with discrete data: Proportion of true null hypotheses and two adaptive FDR procedures, 2014, Biometrical Journal.

[7] Ang Li, et al. Multiple testing with the structure-adaptive Benjamini–Hochberg algorithm, 2016, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[8] Aad van der Vaart, et al. Uncertainty Quantification for the Horseshoe (with Discussion), 2016, arXiv:1607.01892.

[9] John D. Storey. The positive false discovery rate: a Bayesian interpretation and the q-value, 2003.

[10] Debashis Ghosh, et al. A General Decision Theoretic Formulation of Procedures Controlling FDR and FNR from a Bayesian Perspective, 2008.

[11] E. Candès, et al. Controlling the false discovery rate via knockoffs, 2014, arXiv:1404.5609.

[12] Matthew Stephens, et al. False discovery rates: a new deal, 2016, bioRxiv.

[13] Anders M. Dale, et al. Covariate-modulated local false discovery rate for genome-wide association studies, 2014, Bioinformatics.

[14] M. Newton. Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis, 2008.

[15] Étienne Roquain, et al. Optimal weighting for false discovery rate control, 2008, arXiv:0807.4081.

[16] B. Efron. Size, power and false discovery rates, 2007, arXiv:0710.2245.

[17] Étienne Roquain, et al. Graph inference with clustering and false discovery rate control, 2019, arXiv:1907.10176.

[18] Weidong Liu. Gaussian graphical model estimation with false discovery rate control, 2013, arXiv:1306.0976.

[19] Adel Javanmard, et al. False Discovery Rate Control via Debiased Lasso, 2018, Electronic Journal of Statistics.

[20] S. Ghosal, et al. Bayesian inference in high-dimensional models, 2021, arXiv:2101.04491.

[21] Uncertainty quantification for the horseshoe, 2016.

[22] Ron Shamir, et al. Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate, 2016, PLoS Comput. Biol.

[23] I. Castillo, et al. Spike and slab empirical Bayes sparse credible sets, 2018, Bernoulli.

[24] C. Robert, et al. Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays, 2004.

[25] Wenguang Sun, et al. Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control, 2007.

[26] Judith B. Zaugg, et al. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing, 2016, Nature Methods.

[27] Bradley Efron, et al. Microarrays, Empirical Bayes and the Two-Groups Model. Rejoinder, 2008, arXiv:0808.0572.

[28] John D. Storey, et al. Empirical Bayes Analysis of a Microarray Experiment, 2001.

[29] Wenguang Sun, et al. CARS: Covariate Assisted Ranking and Screening for Large-Scale Two-Sample Inference, 2018.

[30] E. Candès, et al. A knockoff filter for high-dimensional selective inference, 2016, The Annals of Statistics.

[31] Wei Jiang, et al. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies, 2016, Bioinformatics.

[32] I. Johnstone, et al. Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences, 2004, arXiv:math/0410088.

[33] Weijie J. Su, et al. SLOPE-Adaptive Variable Selection via Convex Optimization, 2014, The Annals of Applied Statistics.

[34] J. Salomond. Risk quantification for the thresholding rule for multiple testing using Gaussian scale mixtures, 2017, arXiv:1711.08705.

[35] I. Castillo, et al. Empirical Bayes analysis of spike and slab posterior distributions, 2018, arXiv:1801.01696.

[36] Thorsten Dickhaus, et al. Simultaneous Statistical Inference: With Applications in the Life Sciences, 2014.

[37] Elisabeth Gassiat, et al. Multiple Testing in Nonparametric Hidden Markov Models: An Empirical Bayes Approach, 2021.

[38] A. Schwartzman, et al. The Empirical Distribution of a Large Number of Correlated Normal Variables, 2015, Journal of the American Statistical Association.

[39] Gilles Blanchard, et al. Adaptive False Discovery Rate Control under Independence and Dependence, 2009, J. Mach. Learn. Res.

[40] Étienne Roquain, et al. On spike and slab empirical Bayes multiple testing, 2018, The Annals of Statistics.

[41] Wenguang Sun, et al. Large-scale multiple testing under dependence, 2009.

[42] Namgil Lee, et al. An improvement on local FDR analysis applied to functional MRI data, 2016, Journal of Neuroscience Methods.

[43] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[44] Y. Benjamini, et al. The Control of the False Discovery Rate in Multiple Testing under Dependency, 2001.

[45] Y. Benjamini, et al. Adaptive linear step-up procedures that control the false discovery rate, 2006.