The Power of Batching in Multiple Hypothesis Testing

One important way to partition algorithms for controlling the false discovery rate (FDR) in multiple testing is into offline and online algorithms. The former generally achieve significantly higher power of discovery, while the latter allow making decisions sequentially as well as adaptively formulating hypotheses based on past observations. Using existing methodology, it is unclear how one could trade off the benefits of these two broad families of algorithms while preserving their formal FDR guarantees. To this end, we introduce $\text{Batch}_{\text{BH}}$ and $\text{Batch}_{\text{St-BH}}$, algorithms for controlling the FDR when a possibly infinite sequence of batches of hypotheses is tested by repeated application of one of the most widely used offline algorithms: the Benjamini-Hochberg (BH) method or Storey's improvement of the BH method. We show that our algorithms interpolate between existing online and offline methodology, thus allowing one to trade off the strengths of both worlds.
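The two offline building blocks named above can be sketched for a single batch of p-values, as a reference point. This is a minimal Python sketch under stated assumptions. It uses the textbook BH step-up rule and one common finite-sample variant of Storey's adaptive procedure, with null-proportion estimate $\hat\pi_0 = (1 + \#\{p_i > \lambda\}) / (n(1-\lambda))$ and rejections restricted to p-values at most $\lambda$. The function names `bh`, `storey_bh`, and `_step_up` and the default $\lambda = 0.5$ are illustrative choices, not taken from the paper. The sketch does not implement the rule by which $\text{Batch}_{\text{BH}}$ and $\text{Batch}_{\text{St-BH}}$ set test levels across batches.

```python
from typing import List, Sequence

# Illustrative sketch only: the function names and the lam=0.5 default are
# assumptions for this example, not taken from the paper.


def _step_up(pvalues: Sequence[float], level: float, cap: float = 1.0) -> List[int]:
    """Generic step-up rule shared by BH and Storey-BH.

    Finds the largest rank k such that the k-th smallest p-value satisfies
    p_(k) <= k * level / n and p_(k) <= cap, then rejects the k hypotheses
    with the smallest p-values. Returns their indices in ascending order.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        p = pvalues[idx]
        if p <= cap and p <= rank * level / n:
            k_max = rank  # keep the largest qualifying rank (step-up)
    return sorted(order[:k_max])


def bh(pvalues: Sequence[float], alpha: float) -> List[int]:
    """Benjamini-Hochberg procedure at target FDR level alpha."""
    return _step_up(pvalues, alpha)


def storey_bh(pvalues: Sequence[float], alpha: float, lam: float = 0.5) -> List[int]:
    """Storey's adaptive BH: estimate the null proportion, then run the
    step-up rule at level alpha / pi0_hat, rejecting only p-values <= lam."""
    n = len(pvalues)
    if n == 0:
        return []
    # Null-proportion estimate (1 + #{p > lam}) / (n * (1 - lam)).
    pi0_hat = (1 + sum(p > lam for p in pvalues)) / (n * (1 - lam))
    return _step_up(pvalues, alpha / pi0_hat, cap=lam)


if __name__ == "__main__":
    batch = [0.001, 0.008, 0.012, 0.022, 0.03, 0.2, 0.3, 0.45, 0.6, 0.9]
    print(bh(batch, 0.05))         # [0, 1, 2]
    print(storey_bh(batch, 0.05))  # [0, 1, 2, 3, 4]
```

On this toy batch only two of ten p-values exceed $\lambda = 0.5$, so $\hat\pi_0 = 0.6$. Storey's variant therefore tests at the effectively larger level $0.05 / 0.6$ and rejects five hypotheses, where BH rejects three. In a batched setting, each incoming batch would be passed to a routine like this at its own test level. How those levels are chosen so that the FDR is controlled over the whole sequence is the subject of the paper and is not sketched here.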
