A unified framework for bandit multiple testing

In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify a large set of interesting arms (true discoveries) while mistakenly identifying only a few uninteresting ones (false discoveries). One common metric in non-bandit multiple testing is the false discovery rate (FDR). We propose a unified, modular framework for bandit FDR control that emphasizes the decoupling of exploration and the summarization of evidence. We utilize the powerful martingale-based concept of “e-processes” to ensure FDR control for arbitrary composite nulls, exploration rules, and stopping times in generic problem settings. In particular, valid FDR control holds even when the reward distributions of the arms are dependent, multiple arms are queried simultaneously, and multiple (cooperating or competing) agents query arms, thereby covering combinatorial semi-bandit settings as well. Prior work has considered in great detail the setting where each arm’s reward distribution is independent and sub-Gaussian, and a single arm is queried at each step. Our framework recovers matching sample complexity guarantees in this special case and performs comparably or better in practice. For other settings, sample complexities will depend on the finer details of the problem (the composite nulls being tested, the exploration algorithm, the data dependence structure, and the stopping rule), and we do not explore these; our contribution is to show that the FDR guarantee is clean and entirely agnostic to these details.
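
To make the modularity concrete, here is a minimal sketch (not the paper’s implementation) of the two decoupled pieces: a per-arm e-process that summarizes evidence against its null, and the e-BH procedure applied to the accumulated e-values at a stopping time. The sketch assumes rewards bounded in [0, 1] and tests the composite null H0: mean ≤ mu0 with a betting-style supermartingale using a fixed bet lam (any lam in [0, 1/mu0] keeps the multiplicative factors nonnegative); the names update_eprocess and ebh_rejections are illustrative choices, not fixed by the paper.

```python
import numpy as np

def update_eprocess(e, x, mu0, lam=0.5):
    """One betting step of an e-process for the null H0: mean <= mu0,
    assuming rewards x lie in [0, 1] and 0 <= lam <= 1/mu0 so each factor
    is nonnegative. Under H0, E[1 + lam*(x - mu0)] <= 1, so the running
    product is a nonnegative supermartingale (hence an e-process)."""
    return e * (1.0 + lam * (x - mu0))

def ebh_rejections(e_values, alpha):
    """e-BH: reject the k* nulls with the largest e-values, where k* is
    the largest k with e_(k) >= K / (alpha * k). This controls FDR at
    level alpha under arbitrary dependence among the e-values."""
    K = len(e_values)
    order = np.argsort(e_values)[::-1]        # arms sorted by decreasing e-value
    sorted_e = np.asarray(e_values)[order]
    ok = np.nonzero(sorted_e * alpha * np.arange(1, K + 1) >= K)[0]
    if len(ok) == 0:
        return np.array([], dtype=int)        # no discoveries
    return order[: ok.max() + 1]              # indices of rejected nulls
```

Because the FDR guarantee is agnostic to how arms are explored and when sampling stops, the toy run below uses uniform-random exploration and a fixed stopping time purely as placeholders; any adaptive exploration rule or data-dependent stopping time could be substituted without affecting validity, only sample complexity.

```python
# Toy run: K Bernoulli arms, of which the first 3 are non-null (mean > mu0).
rng = np.random.default_rng(0)
K, mu0, alpha = 10, 0.5, 0.1
means = np.where(np.arange(K) < 3, 0.8, 0.4)
e = np.ones(K)                                # each e-process starts at 1
for t in range(2000):                         # placeholder stopping time
    a = rng.integers(K)                       # placeholder exploration rule
    x = rng.binomial(1, means[a])
    e[a] = update_eprocess(e[a], x, mu0)
print(ebh_rejections(e, alpha))               # discovered (rejected) arms
```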
