Asynchronous Online Testing of Multiple Hypotheses

We consider the problem of asynchronous online testing, which aims to control the false discovery rate (FDR) during a continual stream of data collection and testing, where each test may itself be a sequential test that starts and stops at arbitrary times. This setting increasingly characterizes real-world applications in science and industry, where teams of researchers across large organizations may test hypotheses in a decentralized manner. The overlap of tests in time and space also tends to induce dependencies among test statistics. This is a challenge for classical methodology, which assumes either independence (overly optimistic) or arbitrary dependence (overly pessimistic) between test statistics. We present a general framework that addresses both issues via a unified computational abstraction that we refer to as "conflict sets." We show how this framework yields algorithms with formal FDR guarantees under an intermediate, local notion of dependence. We illustrate our algorithms in simulations by comparing them to existing algorithms for online FDR control.
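The abstract only names the conflict-set abstraction. The following is a minimal Python sketch of the bookkeeping idea under stated assumptions; it is not the paper's algorithm and no FDR guarantee is claimed for it. It assumes that the conflict set of test t is the set of earlier tests still running when t starts (ignoring the local-dependence part of the framework). It also assumes a LOND-style level rule in the spirit of Javanmard and Montanari, alpha_t = gamma_t * alpha * (K_t + 1), where K_t counts discoveries only among earlier tests outside the conflict set. The names `Test`, `conflict_set`, `gamma`, `async_lond_levels` and the particular gamma sequence are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Test:
    start: float    # time the (possibly sequential) test starts
    end: float      # time its outcome becomes known
    p_value: float  # final p-value, revealed at `end`


def conflict_set(tests: List[Test], t: int) -> List[int]:
    """Indices of earlier tests still running when test t starts.

    Assumption: conflicts come only from asynchrony, not from local dependence.
    """
    return [j for j in range(t) if tests[j].end > tests[t].start]


def gamma(t: int) -> float:
    """Hypothetical spending sequence 6 / (pi^2 (t+1)^2); it sums to 1 over t >= 0."""
    return 6.0 / (math.pi ** 2 * (t + 1) ** 2)


def async_lond_levels(tests: List[Test], alpha: float = 0.05) -> Tuple[List[float], List[bool]]:
    """Assign each test a level that depends only on outcomes outside its conflict set.

    Tests are assumed to be sorted by start time.
    """
    levels: List[float] = []
    rejected: List[bool] = []
    for t in range(len(tests)):
        conflicts = set(conflict_set(tests, t))
        # Discoveries among earlier tests whose outcomes are already known at start time.
        known = sum(rejected[j] for j in range(t) if j not in conflicts)
        level = gamma(t) * alpha * (known + 1)
        levels.append(level)
        # The level was fixed before this test's outcome is used.
        rejected.append(tests[t].p_value <= level)
    return levels, rejected


# Usage: test 1 overlaps test 0, and test 2 overlaps test 1 but not test 0.
tests = [
    Test(start=0.0, end=2.0, p_value=0.001),
    Test(start=1.0, end=3.0, p_value=0.20),
    Test(start=2.5, end=4.0, p_value=0.004),
]
levels, rejected = async_lond_levels(tests, alpha=0.05)
print(rejected)  # → [True, False, True]
```

Because tests are processed in order of start time, the count of known discoveries can only grow from one test to the next. The cumulative level spent up to test t therefore never exceeds alpha * (K_t + 1). Test 2 can use test 0's discovery to raise its level, but not test 1's outcome, which is still pending when it starts.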
