Structure–Adaptive Sequential Testing for Online False Discovery Rate Control

Consider the online testing of a stream of hypotheses, where a real-time decision must be made before the next data point arrives. The error rate must be controlled at all decision points. Conventional \emph{simultaneous testing rules} are no longer applicable because of the more stringent error constraints and the absence of future data. Moreover, the online decision-making process may come to a halt when the total error budget, or alpha-wealth, is exhausted. This work develops a new class of structure-adaptive sequential testing (SAST) rules for online false discovery rate (FDR) control. A key element of our proposal is a new alpha-investment algorithm that precisely characterizes the gains and losses in sequential decision making. SAST captures time-varying structures of the data stream, learns the optimal threshold adaptively as data arrive, and optimizes the alpha-wealth allocation across different time periods. We present theory and numerical results to show that the proposed method is valid for online FDR control and achieves substantial power gains over existing online testing rules.
