Post hoc false positive control for structured hypotheses

In a high‐dimensional multiple testing framework, we present new confidence bounds on the number of false positives contained in subsets S of selected null hypotheses. These bounds are post hoc in the sense that the coverage probability holds simultaneously over all S, even when S is chosen depending on the data. This article focuses on the common case of structured null hypotheses, for example, hypotheses arranged along a tree or a hierarchy, or organized geometrically (spatially or temporally). Following recent advances in post hoc inference, we build confidence bounds for some prespecified forest‐structured subsets and deduce a bound for any subset S by interpolation. The proposed bounds are shown to substantially improve on previous ones when the signal is locally structured. Our findings are supported by both theoretical results and numerical experiments. Moreover, our bounds can be computed by an algorithm whose complexity is bilinear in the size of the reference hierarchy and the size of the selected subset. The algorithm is implemented in the open‐source R package sansSouci, available from https://github.com/pneuvial/sanssouci, making our approach operational.
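To make the interpolation step concrete, here is a minimal Python sketch. It is not the sansSouci implementation and does not reproduce the package's API: the `Region` class, the function names, and the toy numbers are all hypothetical. It assumes each reference region R_k comes with a bound zeta_k on its number of false positives, and that the regions form a forest (any two are either nested or disjoint). Under those assumptions, the bound for a selected set S is obtained by choosing, at each node, the smaller of the node's own bound and the sum of its children's bounds plus the selected hypotheses no child covers.

```python
class Region:
    """Hypothetical reference region: a set of hypothesis indices, a bound
    zeta on its number of false positives, and nested child regions."""

    def __init__(self, members, zeta, children=None):
        self.members = set(members)
        self.zeta = zeta
        self.children = children or []


def region_bound(node, selected):
    """Bound on the false positives in selected & node.members."""
    hits = selected & node.members
    # Option 1: use this region's own bound, capped by the trivial bound.
    direct = min(node.zeta, len(hits))
    if not node.children:
        return direct
    # Option 2: combine the children's bounds; each selected hypothesis
    # not covered by any child is counted as a possible false positive.
    covered = set()
    via_children = 0
    for child in node.children:
        via_children += region_bound(child, selected)
        covered |= child.members
    via_children += len(hits - covered)
    return min(direct, via_children)


def post_hoc_bound(forest, selected):
    """Interpolated bound on the false positives in an arbitrary set."""
    selected = set(selected)
    covered = set()
    total = 0
    for root in forest:
        total += region_bound(root, selected)
        covered |= root.members
    # Selected hypotheses outside every region only get the trivial bound.
    return total + len(selected - covered)


# Toy example with made-up bounds (illustration only).
forest = [
    Region(range(0, 6), zeta=4, children=[
        Region({0, 1, 2}, zeta=1),
        Region({3, 4, 5}, zeta=3),
    ]),
    Region({6, 7}, zeta=0),
]
bound = post_hoc_bound(forest, {0, 1, 2, 3, 6, 8})
```

Because the same function accepts any set S, the set can be chosen after looking at the data, which is what makes the resulting bound post hoc. The recursion above favors readability and makes no attempt to match the complexity stated in the abstract.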
