Large-scale simultaneous inference under dependence

Simultaneous, post-hoc inference is desirable in large-scale hypothesis testing because it allows the data to be explored before the criteria for proclaiming discoveries are fixed. It was recently proved that all admissible post-hoc inference methods for the number of true discoveries must be based on closed testing. In this paper we investigate tractable and efficient closed testing with local tests that have different properties, such as monotonicity, symmetry and separability, meaning that the local test rejects when a monotonic function, a symmetric function, or a function of sums of the test scores for the individual hypotheses crosses a threshold. This class includes well-known global null tests by Fisher, Stouffer and Rüschendorf, as well as newly proposed ones based on harmonic means and Cauchy combinations. Under monotonicity, we propose a new linear-time statistic (“coma”) that quantifies the cost of multiplicity adjustments. If the tests are also symmetric and separable, we develop several fast (mostly linear-time) algorithms for post-hoc inference, making closed testing tractable. Paired with recent advances in global null tests based on generalized means, our work immediately instantiates a series of simultaneous inference methods that can handle many complex dependence structures and signal compositions. We provide guidance on choosing among these methods through a theoretical investigation of the conservativeness and sensitivity of different local tests, and through simulations showing that full closed testing behaves analogously to its local tests. One result of independent interest is the following: if $P_1, \dots, P_d$ are p-values from a multivariate Gaussian with arbitrary covariance, then their arithmetic average $\bar{P}$ satisfies $\Pr(\bar{P} \le t) \le t$ for $t \le \frac{1}{2d}$.
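To make the objects in the abstract concrete, the following is a minimal sketch of three of the named local tests (Fisher, Stouffer, Cauchy combination) and of closed testing by brute force. It is not the paper's algorithm: it enumerates all subsets and so runs in exponential time, which is exactly the cost the paper's linear-time methods are meant to avoid. All function names are hypothetical, p-values are assumed to lie strictly between 0 and 1, the Fisher and Stouffer p-values below are the classical ones derived under independence, and the Cauchy p-value uses the standard tail approximation.

```python
import math
from itertools import combinations
from statistics import NormalDist


def fisher_pvalue(pvals):
    """Fisher combination p-value (classical form, assumes independence)."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the chi-square statistic
    # Closed-form survival function of a chi-square with 2k degrees of freedom.
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def stouffer_pvalue(pvals):
    """Stouffer combination p-value (classical form, assumes independence)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)


def cauchy_pvalue(pvals):
    """Equal-weight Cauchy combination p-value (tail approximation)."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvals) / len(pvals)
    return 0.5 - math.atan(t) / math.pi


def true_discovery_bound(pvals, selected, alpha=0.05, local_test=fisher_pvalue):
    """Brute-force closed testing: lower bound on true discoveries in `selected`.

    Exponential in len(pvals); intended only for a handful of hypotheses.
    """
    m = len(pvals)
    subsets = [frozenset(c)
               for r in range(1, m + 1)
               for c in combinations(range(m), r)]
    # Local test of every intersection hypothesis at level alpha.
    locally_rejected = {
        s: local_test([pvals[i] for i in s]) <= alpha for s in subsets
    }

    def closed_rejected(s):
        # Closed testing rejects H_s only if every superset is locally rejected.
        return all(locally_rejected[j] for j in subsets if s <= j)

    selected = frozenset(selected)
    # Size of the largest subset of `selected` that closed testing cannot reject.
    largest_unrejected = max(
        (len(s) for s in subsets if s <= selected and not closed_rejected(s)),
        default=0,
    )
    return len(selected) - largest_unrejected


if __name__ == "__main__":
    p = [0.001, 0.004, 0.03, 0.4, 0.7]
    print(fisher_pvalue(p), stouffer_pvalue(p), cauchy_pvalue(p))
    print(true_discovery_bound(p, selected=[0, 1, 2]))
```

Because the bound is simultaneous over all subsets, `selected` may be chosen after looking at the data, and swapping `local_test` for `stouffer_pvalue` or `cauchy_pvalue` changes only the local test, not the closed testing logic.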
