Safe Testing

We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike the p-value, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on S-values generally preserve Type-I error guarantees under such ‘optional continuation’. S-values exist for completely general testing problems with composite null and alternative hypotheses. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to fastest capital growth, are fully characterized by the joint information projection (JIPr) between the sets of all Bayes marginal distributions on $\mathcal{H}_0$ and $\mathcal{H}_1$. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, as in Jeffreys’ Bayesian t-test. However, unlike Jeffreys’ test, the default safe t-test puts a discrete 2-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
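
The optional-continuation claim can be made concrete with a small exact calculation. The sketch below is illustrative only and is not taken from the paper. It assumes the usual definition of an S-value as a nonnegative statistic whose expectation is at most 1 under every null distribution, and it uses a likelihood ratio between two simple hypotheses as the S-value. The coin-flip setting, the constants `P0`, `P1`, `N`, `ALPHA`, `MAX_STUDIES` and the function `continue_rule` are all hypothetical choices made for the illustration.

```python
from math import comb

# Illustrative constants (assumptions, not values from the paper).
P0, P1 = 0.5, 0.75   # simple null and simple alternative success probabilities
N = 5                # coin flips per study
ALPHA = 0.05         # significance level; the safe test rejects when product >= 1/ALPHA
MAX_STUDIES = 4      # hard cap on the number of studies


def s_value(k, n=N):
    """Likelihood ratio of alternative to null for k successes in n flips."""
    return (P1 / P0) ** k * ((1 - P1) / (1 - P0)) ** (n - k)


def null_prob(k, n=N):
    """Probability of k successes in n flips under the null."""
    return comb(n, k) * P0 ** k * (1 - P0) ** (n - k)


def continue_rule(product):
    """Hypothetical data-dependent rule: run another study only while the
    evidence so far is promising (> 1) but not yet conclusive (< 1/ALPHA)."""
    return 1.0 < product < 1.0 / ALPHA


def evaluate(product=1.0, studies_done=0):
    """Enumerate all study outcomes and return, exactly under the null,
    (expected final product of S-values, probability of rejecting)."""
    stop = studies_done == MAX_STUDIES or not continue_rule(product)
    if studies_done > 0 and stop:
        return product, float(product >= 1.0 / ALPHA)
    expected, reject = 0.0, 0.0
    for k in range(N + 1):
        e, r = evaluate(product * s_value(k), studies_done + 1)
        expected += null_prob(k) * e
        reject += null_prob(k) * r
    return expected, reject


expected_product, type1_error = evaluate()
print(f"E_null[final product] = {expected_product:.6f}")
print(f"P_null[reject]        = {type1_error:.6f} (bound: {ALPHA})")
```

Because each per-study likelihood ratio has expectation 1 under the null, the running product keeps expectation 1 whatever the continuation rule, and Markov's inequality then bounds the probability that the final product reaches `1/ALPHA` by `ALPHA`. Replacing `continue_rule` with any other rule that depends only on past outcomes should leave both properties intact in this toy setting.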
