Paper Information - Safe Testing

Safe Testing

We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike the p-value, allows evidence from several tests to be combined effortlessly, even in the common scenario where the decision to perform a new test depends on the previous test outcome: safe tests based on S-values generally preserve Type-I error guarantees under such ‘optional continuation’. S-values exist for completely general testing problems with composite null and alternative hypotheses. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to the fastest capital growth, are fully characterized by the joint information projection (JIPr) between the sets of all Bayes marginal distributions on $\mathcal{H}_0$ and on $\mathcal{H}_1$. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, as in Jeffreys’ Bayesian t-test. However, unlike Jeffreys’ test, the default safe t-test puts a discrete two-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
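The abstract above does not spell out a formula, so the following is only a minimal sketch under stated assumptions. It assumes the usual definition of an S-value as a nonnegative statistic whose expected value is at most 1 under every distribution in the null, and it uses the simplest instance of that definition: a likelihood ratio between a simple alternative and a simple null for coin flips. The function names, the parameters `p_null` and `p_alt`, and the two toy data sets are all hypothetical illustrations and are not taken from the paper; in particular this is not the GROW construction for composite hypotheses.

```python
import math


def likelihood_ratio_s_value(outcomes, p_null=0.5, p_alt=0.7):
    """S-value for 0/1 data: likelihood under p_alt divided by likelihood under p_null.

    Assumption: simple null and simple alternative (both hypothetical values).
    """
    log_ratio = 0.0
    for x in outcomes:
        if x == 1:
            log_ratio += math.log(p_alt / p_null)
        else:
            log_ratio += math.log((1 - p_alt) / (1 - p_null))
    return math.exp(log_ratio)


def expected_s_value_under_null(n, p_null=0.5, p_alt=0.7):
    """Exact expectation of the S-value under the null for n flips.

    Sums over the number of ones k; the result should equal 1, which is the
    property that makes the statistic an S-value.
    """
    total = 0.0
    for k in range(n + 1):
        prob = math.comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        s = (p_alt / p_null) ** k * ((1 - p_alt) / (1 - p_null)) ** (n - k)
        total += prob * s
    return total


def safe_test(s_values, alpha=0.05):
    """Combine S-values by multiplication and reject when the product reaches 1/alpha."""
    combined = math.prod(s_values)
    return combined, combined >= 1 / alpha


# Toy data (hypothetical): two studies of 10 flips, each with 8 ones.
study_1 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
study_2 = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

s1 = likelihood_ratio_s_value(study_1)
first_result = safe_test([s1])        # evidence after the first study only

# Optional continuation: the second study is run because the first looked promising.
s2 = likelihood_ratio_s_value(study_2)
combined_result = safe_test([s1, s2])  # evidence after multiplying both S-values
```

In the gambling reading, each S-value is the factor by which one unit of capital is multiplied by a bet that is fair or unfavourable under the null, so reinvesting the proceeds in a second study corresponds to multiplying the two factors. By Markov's inequality, a product with expectation at most 1 under the null reaches 1/alpha with probability at most alpha, which is why the threshold in `safe_test` is 1/alpha.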

Wouter M. Koolen | P. Grünwald | R. de Heide
