Safe Testing

We present a new theory of hypothesis testing. The main concept is the S-value, a notion of evidence which, unlike the p-value, allows for effortlessly combining evidence from several tests, even in the common scenario where the decision to perform a new test depends on the outcome of the previous test: safe tests based on S-values generally preserve Type-I error guarantees under such "optional continuation". S-values exist for completely general testing problems with composite null and alternative hypotheses. Their prime interpretation is in terms of gambling or investing, each S-value corresponding to a particular investment. Surprisingly, optimal "GROW" S-values, which lead to the fastest capital growth, are fully characterized by the joint information projection (JIPr) between the sets of all Bayes marginal distributions on $\mathcal{H}_0$ and $\mathcal{H}_1$. Thus, optimal S-values also have an interpretation as Bayes factors, with priors given by the JIPr. We illustrate the theory using two classical testing scenarios: the one-sample t-test and the 2 × 2 contingency table. In the t-test setting, GROW S-values correspond to adopting the right Haar prior on the variance, as in Jeffreys' Bayesian t-test. However, unlike Jeffreys' test, the default safe t-test puts a discrete two-point prior on the effect size, leading to better behaviour in terms of statistical power. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, S-values and safe tests may provide a methodology acceptable to adherents of all three schools.
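
The optional-continuation claim can be illustrated with a small simulation. The sketch below is not the paper's GROW construction: it assumes the simplest possible setting, a simple null N(0, 1) against a point alternative N(mu1, 1), where the likelihood ratio is an S-value (it is nonnegative and has expectation at most 1 under the null). The function names, the batch size, the give-up threshold of 0.2 and the default mu1 = 0.5 are all hypothetical choices made for illustration. S-values from successive batches are multiplied, the decision to collect another batch depends on the evidence so far, and the null is rejected once the product reaches 1/alpha.

```python
import math
import random


def likelihood_ratio_s_value(xs, mu1):
    """S-value for H0: N(0, 1) against the point alternative N(mu1, 1).

    This is the likelihood ratio; its expectation under H0 equals 1.
    """
    # log prod_i [N(x_i; mu1, 1) / N(x_i; 0, 1)] = sum_i (mu1 * x_i - mu1^2 / 2)
    return math.exp(sum(mu1 * x - 0.5 * mu1 ** 2 for x in xs))


def run_with_optional_continuation(rng, true_mean, mu1=0.5, batch=10,
                                   max_batches=5, alpha=0.05):
    """Multiply S-values over batches; return True if H0 is rejected.

    Whether a further batch is collected depends on the data seen so far
    (an illustrative, assumed rule: give up once the product drops below 0.2).
    """
    product = 1.0
    for _ in range(max_batches):
        xs = [rng.gauss(true_mean, 1.0) for _ in range(batch)]
        product *= likelihood_ratio_s_value(xs, mu1)
        if product >= 1.0 / alpha:
            return True   # enough evidence: reject H0
        if product < 0.2:
            return False  # evidence looks poor: stop collecting data
    return False


def rejection_rate(true_mean, trials=5000, seed=0, alpha=0.05):
    """Fraction of simulated studies in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        run_with_optional_continuation(rng, true_mean, alpha=alpha)
        for _ in range(trials)
    )
    return rejections / trials


if __name__ == "__main__":
    print("Type-I error rate (data from H0):", rejection_rate(0.0))
    print("Power (data from N(0.5, 1)):     ", rejection_rate(0.5))
```

When the data come from the null, the rejection rate should stay below alpha even though the stopping decision is data-dependent; applying a fixed-sample p-value test repeatedly in the same way would not in general enjoy this guarantee.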
