Adaptive significance levels using optimal decision rules: Balancing by weighting the error probabilities

Our purpose is to recommend a change in the paradigm of testing by generalizing a very natural idea, which perhaps originated with Jeffreys (1935, 1961) and was clearly expounded by DeGroot (1975), with the aim of developing an approach that is attractive to all schools of statistics, resulting in a procedure better suited to the needs of science. The essential idea is to base the testing of statistical hypotheses on minimizing a weighted sum of the type I and type II error probabilities, instead of the prevailing paradigm of fixing the type I error and minimizing the type II error. For simple vs. simple hypotheses, the optimal criterion is to reject the null using the likelihood ratio as the evidence (ordering) statistic, with a fixed threshold value instead of a fixed tail probability. By defining expected type I and type II errors, we generalize the weighting approach and find that the optimal region is defined by the evidence ratio, that is, a ratio of averaged likelihoods (with respect to a prior measure) and a fixed threshold. This approach yields an optimal theory in complete generality, which the classical theory of testing does not. It can be seen as a Bayesian/Non-Bayesian compromise: using a weighted sum of type I and type II error probabilities is Frequentist, but basing the test criterion on a ratio of marginalized likelihoods is Bayesian. We give arguments for pushing the theory still further, so that the weighting measures (priors) of the likelihoods need not be proper and highly informative, but merely "well calibrated", that is, priors that give rise to the same evidence (marginal likelihoods) when minimal (smallest) training samples are used. The theory that emerges, similar to theories based on objective Bayesian approaches, is a powerful response to criticisms of the prevailing approach to hypothesis testing. For criticisms see, for example, Ioannidis (2005) and Siegfried (2010), among many others.
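To make the criterion concrete, here is a brief worked sketch in notation introduced for illustration (the weights k_0, k_1 and the marginal likelihoods m_0, m_1 are our labels, chosen to match the description above rather than taken from the paper). Minimizing the weighted sum of error probabilities over rejection regions R,

\[ \min_{R}\; k_0\,\alpha(R) + k_1\,\beta(R), \qquad \alpha(R) = P(X \in R \mid H_0), \quad \beta(R) = P(X \notin R \mid H_1), \]

is achieved, for simple H_0: θ = θ_0 versus H_1: θ = θ_1, by a fixed-threshold likelihood-ratio region,

\[ R^{*} = \left\{ x : \frac{f(x \mid \theta_1)}{f(x \mid \theta_0)} > \frac{k_0}{k_1} \right\}, \]

and, replacing the error probabilities by their expectations under prior measures π_0 and π_1, by the evidence-ratio region

\[ R^{*} = \left\{ x : \frac{m_1(x)}{m_0(x)} > \frac{k_0}{k_1} \right\}, \qquad m_i(x) = \int f(x \mid \theta_i)\, \pi_i(d\theta_i). \]

Because the threshold k_0/k_1 stays fixed while the distribution of the evidence statistic concentrates as the sample size grows, the implied type I error probability of R^{*} changes with the amount of information, which is the sense in which the significance level is adaptive.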

[1] H. Jeffreys, Some Tests of Significance, Treated by the Theory of Probability, 1935, Mathematical Proceedings of the Cambridge Philosophical Society.

[2] J. O. Berger et al., A Comparison of Testing Methodologies, 2008.

[3] I. J. Good, The Bayes/Non-Bayes Compromise: A Brief Review, 1992, Journal of the American Statistical Association.

[4] D. Lindley et al., Inference for a Bernoulli Process (a Bayesian View), 1976, The American Statistician.

[5] L. Pericchi et al., Changing Statistical Significance with the Amount of Information: The Adaptive α Significance Level, 2014, Statistics & Probability Letters.

[6] J. Berger et al., The Intrinsic Bayes Factor for Model Selection and Prediction, 1996, Journal of the American Statistical Association.

[7] W. Zucchini et al., Model Selection, 2011, International Encyclopedia of Statistical Science.

[8] T. Siegfried, Odds are, it's wrong: Science fails to face the shortcomings of statistics, 2010, Science News.

[9] C. A. de Bragança Pereira et al., On the Concept of P-Value, 1988.

[10] L. Wasserman et al., Computing Bayes Factors by Combining Simulation and Asymptotic Approximations, 1997, Journal of the American Statistical Association.

[11] A. P. Dempster, The direct use of likelihood for significance testing, 1997, Statistics and Computing.

[12] L. Pericchi et al., Bayes Factors and Marginal Distributions in Invariant Situations, 2016.

[13] H. Jeffreys, Theory of Probability, 3rd ed., 1961, Oxford University Press.

[14] M. H. DeGroot, Probability and Statistics, 1975, Addison-Wesley.

[15] L. R. Pericchi, Model Selection and Hypothesis Testing based on Objective Probabilities and Bayes Factors, 2005, Handbook of Statistics.

[16] P. Bickel et al., Mathematical Statistics: Basic Ideas and Selected Topics, 1977.

[17] J. Ioannidis, Why Most Published Research Findings Are False, 2005, PLoS Medicine.

[18] D. Lindley, A Statistical Paradox, 1957, Biometrika.