In a recent article, Killeen (2005a) proposed an alternative to traditional null-hypothesis significance testing (NHST). This alternative test is based on the statistic $p_{\mathrm{rep}}$, which is the probability of replicating an effect. We share Killeen’s skepticism with respect to null-hypothesis testing, and we sympathize with the proposed conceptual shift toward issues such as replicability. One of the problems associated with NHST is that p values are prone to misinterpretation (cf. Nickerson, 2000, pp. 246–263). Another problem with NHST is that it can provide highly misleading evidence against the null hypothesis (Killeen, 2005a, p. 345): NHST can lead one to reject the null hypothesis when there is really not enough evidence to do so. Killeen’s $p_{\mathrm{rep}}$ statistic successfully addresses the problem of misinterpretation, and this is a major contribution (cf. Cumming, 2005; Doros & Geier, 2005; Killeen, 2005b; Macdonald, 2005). However, the $p_{\mathrm{rep}}$ statistic does not remedy the second, more fundamental NHST problem mentioned by Killeen. Here we perform the standard analysis to show that $p_{\mathrm{rep}}$ can provide misleading evidence against the null hypothesis (cf. Berger & Sellke, 1987; Edwards, Lindman, & Savage, 1963). This analysis demonstrates the discrepancy between Bayesian hypothesis testing and $p_{\mathrm{rep}}$, and highlights the necessity of considering the plausibility of both the null hypothesis and the alternative hypothesis.
Consider an experiment in taste perception in which a participant has to determine which of two beverage samples contains sugar. After $n$ trials, with $s$ successes (i.e., correct decisions) and $f$ failures, we wish to choose between two hypotheses: $H_0$ (i.e., random guessing) and $H_1$ (i.e., gustatory discriminability). For inference, we use the binomial model, in which the likelihood $L(\theta)$ is proportional to $\theta^{s}(1-\theta)^{f}$, where $\theta$ denotes the probability of a correct decision on any one trial.
A Bayesian hypothesis test (Jeffreys, 1961) proceeds by contrasting two quantities: the probability of the observed data $D$ given $H_0$ (i.e., $\theta = 1/2$) and the probability of the observed data $D$ given $H_1$ (i.e., $\theta \neq 1/2$). The ratio $B_{01} = p(D \mid H_0)/p(D \mid H_1)$ is the Bayes factor, and it quantifies the evidence that the data provide for $H_0$ vis-à-vis $H_1$. Assuming equal prior plausibility for the models, the posterior probability for $H_0$ is given by $B_{01}/(1 + B_{01})$. In the taste perception experiment, $p(D \mid H_0) = (1/2)^{n}$. The quantity $p(D \mid H_1)$ is more difficult to calculate, because it depends on our prior beliefs about $\theta$. Specifically, when prior knowledge of $\theta$ is given by a prior distribution $p(\theta)$, one obtains $p(D \mid H_1)$ by integrating $L(\theta)$ over all possible values of $\theta$, weighted by the prior distribution $p(\theta)$: $p(D \mid H_1) = \int_0^1 L(\theta)\,p(\theta)\,d\theta$. We consider two classes of priors.
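The calculation described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the prior on the success probability (called `theta` in the code) is taken to be a Beta(a, b) distribution, with the uniform prior Beta(1, 1) as the default. That choice is made here because it gives the integral a closed form, and it is not necessarily the prior used in the analysis. The function names are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bayes_factor_01(s: int, f: int, a: float = 1.0, b: float = 1.0) -> float:
    """B01 = p(D | H0) / p(D | H1) for s successes and f failures.

    H0 fixes theta = 1/2. H1 places a Beta(a, b) prior on theta,
    which is an assumption of this sketch (default: uniform prior).
    """
    n = s + f
    # p(D | H0) = (1/2)^n, computed on the log scale for stability.
    log_p_d_h0 = n * log(0.5)
    # p(D | H1) = integral over [0, 1] of theta^s (1 - theta)^f times the
    # Beta(a, b) density, which equals B(s + a, f + b) / B(a, b).
    log_p_d_h1 = log_beta(s + a, f + b) - log_beta(a, b)
    return exp(log_p_d_h0 - log_p_d_h1)


def posterior_prob_h0(b01: float) -> float:
    """Posterior probability of H0, assuming equal prior plausibility."""
    return b01 / (1.0 + b01)


if __name__ == "__main__":
    # Hypothetical data: 9 correct decisions and 3 errors in 12 trials.
    b01 = bayes_factor_01(s=9, f=3)
    print(f"B01 = {b01:.3f}, p(H0 | D) = {posterior_prob_h0(b01):.3f}")
```

Because the same likelihood kernel is used for both hypotheses, the binomial coefficient cancels in the ratio and can be omitted. Other priors would require replacing the closed-form Beta integral with numerical integration.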
[1] L. M. M.-T. Theory of Probability. 1929, Nature.
[2] J. Berger, et al. Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence. 1987.
[3] M. Kendall, et al. Kendall's Advanced Theory of Statistics. 1995.
[4] Anthony O'Hagan, et al. Kendall's Advanced Theory of Statistics: Vol. 2B, Bayesian Inference. 1996.
[5] A. O'Hagan, et al. Kendall's Advanced Theory of Statistics, Vol. 2b: Bayesian Inference. 1996.
[6] R. Nickerson, et al. Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy. 2000, Psychological Methods.
[7] G. Cumming. Understanding the Average Probability of Replication. 2005, Psychological Science.
[8] R. R. Macdonald. Why Replication Probabilities Depend on Prior Probability Distributions. 2005, Psychological Science.
[9] P. Killeen, et al. An Alternative to Null-Hypothesis Significance Tests. 2005, Psychological Science.
[10] G. Doros, et al. Probability of Replication Revisited. 2005, Psychological Science.
[11] P. Killeen. Replicability, Confidence, and Priors. 2005, Psychological Science.
[12] Jie W Weiss, et al. Bayesian Statistical Inference for Psychological Research. 2008.