Of the many honorifics bestowed on the articles in this historical series, it is doubtful that any has had applied to it the one that suits this article best: funny. The rhetorical zest and smiling outrage that Joseph Berkson brings to his puncturing of the quasi-religious precepts of traditional statistics in his classic article [1] recalls for me a public debate I witnessed in the 1980s between a highly respected statistician and a surgeon clinical-trialist. It was a debate on issues related to the adjustment of P-values in clinical trials, and what I remember best was the entrance of the physician in full surgical regalia: green operating scrubs, face mask, shoe covers, the whole bit. Playing the role of the ‘aw-shucks, I’m just a country doc who don’t know nuthin’ ’bout statistics’, he parodied traditional statistical precepts so effectively, contrasting them unfavourably with common-sense judgements, that the statistician, however meritorious his rebuttal may have been, was left sputtering, helplessly pounding the lectern. So it seems with this commentary, which asks in an innocent yet seemingly unanswerable way, ‘If the population [of people] is not human, what is it?’ This is the leading edge of an attack on Fisher’s P-value which should still be required reading for all students of epidemiology and biostatistics today.
The commentary shows us several things. First, it demonstrates just how old some current criticisms are, criticisms often presented as enlightened insights from a modern era. Berkson’s first sentence has an almost nostalgic quality that looks surprising over 60 years later: ‘There was a time when we did not talk about tests of significance; we simply did them.’ These words described the future as much as the pre-1942 past. Second, although it may not be immediately obvious, the argument presented here is closely related to those that underlie modern recommendations to use confidence intervals (CIs) and even Bayesian methods in lieu of P-values in biomedical research. Third, Berkson makes important distinctions between hypothesis testing and significance tests that continue to be ignored today. Fourth, and perhaps most subtly, he brings in a notion of ‘evidence’, a positive, relative concept that is critical to have on the table as separate and distinct from the P-value. And finally, he provides modern statisticians with a model for how to communicate technical concepts to applied users in an accessible and lively way.
All that said, it must be admitted that Berkson’s critique is frustratingly incomplete. While he offers a scathing critique of the P-value, and shows us how standard interpretations contravene scientific intuition (grounded mainly in appeals to common sense), he does not offer a real alternative. He does call for more research, particularly into the meaning of what he calls ‘middle P’s’. It is in this gap that I will spend most of my time in this commentary, linking his insights with the ‘further research’ that indeed occurred over the succeeding 60 years.
[1] Taylor Francis Online, et al. The American Statistician. 1947.
[2] S. Goodman, et al. Toward Evidence-Based Medical Statistics. 2: The Bayes Factor. Annals of Internal Medicine, 1999.
[3] R. Fisher, et al. Statistical Methods and Scientific Induction. 1955.
[4] Joseph B. Kadane, et al. Rethinking the Foundations of Statistics: Subject Index. 1999.
[5] J. Cornfield. A Bayesian Test of Some Classical Hypotheses, with Applications to Sequential Clinical Trials. 1966.
[6] H. Jeffreys, et al. The Theory of Probability. 1896.
[7] G. A. Barnard, et al. The Logic of Statistical Inference. The British Journal for the Philosophy of Science, 1972.
[8] S. Goodman, et al. Evidence and Scientific Research. American Journal of Public Health, 1988.
[9] J. Berger, et al. Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence. 1987.
[10] Joseph B. Kadane, et al. Rethinking the Foundations of Statistics: Subject Index. 1999.
[11] J. Cornfield. Sequential Trials, Sequential Analysis and the Likelihood Principle. 1966.
[12] S. Greenland, et al. Probability Logic and Probabilistic Induction. Epidemiology, 1998.
[13] Leonard J. Savage, et al. The Foundations of Statistical Inference: A Discussion. 1962.
[14] K. J. Rothman, et al. That Confounded P-value. Epidemiology, 1998.
[15] L. Stein, et al. Probability and the Weighing of Evidence. 1950.
[16] E. S. Pearson, et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses. 1933.
[17] M. S. Bartlett, et al. Statistical Methods and Scientific Inference. 1957.
[18] Joseph Berkson, et al. Tests of Significance Considered as Evidence. 1942.
[19] R. Royall. The Effect of Sample Size on the Meaning of Significance Tests. 1986.
[20] R. Royall. Statistical Evidence: A Likelihood Paradigm. 1997.
[21] R. Fisher. Note on Dr. Berkson's Criticism of Tests of Significance. 1943.
[22] E. S. Pearson, et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses. 1933.