Even statisticians are not immune to misinterpretations of Null Hypothesis Significance Tests

We investigated the way experienced users interpret Null Hypothesis Significance Testing (NHST) outcomes. An empirical study was designed to compare the reactions of two populations of NHST users, psychological researchers and professional applied statisticians, when faced with contradictory situations. The subjects were presented with the results of an experiment designed to test the efficacy of a drug by comparing two groups (treatment/placebo). Four situations were constructed by combining the outcome of the t test (significant vs. nonsignificant) with the observed difference between the two means, D (large vs. small). Two of these situations appeared conflicting (t significant/D small and t nonsignificant/D large). Three fundamental aspects of statistical inference were investigated by means of open questions: drawing inductive conclusions about the magnitude of the true difference from the data in hand, making predictions for future data, and making decisions about stopping ...
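
To make the four situations concrete, the sketch below shows how the same observed difference D can yield either a significant or a nonsignificant t test, depending only on within-group variability. All numbers are hypothetical illustrations, not the figures used in the study: the group size (10 per group), the pooled standard deviations, and the threshold separating a "large" from a "small" D are assumptions chosen for the example.

```python
import math

# Two-sided 5% critical value of Student's t with 18 degrees of freedom
# (two groups of 10, so df = 10 + 10 - 2).
T_CRIT_DF18 = 2.101


def pooled_t(d, pooled_sd, n_per_group):
    """t statistic for two equal-sized groups, given the mean difference d
    and the pooled within-group standard deviation."""
    return d / (pooled_sd * math.sqrt(2.0 / n_per_group))


def classify(d, pooled_sd, n_per_group=10, large_threshold=5.0):
    """Label a result by test outcome and by size of D.
    large_threshold is a hypothetical cutoff for a 'large' difference."""
    t = pooled_t(d, pooled_sd, n_per_group)
    return {
        "t": t,
        "significant": abs(t) > T_CRIT_DF18,
        "large": abs(d) >= large_threshold,
    }


# (D, pooled SD) pairs, all hypothetical.
situations = {
    "t significant / D large": (10.0, 5.0),
    "t significant / D small": (1.0, 0.5),       # conflicting
    "t nonsignificant / D large": (10.0, 20.0),  # conflicting
    "t nonsignificant / D small": (1.0, 2.0),
}

for name, (d, sd) in situations.items():
    result = classify(d, sd)
    print(f"{name}: t = {result['t']:.2f}, "
          f"significant = {result['significant']}, large D = {result['large']}")
```

In the two conflicting cases the test outcome and the size of D point in opposite directions, which is why a significant result alone says little about the magnitude of the true difference, and a nonsignificant one does not show that the difference is negligible.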
