Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better

Replication is fundamental to science, so statistical analysis should give information about replication. Because p values dominate statistical analysis in psychology, it is important to ask what p says about replication. The answer to this question is “Surprisingly little.” In one simulation of 25 repetitions of a typical experiment, p varied from < .001 to .76, thus illustrating that p is a very unreliable measure. This article shows that, if an initial experiment results in two-tailed p = .05, there is an 80% chance the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p < .00008, and fully a 10% chance that p > .44. Remarkably, the interval (termed a p interval) is this wide however large the sample size. p is so unreliable and gives such dramatically vague information that it is a poor basis for inference. Confidence intervals, however, give much better information about replication. Researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
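The 80% p interval quoted in the abstract can be reproduced approximately with a short calculation. The sketch below is not taken from the abstract; it rests on assumptions that should be checked against the article itself: a z test with known standard deviation, a replication with the same sample size as the initial experiment, and no prior knowledge of the true effect, so that the replication z score is treated as normally distributed around the initial z score with standard deviation sqrt(2). The function name `p_interval` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def p_interval(p_two_tailed, coverage=0.80):
    """Return (low, high) bounds for the one-tailed p of a replication.

    Assumption (not stated in the abstract): the replication z score is
    distributed N(z_obs, sqrt(2)), where z_obs is the initial z score.
    """
    # z score that corresponds to the initial two-tailed p value
    z_obs = STD_NORMAL.inv_cdf(1 - p_two_tailed / 2)
    # half-width of the central interval for the replication z score
    half_width = STD_NORMAL.inv_cdf(0.5 + coverage / 2) * sqrt(2)
    z_low, z_high = z_obs - half_width, z_obs + half_width
    # a larger z gives a smaller one-tailed p, so the bounds swap
    p_low = 1 - STD_NORMAL.cdf(z_high)
    p_high = 1 - STD_NORMAL.cdf(z_low)
    return p_low, p_high


low, high = p_interval(0.05)
print(f"80% p interval: ({low:.5f}, {high:.2f})")
```

Under these assumptions the bounds come out close to the (.00008, .44) interval given in the abstract. The sample size never enters the calculation, which is consistent with the claim that the interval is this wide however large the sample.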
