Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better

Replication is fundamental to science, so statistical analysis should give information about replication. Because p values dominate statistical analysis in psychology, it is important to ask what p says about replication. The answer to this question is “Surprisingly little.” In one simulation of 25 repetitions of a typical experiment, p varied from < .001 to .76, illustrating that p is a very unreliable measure. This article shows that, if an initial experiment results in two-tailed p = .05, there is an 80% chance that the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p < .00008, and fully a 10% chance that p > .44. Remarkably, this interval, termed a p interval, is this wide however large the sample size. p is so unreliable and gives such dramatically vague information that it is a poor basis for inference. Confidence intervals, however, give much better information about replication. Researchers should minimize the role of p by using confidence intervals and model-fitting techniques and by adopting meta-analytic thinking.
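
The 80% p interval quoted above can be reproduced approximately with a short calculation. The sketch below is illustrative rather than the article's own code, and it rests on simplifying assumptions that the abstract does not spell out: a z test with known variance, a replication with the same sample size as the initial experiment, and no prior information about the true effect. Under those assumptions the replication z value is taken to be normally distributed around the initial z value with standard deviation sqrt(2). The function name `p_interval` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def p_interval(p_obt_two_tailed, coverage=0.80):
    """Approximate p interval for the one-tailed p value of a replication.

    Assumptions (not taken from the abstract): z test with known variance,
    equal sample sizes, and replication z ~ Normal(z_obt, sqrt(2)).
    """
    std_normal = NormalDist()

    # z value corresponding to the initial two-tailed p value.
    z_obt = std_normal.inv_cdf(1 - p_obt_two_tailed / 2)

    # Half-width of the central `coverage` region for the replication z.
    # The sqrt(2) reflects sampling error in both the initial experiment
    # and the replication.
    half_width = std_normal.inv_cdf(0.5 + coverage / 2) * math.sqrt(2)

    z_low = z_obt - half_width
    z_high = z_obt + half_width

    # A larger z gives a smaller one-tailed p, so the bounds swap.
    p_low = 1 - std_normal.cdf(z_high)
    p_high = 1 - std_normal.cdf(z_low)
    return p_low, p_high


if __name__ == "__main__":
    low, high = p_interval(0.05, coverage=0.80)
    print(f"80% p interval: ({low:.5f}, {high:.2f})")
```

With an initial two-tailed p of .05, the bounds come out close to the (.00008, .44) reported in the abstract. Sample size does not appear anywhere in the calculation, which is consistent with the claim that the interval is equally wide however large the sample.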
