Inference by eye: Reading the overlap of independent confidence intervals

When 95 per cent confidence intervals (CIs) on independent means do not overlap, the two-tailed p-value is less than 0.05 and there is a statistically significant difference between the means. However, p for non-overlapping 95 per cent CIs is actually considerably smaller than 0.05: if the two CIs just touch, p is about 0.01, and the intervals can overlap by as much as about half the length of one CI arm before p becomes as large as 0.05. Keeping this rule in mind (that overlap of half the length of one arm corresponds approximately to statistical significance at p = 0.05) can be helpful for a quick appreciation of figures that display CIs, especially if precise p-values are not reported. The author investigated the robustness of this and similar rules, and found them sufficiently accurate when sample sizes are at least 10 and the two intervals do not differ in width by more than a factor of 2. The author also reviewed previous discussions of CI overlap and extended the investigation to p-values other than 0.05 and 0.01. He further studied 95 per cent CIs on two proportions and on two Pearson correlations, and found that similar rules apply to overlap of these asymmetric CIs for a very broad range of cases. Wider use of figures with 95 per cent CIs is desirable, and these rules may assist easy and appropriate understanding of such figures.
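
The overlap rule above can be checked with a short calculation. The following Python sketch is illustrative only and is not the author's procedure: it assumes a large-sample normal approximation (so it ignores the small-sample t corrections that matter near n = 10), and it assumes overlap is expressed as a fraction of the average of the two CI arms. The function name `p_from_overlap` and its parameters are hypothetical.

```python
import math

Z_95 = 1.959964  # normal critical value for a 95% CI (large-sample assumption)


def p_from_overlap(overlap_fraction, width_ratio=1.0):
    """Approximate two-tailed p for the difference of two independent means.

    overlap_fraction: overlap of the two 95% CIs as a fraction of the
        average arm length (0 = just touching, negative = a gap).
    width_ratio: ratio of the second CI's width to the first (assumed
        equal to the ratio of the standard errors).
    """
    se1, se2 = 1.0, float(width_ratio)      # only the ratio matters
    arm1, arm2 = Z_95 * se1, Z_95 * se2     # CI arms (margins of error)
    avg_arm = (arm1 + arm2) / 2.0
    # Distance between the two means implied by the stated overlap.
    diff = arm1 + arm2 - overlap_fraction * avg_arm
    # Standard error of the difference of independent means.
    z = diff / math.hypot(se1, se2)
    # Two-tailed normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for label, frac in [("just touching", 0.0), ("half-arm overlap", 0.5)]:
        for ratio in (1.0, 2.0):
            p = p_from_overlap(frac, ratio)
            print(f"{label:>17}, width ratio {ratio:.0f}: p = {p:.4f}")
```

Under these assumptions, just-touching intervals give p below 0.01 and half-arm overlap gives p a little below 0.05, for equal widths and for widths differing by a factor of 2. This is consistent with the rule being approximate and somewhat conservative.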
