Effect size for comparing two or more normal distributions based on maximal contrasts in outcomes

Effect size is a concept that can be especially useful in bioequivalence studies and in studies designed to find important, not merely statistically significant, differences among responses to treatments based on independent random samples. We develop and explore a new effect size, related to a maximal superiority ordering, for assessing the separation among two or more normal distributions that may have different means and different variances. Confidence intervals and tests of hypotheses for this effect size are constructed using a p-value obtained by averaging over a distribution on the variances. Since there is almost always some difference among treatments, instead of the usual hypothesis test of exactly no effect, researchers should consider testing whether an appropriate effect size has at least, or at most, some meaningful magnitude when such a magnitude is available, possibly established using the framework developed here. A simulation study of type I error rate, power, and interval length is presented. R code for constructing the confidence intervals and carrying out the tests can be downloaded from the authors' website.
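To illustrate the averaging idea in the two-sample case, the sketch below is a minimal R example and not the authors' downloadable code: it uses a simple separation-type effect size P(X1 > X2) = pnorm((mu1 - mu2)/sqrt(sigma1^2 + sigma2^2)) for two normal samples and approximates a p-value for H0: effect <= delta0 by averaging the conditional, variances-known p-value over draws of the two variances from the usual scaled inverse chi-square distributions. The function name avg_pvalue_effect, the choice of effect size, and the noninformative-prior variance draws are illustrative assumptions rather than the procedure developed in the paper.

  ## Minimal illustrative sketch (assumptions noted above): estimate the
  ## separation effect size P(X1 > X2) for two normal samples and average a
  ## conditional one-sided p-value over a distribution on the two variances.
  avg_pvalue_effect <- function(x, y, delta0 = 0.5, nsim = 5000) {
    n1 <- length(x); n2 <- length(y)
    m1 <- mean(x);   m2 <- mean(y)
    s1 <- var(x);    s2 <- var(y)

    ## Draws of the variances: (n - 1) * s^2 / chi-square_(n - 1)
    v1 <- (n1 - 1) * s1 / rchisq(nsim, df = n1 - 1)
    v2 <- (n2 - 1) * s2 / rchisq(nsim, df = n2 - 1)

    ## With the variances treated as known, H0: pnorm((mu1 - mu2)/sqrt(v1 + v2)) <= delta0
    ## is equivalent to H0: (mu1 - mu2) <= qnorm(delta0) * sqrt(v1 + v2),
    ## which gives a one-sided z-test at that boundary value.
    se     <- sqrt(v1 / n1 + v2 / n2)
    bound  <- qnorm(delta0) * sqrt(v1 + v2)
    z      <- ((m1 - m2) - bound) / se
    p_cond <- pnorm(z, lower.tail = FALSE)

    ## p-value averaged over the distribution on the nuisance variances
    mean(p_cond)
  }

  ## Example usage with simulated heteroscedastic data
  set.seed(1)
  x <- rnorm(30, mean = 1.0, sd = 1.0)
  y <- rnorm(25, mean = 0.0, sd = 2.0)
  avg_pvalue_effect(x, y, delta0 = 0.5)

A small averaged p-value supports the claim that the separation between the two distributions exceeds the benchmark delta0, which is the one-sided testing task described in the abstract.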
