Pairwise Multiple Comparison Tests when Data are Nonnormal

Numerous authors suggest that the data gathered by investigators are rarely normal in shape. Accordingly, pairwise multiple comparisons of means based on traditional statistics will frequently yield biased Type I error rates and depressed power to detect effects. One solution is to obtain the critical value for assessing statistical significance through bootstrap methods. The SAS system can be used to conduct step-down bootstrapped tests. The authors investigated this approach in balanced and unbalanced designs when data were neither normal in form nor equal in variability. When group sizes were equal, the bootstrap procedure effectively controlled Type I error rates. When variances and group sizes were negatively paired, however, the step-down bootstrap method resulted in substantially inflated Type I error rates. Based on their results, and those reported elsewhere, the authors recommend that researchers use trimmed means and Winsorized variances with a heteroscedastic test statistic.
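To make the recommendation concrete, the following is a minimal sketch of one pairwise comparison built from trimmed means and Winsorized variances. The abstract does not name a specific statistic, so the Yuen-type two-sample statistic used here is an assumption, as are the 20% trimming proportion and the function names; it is not the authors' SAS procedure and it omits the step-down and bootstrap parts.

```python
import math


def trimmed_mean(x, gamma=0.2):
    """Mean after dropping the g smallest and g largest values, g = floor(gamma * n)."""
    xs = sorted(x)
    g = math.floor(gamma * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_variance(x, gamma=0.2):
    """Sample variance after pulling the g most extreme values in each tail
    in to the nearest retained value."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(gamma * n)
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(w) / n
    return sum((v - m) ** 2 for v in w) / (n - 1)


def yuen_statistic(x, y, gamma=0.2):
    """Heteroscedastic statistic and approximate degrees of freedom for the
    difference between two trimmed means (assumed Yuen-type form)."""

    def squared_se(sample):
        n = len(sample)
        h = n - 2 * math.floor(gamma * n)  # observations left after trimming
        d = (n - 1) * winsorized_variance(sample, gamma) / (h * (h - 1))
        return d, h

    d1, h1 = squared_se(x)
    d2, h2 = squared_se(y)
    t = (trimmed_mean(x, gamma) - trimmed_mean(y, gamma)) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


# Illustrative data only: one extreme value in each group.
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
group_b = [v + 2 for v in group_a]
t_stat, df = yuen_statistic(group_a, group_b)
```

Because each group contributes its own standard error term, the statistic does not pool variances, which is what makes it heteroscedastic. In a multiple comparison setting, a statistic of this kind would be computed for every pair of groups and then referred to a critical value that controls the familywise error rate.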
