Reexamining the Impact of Nonnormality in Two-Group Comparison Procedures

The authors performed a Monte Carlo simulation to empirically investigate the robustness and power of 4 methods for testing mean differences between 2 independent groups under conditions in which the 2 populations may not show the same pattern of nonnormality. The approaches considered were the t test, the Wilcoxon rank-sum test, the Welch-James test with trimmed means and Winsorized variances, and a nonparametric bootstrap test. Results showed that the Wilcoxon rank-sum test and the Welch-James test with trimmed means and Winsorized variances were not robust in terms of Type I error control when the 2 populations showed different patterns of nonnormality. The nonparametric bootstrap test provided power advantages over the t test. The authors discuss other results from the simulation study and provide recommendations.
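To make the kind of simulation described above concrete, here is a minimal Python sketch of a Monte Carlo check of Type I error for a nonparametric bootstrap test when the two populations have equal means but different shapes. It is an illustration under stated assumptions, not the authors' design: the function names (`welch_t`, `bootstrap_test`, `type1_error_rate`), the choice of a standard normal versus a centered exponential population, the sample size, and the replication counts are all hypothetical. The bootstrap variant shown (centering each sample to impose the null, then resampling a Welch-type statistic) is one common textbook form and may differ from the one used in the study.

```python
import random


def welch_t(x, y):
    """Welch-type statistic: mean difference over its unpooled standard error."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se = (vx / len(x) + vy / len(y)) ** 0.5
    # Guard against a degenerate resample in which every value is identical.
    return (mx - my) / se if se > 0 else 0.0


def bootstrap_test(x, y, n_boot=499, rng=random):
    """Two-sided bootstrap p-value for H0: the two population means are equal."""
    t_obs = abs(welch_t(x, y))
    # Impose the null hypothesis by centering each sample at zero.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    x0 = [v - mx for v in x]
    y0 = [v - my for v in y]
    extreme = 0
    for _ in range(n_boot):
        xb = rng.choices(x0, k=len(x0))  # resample with replacement
        yb = rng.choices(y0, k=len(y0))
        if abs(welch_t(xb, yb)) >= t_obs:
            extreme += 1
    # The +1 terms count the observed statistic as one of the resamples.
    return (extreme + 1) / (n_boot + 1)


def type1_error_rate(n=20, reps=200, n_boot=199, alpha=0.05, seed=1):
    """Share of replications rejecting H0 when the population means are equal."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Assumed populations: both have mean 0 and variance 1, different shapes.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]         # symmetric
        y = [rng.expovariate(1.0) - 1.0 for _ in range(n)]  # right-skewed
        if bootstrap_test(x, y, n_boot=n_boot, rng=rng) <= alpha:
            rejections += 1
    return rejections / reps
```

Calling `type1_error_rate()` returns the empirical rejection rate under a true null; a value close to the nominal `alpha` would suggest adequate Type I error control for that particular pair of distributions and sample size. Adding a mean shift to one group in the same loop would give an empirical power estimate instead.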
