Nonrobustness in Z, t, and F tests at large sample sizes

The alleged robustness of Z, t, and F tests against nonnormality and, when sample sizes are equal, of t and F tests against heterogeneity as well was investigated in a large-scale sampling study under conditions realistic to experimentation and testing in the behavioral sciences. Factors varied were: population shape (L or bell), σ1/σ2 (1/2, 1, or 2), size N of smallest sample (2, 4, 8, 16, 32, 64, 128, 256, 512, or 1,024), N1/N2 (1/3, 1/2, 1, 2, or 3), α (.05, .01, or .001), and test tailedness (left, right, or two). In about 25% of the situations investigated, the test failed to meet a very lax criterion for robustness at every examined N value less than 100, and in 8% at every value less than 1,000; no test met the criterion in all of the situations studied before N = 512. Robustness was strongly influenced by all of the factors investigated, and interactions among the influencing factors were often strong and complex.
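
To make the idea of a sampling study concrete, the following is a minimal Monte Carlo sketch of how an actual Type I error rate might be estimated for one cell of such a design. It is not the procedure used in the study. Several things here are assumptions made only for illustration: an exponential distribution stands in for the "L-shaped" population, the test is a two-sample Z test with known standard deviations, and the function name `rejection_rates`, its parameters, and the number of replications are hypothetical. Both populations are centered at zero, so the null hypothesis of equal means is true and every rejection is a Type I error.

```python
import random
from statistics import NormalDist


def rejection_rates(n1, n2, sigma1=1.0, sigma2=1.0, alpha=0.05,
                    reps=20000, seed=0):
    """Estimate left-, right-, and two-tailed Type I error rates of a
    two-sample Z test when both populations are exponential (an assumed
    stand-in for an L-shaped population) with equal means."""
    rng = random.Random(seed)
    z_one = NormalDist().inv_cdf(1 - alpha)      # one-tailed critical value
    z_two = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    # Standard error of the difference in means, using the known sigmas.
    se = (sigma1 ** 2 / n1 + sigma2 ** 2 / n2) ** 0.5

    left = right = two = 0
    for _ in range(reps):
        # expovariate(1.0) has mean 1 and SD 1; subtract 1 to center at 0,
        # then scale so the population SD equals the requested sigma.
        m1 = sum(sigma1 * (rng.expovariate(1.0) - 1.0) for _ in range(n1)) / n1
        m2 = sum(sigma2 * (rng.expovariate(1.0) - 1.0) for _ in range(n2)) / n2
        z = (m1 - m2) / se
        left += z < -z_one
        right += z > z_one
        two += abs(z) > z_two

    return {"left": left / reps, "right": right / reps, "two": two / reps}


# One illustrative cell: N = 4, N1/N2 = 1/3, sigma1/sigma2 = 2, alpha = .05.
print(rejection_rates(n1=4, n2=12, sigma1=2.0, sigma2=1.0, alpha=0.05))
```

Comparing each estimated rate with the nominal α, and repeating the call over a grid of N, N1/N2, σ1/σ2, α, and tailedness, gives a rough picture of how the discrepancy depends on those factors. What counts as "robust" depends on the criterion chosen, which this sketch does not attempt to reproduce.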