The Robustness of Two Sample Tests for Means - a Reply on Von Eye's Comment

Abstract

The question of which test, the t-test or the Wilcoxon test, should be preferred for comparing the means of two distributions must be answered with "the t-test". Both tests are robust against non-normality, with a slight advantage for the Wilcoxon test as far as power is concerned. But while the t-test is robust against variance heterogeneity and against discrete underlying distributions, the Wilcoxon test is not. Both tests lose their robustness if observations within the samples are dependent. Such a situation should be avoided by carefully designing surveys and experiments.

Key words: robustness, non-normality, variance heterogeneity, discrete distributions, autocorrelation, t-test, Wilcoxon test

1. Introductory Remarks on Robustness

In a paper (Rasch & Guiard, 2004), on which von Eye (2005) commented, we collected the results of systematic robustness research carried out by a statistical research group between 1978 and 1989. Robustness of statistical procedures is mainly of interest for the application of such procedures. Robustness research should therefore be directed at those violations of the assumptions needed for the derivation of statistical procedures that occur often or are to be expected. As analyses of data from different areas of application have shown, non-normality occurs relatively often. Another frequent violation of an assumption is variance heterogeneity in the two populations. Von Eye's comment additionally discussed the problem of dependency within "samples". In his comment on our paper, von Eye considered only a small part of our results, namely the t-test for two independent samples. We therefore concentrate here on just this test and its non-parametric counterpart, the Wilcoxon two-sample test.

2. Robustness of the two sample t-test and the Wilcoxon test

Because the two-sample t-test is one of the most frequently applied statistical procedures, robustness results were published long before we started our systematic research.
Posten's (1978) results, mentioned in our paper, were obtained in the first systematic investigation over the Pearson system of probability distributions. These results were supplemented by those of Tuchscherer & Pierer (1985), who considered not only non-normality but also variance heterogeneity, both within the Fleishman system of probability distributions. In further investigations we also used the system of truncated normal distributions, each with a different set of first four moments. Truncation of normal distributions often occurs after selection (for instance of pupils in the school system and, of course, in artificial selection in agriculture).

Exact and simulation results on robustness are valid only for the distributions used in the corresponding investigation. We pointed this out several times in the earlier theoretical papers on which Rasch & Guiard (2004) is based. If we fix the first four moments, there exist infinitely many distributions with just these four moments. To this extent it is correct that robustness is parameter-specific, but this is well known.

Let us first repeat the assumptions for the two-sample t-test and the two-sample Wilcoxon test for testing the hypothesis $H_0: \mu_1 = \mu_2$ of the equality of two means against a one- or two-sided alternative. We assume two continuous distributions with existing first four moments, expectations $\mu_1$ and $\mu_2$, and both variances equal to $\sigma^2$ (the third and fourth moments are needed for the simulation only). For the t-test we additionally assume normality of the two distributions, and for the Wilcoxon test that all existing moments higher than the second are equal in both distributions; otherwise the test will not only compare the means. We further assume that we draw two independent samples $(x_{11}, x_{12}, \ldots, x_{1n})$ and $(x_{21}, x_{22}, \ldots, x_{2n})$ from distributions 1 and 2, respectively.
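To make the two procedures concrete, the following minimal sketch computes both test statistics with the Python standard library only. It assumes the classical pooled-variance form of the t statistic and a plain normal approximation (without tie or continuity correction) for the Wilcoxon rank sum; the function names `pooled_t_statistic`, `midranks` and `wilcoxon_rank_sum` are illustrative and do not come from the paper.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_statistic(x, y):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooling is justified only under the equal-variance assumption.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def midranks(values):
    """Ranks 1..N of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Rank sum W of the first sample and a two-sided approximate p-value.

    Assumption: normal approximation without tie or continuity correction,
    which is only a rough guide for small samples.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p
```

Calling `pooled_t_statistic(x, y)` and `wilcoxon_rank_sum(x, y)` on the same two lists gives the two statistics side by side. The t statistic still has to be compared with a quantile of the t distribution with the returned degrees of freedom, which the standard library does not provide.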
The sizes of the two samples may be unequal; this is a problem only in combination with variance heterogeneity. …
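The interplay of unequal sample sizes and variance heterogeneity can be explored with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the simulation design used in the research discussed here: it uses the pooled-variance t statistic, normal data with equal means (so the null hypothesis is true), a total of 20 observations so that the two-sided 5% critical value for 18 degrees of freedom (2.101) applies, and hypothetical names `pooled_t` and `rejection_rate`.

```python
import math
import random
from statistics import mean, variance

# Two-sided 5% critical value of the t distribution with 18 degrees of freedom.
T_CRIT_DF18 = 2.101


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def rejection_rate(n1, n2, sd1, sd2, reps=4000, seed=1):
    """Share of simulated data sets in which the (true) null hypothesis is rejected."""
    # The hard-coded critical value is valid only for n1 + n2 - 2 == 18.
    assert n1 + n2 - 2 == 18, "critical value is for 18 degrees of freedom"
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both samples have mean 0, so every rejection is a type I error.
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT_DF18:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("equal n, equal variances:    ", rejection_rate(10, 10, 1.0, 1.0))
    print("equal n, unequal variances:  ", rejection_rate(10, 10, 3.0, 1.0))
    print("unequal n, unequal variances:", rejection_rate(5, 15, 3.0, 1.0))
```

With equal sample sizes the empirical rejection rate should stay close to the nominal 5% even when the standard deviations differ by a factor of three. When the smaller sample comes from the distribution with the larger variance, the rate can be expected to rise well above 5%.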