Assessing Group Differences

This paper is about the legitimacy of certain kinds of quantitative evidence, specifically differences in educational performance between groups, especially those defined by ethnicity, gender and class. The thrust is methodological, and because the quantitative evidence ultimately depends on particular mathematical and statistical assumptions, something needs to be said about these. One of the useful things about mathematical and statistical models of educational realities is that, so long as one states the assumptions clearly and follows the rules correctly, one can obtain conclusions which are, in their own terms, beyond reproach. The awkward thing about these models is the snares they set for the casual user: the person who needs the conclusions, and perhaps also supplies the data, but is untrained in questioning the assumptions. What makes things more difficult is that, in trying to communicate with the casual user, the modeller is obliged to speak his or her language, that is, to use familiar terms in an attempt to capture the essence of the model. It is hardly surprising that such an enterprise is fraught with difficulties, even when the attempt is genuinely one of honest communication rather than compliance with custom or even subtle indoctrination. An example familiar to many concerned with testing is the use of the term 'specific objectivity' by exponents of the so-called 'Rasch' model. The use of this term leaves many casual users with the erroneous impression that it implies a sound and empirically verifiable justification for whatever conclusions are being drawn (Goldstein, 1979). More pertinent to the concerns of this chapter, terms such as 'test bias' have been used by modellers to refer to group differences which have nothing necessarily to do with the common understanding of bias as distortion [1].
Some practitioners (see for example Shepard et al., 1981) have attempted to inject more precision and acceptability into this term by defining test bias thus: "A test (or item) is biased if 'two individuals with equal ability but from different groups do not have the same probability of success' on the test or item" (my italics). If anything, such a definition clouds the issue even further, since it falls back upon another term, 'ability', which is undefined and indeed can only be defined in terms of other tests (or items) which do not themselves exhibit 'bias'. The resulting circularity is fairly clear.
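One way to make the circularity explicit is to write the definition symbolically. The notation is an assumption introduced here for illustration and is not taken from Shepard et al.: $X$ denotes success ($X = 1$) or failure ($X = 0$) on the item, $\theta$ the 'ability' in question, and $g$ group membership. On this reading an item is unbiased between groups $a$ and $b$ if

\[
P(X = 1 \mid \theta,\, g = a) \;=\; P(X = 1 \mid \theta,\, g = b) \quad \text{for all } \theta ,
\]

and biased otherwise. Since $\theta$ is never observed directly, any check of this equality has to substitute an estimate of $\theta$ built from other items, and that estimate can be interpreted as 'ability' only if those items are assumed already to satisfy the same condition.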