The Effect of Limitations on the Number of Criterion Score Values on the Significance Level of the F-Test

Many educational and psychological measurements are made with instruments that yield only a limited number of score values. Ratings and course grades, for example, are frequently recorded on a five-point scale. The effectiveness of various educational procedures, such as those used in counseling, may be evaluated on a four-point scale. Pupil traits in non-academic areas are sometimes assessed on a three-point scale: "above average," "average," or "below average." In situations where change in status is measured, the assessment may be simply one of "Improved" or "No change." If such measures, numerically coded, are taken in an experiment and the investigator is concerned solely with means, he may be tempted to analyze the data via the techniques of analysis of variance. This is especially likely if there are several factors of interest in the experiment and one or more of the hypotheses concern interactions between factors. The question then arises, "Can data of this kind be validly analyzed via F-tests?" Investigation of this question was the primary purpose of this study.