A COMPARISON OF SEVERAL METHODS OF ASSESSING PARTIAL KNOWLEDGE IN MULTIPLE-CHOICE TESTS: I. SCORING PROCEDURES

The notion of going beyond simple right-wrong scoring of multiple-choice items to assess intermediate states of knowledge has received considerable attention in the last 10 years. The techniques proposed have involved differential weighting of the item response alternatives. The accumulated literature (to 1970) on differential weighting, not only of item alternatives but also of item scores, has been well summarized by Wang and Stanley (1970). Motivated by the apparent continued optimism surrounding these techniques (see, e.g., Hambleton, Roberts, & Traub, 1970; Patnaik & Traub, 1973), the authors conducted a large-scale study of the effects of differential option weighting on reliability and validity. Two independent variables were examined: (1) the manner in which examinees are instructed to respond, and (2) the manner in which the responses so obtained are scored. To investigate these two variables, a large sample was randomly divided into three experimental test-taking groups. The effects of different item-response instructions are dealt with in a forthcoming paper (Hakstian & Kansup, 1975); the effects of different scoring procedures, applied to the same set of responses, are the concern of the present paper.

In the present study, one group of 346 subjects responded to multiple-choice items after being given conventional test-taking instructions, i.e., simply to mark the correct answer. There have been attempts in the past to increase the reliability and validity of conventionally administered tests by scoring incorrect choices according to the a priori determined degree of correctness each possesses, a procedure we refer to as logical weighting. Somewhat similar is empirical weighting, in which the weights for each option of each item are determined by their contributions to the overall psychometric qualities of the test for the particular examinee sample. Investigations of the latter, usually employing modifications of Guttman's (1941) procedure, have shown insubstantial increases in reliability and no increase in validity (Davis & Fifer, 1959; Hendrickson, 1971; Sabers & White, 1969). Some investigations of logical weighting have shown increases in reliability for logically weighted scores (Nedelsky, 1954; Patnaik & Traub, 1973), whereas at least one study (Hambleton et al., 1970) has shown a decrease, although not a statistically significant one. None of these studies has demonstrated a statistically significant increase in validity. In the present study, the tests were scored both conventionally and by a logical weighting procedure. Internal consistency was compared for the two scoring procedures, as was, unlike in earlier studies, validity.
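To make the distinction concrete, the sketch below contrasts conventional right-wrong scoring with a logical-weighting scheme of the kind described above, and compares the two scorings on internal consistency via Cronbach's coefficient alpha. It is a minimal illustration only: the item weights, keys, response data, and function names are hypothetical, not the weights or data used in the study.

```python
# A minimal sketch (hypothetical weights and responses) contrasting
# conventional right-wrong scoring with logically weighted scoring of
# multiple-choice items, then comparing the two scorings on internal
# consistency via Cronbach's coefficient alpha.

from statistics import variance

# Three hypothetical 4-option items; option "a" is keyed correct.
# Under logical weighting, each distractor carries an a priori,
# judge-assigned "degree of correctness" fixed before any testing.
LOGICAL_WEIGHTS = [
    {"a": 1.0, "b": 0.5,  "c": 0.25, "d": 0.0},   # item 1
    {"a": 1.0, "b": 0.0,  "c": 0.5,  "d": 0.0},   # item 2
    {"a": 1.0, "b": 0.25, "c": 0.0,  "d": 0.5},   # item 3
]
KEY = ["a", "a", "a"]

def conventional_scores(resp):
    """Right-wrong scoring: 1 for the keyed answer, 0 for any distractor."""
    return [1.0 if r == k else 0.0 for r, k in zip(resp, KEY)]

def logical_scores(resp):
    """Logical weighting: each chosen option earns its judged weight."""
    return [w[r] for w, r in zip(LOGICAL_WEIGHTS, resp)]

def coefficient_alpha(scores):
    """Cronbach's alpha for an examinees-by-items score matrix."""
    k = len(scores[0])
    item_vars = sum(variance([row[j] for row in scores]) for j in range(k))
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Toy answer sheets for four examinees tested under conventional
# instructions (mark the single best answer).
responses = [
    ["a", "a", "a"],
    ["a", "a", "b"],
    ["b", "c", "a"],
    ["d", "b", "d"],
]

conventional = [conventional_scores(r) for r in responses]
logical = [logical_scores(r) for r in responses]
print("alpha, conventional scoring:", round(coefficient_alpha(conventional), 3))
print("alpha, logical weighting:   ", round(coefficient_alpha(logical), 3))
```

Empirical weighting would differ only in where LOGICAL_WEIGHTS comes from: rather than being fixed a priori by judges, the weights would be estimated from the examinee sample itself (e.g., via modifications of Guttman's 1941 procedure) to optimize the test's psychometric qualities for that sample.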

[1] Leverne S. Collet. Elimination Scoring: An Empirical Evaluation, 1971.

[2] Kenneth D. Hopkins, et al. Validity and Reliability Consequences of Confidence Weighting, 1973.

[3] G. Glass, et al. Statistical Methods in Education and Psychology, 1970.

[4] Ronald K. Hambleton, et al. A Comparison of the Reliability and Validity of Two Methods for Assessing Partial Knowledge on a Multiple-Choice Test, 1970.

[5] L. Guttman, et al. The Quantification of a Class of Attributes: A Theory and Method of Scale Construction, 1941.

[6] Ross E. Traub, et al. Differential Weighting by Judged Degree of Correctness, 1973.

[7] R. Koehler. A Comparison of the Validities of Conventional Choice Testing and Various Confidence Marking Procedures, 1971.

[8] L. Nedelsky. Ability to Avoid Gross Error as a Measure of Achievement, 1954.

[9] Joan J. Michael. The Reliability of a Multiple-Choice Examination Under Various Test-Taking Instructions, 1968.

[10] E. H. Shuford, et al. Admissible Probability Measurement Procedures, Psychometrika, 1966.

[11] R. Ebel. Confidence Weighting and Test Reliability, 1965.

[12] Clyde H. Coombs, et al. The Assessment of Partial Knowledge, 1956.

[13] Robert M. Rippey. A Comparison of Five Different Scoring Functions for Confidence Tests, 1970.

[14] G. Hendrickson. The Effect of Differential Option Weighting on Multiple-Choice Objective Tests, 1971.

[15] John Schmid, et al. Some Modifications of the Multiple-Choice Item, 1953.

[16] Frederick B. Davis, et al. The Effect on Test Reliability and Validity of Scoring Aptitude and Achievement Tests with Weights for Every Choice, 1959.

[17] D. Sabers, et al. The Effect of Differential Weighting of Individual Item Responses on the Predictive Validity and Reliability of an Aptitude Test, 1969.

[18] Julian C. Stanley, et al. Differential Weighting: A Review of Methods and Empirical Studies, 1970.

[19] Ruth B. Ekstrom, et al. Manual for Kit of Reference Tests for Cognitive Factors (Revised 1963), 1963.

[20] B. de Finetti, et al. Methods for Discriminating Levels of Partial Knowledge Concerning a Test Item, The British Journal of Mathematical and Statistical Psychology, 1965.

[21] Frederic M. Lord. Formula Scoring and Number-Right Scoring, 1975.

[22] Leonard S. Feldt, et al. A Test of the Hypothesis That Cronbach's Alpha or Kuder-Richardson Coefficient Twenty Is the Same for Two Tests, 1969.