A COMPARISON OF SEVERAL METHODS OF ASSESSING PARTIAL KNOWLEDGE IN MULTIPLE-CHOICE TESTS: I. SCORING PROCEDURES

The notion of going beyond simple right-wrong scoring of multiple-choice items to assess intermediate states of knowledge has received considerable attention in the last 10 years. The techniques proposed have involved differential weighting of the item response alternatives. The accumulated literature (to 1970) on differential weighting, not only of item alternatives but also of item scores, has been well summarized by Wang and Stanley (1970). Motivated by the apparent continued optimism surrounding these techniques (see, e.g., Hambleton, Roberts, & Traub, 1970; Patnaik & Traub, 1973), the authors conducted a large-scale study of the effects of differential option weighting on reliability and validity. Two independent variables were examined: (1) the manner in which examinees are instructed to respond, and (2) the manner in which the responses so obtained are scored. To investigate these two variables, a large sample was randomly divided into three experimental test-taking groups. The effects of different item-response instructions are dealt with in a forthcoming paper (Hakstian & Kansup, 1975); the effects of different scoring procedures, applied to the same set of responses, are the concern of the present paper.

In the present study, one group of 346 subjects responded to multiple-choice items after being given conventional test-taking instructions, i.e., simply to mark the correct answer. There have been attempts in the past to increase the reliability and validity of conventionally administered tests by scoring incorrect choices according to the a priori determined degree of correctness each possesses, a procedure we refer to as logical weighting. Somewhat similar is empirical weighting, in which the weights for each option of each item are determined by their contributions to the overall psychometric qualities of the test for the particular examinee sample. Investigations of the latter, usually employing modifications of Guttman's (1941) procedure, have shown insubstantial increases in reliability and no increase in validity (Davis & Fifer, 1959; Hendrickson, 1971; Sabers & White, 1969). Some investigations of logical weighting have shown increases in reliability for logically weighted scores (Nedelsky, 1954; Patnaik & Traub, 1973), whereas at least one study (Hambleton et al., 1970) has shown a decrease, although not a statistically significant one. None of these studies has demonstrated a statistically significant increase in validity. In the present study, the tests were scored both conventionally and by a logical weighting procedure. Internal consistency was compared for the two scoring procedures, as was, unlike in earlier studies, validity.
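To make the distinction concrete, the sketch below contrasts conventional right-wrong scoring with a logical-weighting scheme of the kind described above, and compares the two scorings on internal consistency via Cronbach's coefficient alpha. It is a minimal illustration only: the item weights, keys, response data, and function names are hypothetical, not the weights or data used in the study.

```python
# A minimal sketch (hypothetical weights and responses) contrasting
# conventional right-wrong scoring with logically weighted scoring of
# multiple-choice items, then comparing the two scorings on internal
# consistency via Cronbach's coefficient alpha.

from statistics import variance

# Three hypothetical 4-option items; option "a" is keyed correct.
# Under logical weighting, each distractor carries an a priori,
# judge-assigned "degree of correctness" fixed before any testing.
LOGICAL_WEIGHTS = [
    {"a": 1.0, "b": 0.5,  "c": 0.25, "d": 0.0},   # item 1
    {"a": 1.0, "b": 0.0,  "c": 0.5,  "d": 0.0},   # item 2
    {"a": 1.0, "b": 0.25, "c": 0.0,  "d": 0.5},   # item 3
]
KEY = ["a", "a", "a"]

def conventional_scores(resp):
    """Right-wrong scoring: 1 for the keyed answer, 0 for any distractor."""
    return [1.0 if r == k else 0.0 for r, k in zip(resp, KEY)]

def logical_scores(resp):
    """Logical weighting: each chosen option earns its judged weight."""
    return [w[r] for w, r in zip(LOGICAL_WEIGHTS, resp)]

def coefficient_alpha(scores):
    """Cronbach's alpha for an examinees-by-items score matrix."""
    k = len(scores[0])
    item_vars = sum(variance([row[j] for row in scores]) for j in range(k))
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Toy answer sheets for four examinees tested under conventional
# instructions (mark the single best answer).
responses = [
    ["a", "a", "a"],
    ["a", "a", "b"],
    ["b", "c", "a"],
    ["d", "b", "d"],
]

conventional = [conventional_scores(r) for r in responses]
logical = [logical_scores(r) for r in responses]
print("alpha, conventional scoring:", round(coefficient_alpha(conventional), 3))
print("alpha, logical weighting:   ", round(coefficient_alpha(logical), 3))
```

Empirical weighting would differ only in where LOGICAL_WEIGHTS comes from: rather than being fixed a priori by judges, the weights would be estimated from the examinee sample itself (e.g., via modifications of Guttman's 1941 procedure) to optimize the test's psychometric qualities for that sample.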

[1] Leverne S. Collet. Elimination Scoring: An Empirical Evaluation, 1971.

[2] Kenneth D. Hopkins, et al. Validity and Reliability Consequences of Confidence Weighting, 1973.

[3] G. Glass, et al. Statistical Methods in Education and Psychology, 1970.

[4] Ronald K. Hambleton, et al. A Comparison of the Reliability and Validity of Two Methods for Assessing Partial Knowledge on a Multiple-Choice Test, 1970.

[5] L. Guttman, et al. The Quantification of a Class of Attributes: A Theory and Method of Scale Construction, 1941.

[6] Ross E. Traub, et al. Differential Weighting by Judged Degree of Correctness, 1973.

[7] R. Koehler. A Comparison of the Validities of Conventional Choice Testing and Various Confidence Marking Procedures, 1971.

[8] L. Nedelsky. Ability to Avoid Gross Error as a Measure of Achievement, 1954.

[9] Joan J. Michael. The Reliability of a Multiple-Choice Examination Under Various Test-Taking Instructions, 1968.

[10] E. H. Shuford, et al. Admissible Probability Measurement Procedures, Psychometrika, 1966.

[11] R. Ebel. Confidence Weighting and Test Reliability, 1965.

[12] Clyde H. Coombs, et al. The Assessment of Partial Knowledge, 1956.

[13] Robert M. Rippey. A Comparison of Five Different Scoring Functions for Confidence Tests, 1970.

[14] G. Hendrickson. The Effect of Differential Option Weighting on Multiple-Choice Objective Tests, 1971.

[15] John Schmid, et al. Some Modifications of the Multiple-Choice Item, 1953.

[16] Frederick B. Davis, et al. The Effect on Test Reliability and Validity of Scoring Aptitude and Achievement Tests with Weights for Every Choice, 1959.

[17] D. Sabers, et al. The Effect of Differential Weighting of Individual Item Responses on the Predictive Validity and Reliability of an Aptitude Test, 1969.

[18] Julian C. Stanley, et al. Differential Weighting: A Review of Methods and Empirical Studies, 1970.

[19] Ruth B. Ekstrom, et al. Manual for Kit of Reference Tests for Cognitive Factors (Revised 1963), 1963.

[20] B. de Finetti, et al. Methods for Discriminating Levels of Partial Knowledge Concerning a Test Item, The British Journal of Mathematical and Statistical Psychology, 1965.

[21] Frederic M. Lord. Formula Scoring and Number-Right Scoring, 1975.

[22] Leonard S. Feldt, et al. A Test of the Hypothesis That Cronbach's Alpha or Kuder-Richardson Coefficient Twenty Is the Same for Two Tests, 1969.