A Comparative Study of Measures of Partial Knowledge in Multiple-Choice Tests

Many testing experts believe that measurements obtained from multiple-choice (MC) tests can be improved by using evidence about partial knowledge. Over the last 50 years, a large number of methods have been developed to extract such information from examinees' direct reports; most require modifications to test instructions, response modes, and scoring rules. These testing methods are reviewed, and the results of a large-scale empirical study of the most promising among them are reported. Seven testing methods were applied to MC tests in four content areas using a between-persons design. To identify the most efficient methods and the optimal conditions for their application, the results were analyzed with respect to six criteria. Examinees showed a surprisingly strong tendency to take advantage of the special features of the alternative methods, and, on average, high-ability examinees were better judges of their level of knowledge and consequently benefited more from these methods. Systematic interactions between testing method and test content indicated that no method was uniformly superior.

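To make the contrast between conventional and partial-knowledge scoring concrete, the sketch below illustrates three classic rules from this literature: number-right scoring, formula scoring (the "correction for guessing," R - W/(k-1) for k alternatives), and Coombs-type elimination scoring, in which an examinee crosses out the alternatives believed to be wrong. This is a minimal illustration of the general ideas, not the specific seven methods or scoring formulas evaluated in the study; the function names and the example values are the author's own assumptions.

```python
# Illustrative scoring rules for a k-alternative MC item (not the paper's
# exact implementation): number-right, formula scoring, and elimination
# (Coombs-type) scoring, where partial credit grows with the number of
# distractors the examinee can rule out.
from typing import Set


def number_right(chosen: int, key: int) -> int:
    """Conventional scoring: credit only for selecting the keyed alternative."""
    return 1 if chosen == key else 0


def formula_score(num_right: int, num_wrong: int, k: int) -> float:
    """Test-level formula score R - W/(k-1); omitted items carry no penalty."""
    return num_right - num_wrong / (k - 1)


def elimination_score(eliminated: Set[int], key: int, k: int) -> int:
    """Item-level elimination score: +1 per distractor correctly crossed out,
    -(k-1) if the keyed answer is eliminated."""
    if key in eliminated:
        return -(k - 1)      # eliminating the keyed answer incurs the maximum penalty
    return len(eliminated)   # partial credit for each distractor ruled out


if __name__ == "__main__":
    # Hypothetical item with k = 4 alternatives, keyed answer at index 2.
    print(number_right(chosen=2, key=2))                   # 1
    print(formula_score(num_right=30, num_wrong=8, k=4))   # 27.33...
    print(elimination_score({0, 3}, key=2, k=4))           # 2: two distractors ruled out
    print(elimination_score({1, 2}, key=2, k=4))           # -3: keyed answer eliminated
```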