Ability Estimation Under Different Item Parameterization and Scoring Models

Testing is essential in education and other social science fields because many assessments, decisions, and policies rest on test results. The purpose of testing is to estimate a person's ability, that is, a latent trait or construct. In a test setting, each individual's responses to a set of test items are recorded. A scoring scheme then assigns test scores to individuals according to their item responses, so test scores provide the information from which a person's ability is inferred. This study compared combinations of different item response theory (IRT) models with dichotomous versus polytomous scoring models. The different IRT models yielded different ability estimates. The polytomous models, in which each response category is scored according to its degree of correctness or the amount of information it provides toward the full answer, provided more accurate ability estimates than dichotomous scoring.
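To make the distinction between dichotomous and polytomous scoring concrete, the following is a minimal Python sketch. It is illustrative only: the abstract does not say which specific models the study fitted, so the two-parameter logistic (2PL) model and the generalized partial credit model (GPCM) are assumed here as common representatives of each family, and the function names and parameter values are hypothetical.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL model (assumed for illustration): probability of score 1.

    theta: ability, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def gpcm_probs(theta, a, steps):
    """GPCM (assumed for illustration): probabilities of scores 0..len(steps).

    steps holds one step parameter per score boundary.
    """
    # Cumulative sums of a * (theta - b_k); score 0 has an empty sum of 0.
    cum = [0.0]
    for b_k in steps:
        cum.append(cum[-1] + a * (theta - b_k))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(cum)
    weights = [math.exp(c - m) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def dichotomize(score, max_score):
    """Collapse a graded score to right/wrong: only full credit counts as 1."""
    return 1 if score == max_score else 0


# Hypothetical item with scores 0..3 and an examinee of average ability.
probs = gpcm_probs(theta=0.0, a=1.2, steps=[-1.0, 0.0, 1.0])
expected_score = sum(k * p for k, p in enumerate(probs))
```

Under dichotomous scoring, partial-credit responses of 1 and 2 on this item would both collapse to 0, which is the information loss the abstract attributes to dichotomous scoring. With a single step parameter, the GPCM sketch reduces to the 2PL form.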
