Using Classical Test Theory in Combination with Item Response Theory

This study examines the relationship between classical test theory (CTT) and item response theory (IRT). It is shown that CTT rests on the assumption that measures are exchangeable, whereas IRT rests on conditional independence. IRT is therefore presented as an extension of CTT, and concepts from both theories are related to one another. Furthermore, it is demonstrated that IRT can provide CTT statistics in situations where CTT fails. Reliability, for instance, can be determined even when the test has not been administered to the intended population.
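
The claim about reliability can be illustrated with a minimal sketch. It is not the study's own procedure. It assumes a Rasch model with known item difficulties and a normally distributed ability in the target population, and it takes sum-score reliability as true-score variance divided by true-score variance plus expected error variance. The function names, the difficulty values, and the quadrature settings are hypothetical choices made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def rasch_prob(theta, b):
    """Rasch probability of a correct response; returns shape (len(theta), len(b))."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))


def sum_score_reliability(b, mu=0.0, sigma=1.0, n_nodes=61):
    """Approximate the CTT reliability of the sum score for an assumed
    N(mu, sigma^2) ability population, using Gauss-Hermite quadrature."""
    b = np.asarray(b, dtype=float)
    # Nodes and weights for the weight function exp(-x^2 / 2);
    # normalising the weights turns them into standard-normal probabilities.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()
    theta = mu + sigma * nodes

    p = rasch_prob(theta, b)
    true_score = p.sum(axis=1)               # expected sum score given theta
    error_var = (p * (1.0 - p)).sum(axis=1)  # conditional error variance given theta

    mean_true = np.dot(weights, true_score)
    var_true = np.dot(weights, (true_score - mean_true) ** 2)
    mean_error = np.dot(weights, error_var)
    return var_true / (var_true + mean_error)


if __name__ == "__main__":
    difficulties = np.linspace(-2.0, 2.0, 20)  # hypothetical calibrated items
    for sd in (0.5, 1.0, 1.5):
        rel = sum_score_reliability(difficulties, mu=0.0, sigma=sd)
        print(f"assumed population sd = {sd}: reliability = {rel:.3f}")
```

Under these assumptions, changing `mu` and `sigma` yields a reliability estimate for a population that never took the test, provided the item difficulties were calibrated elsewhere and the model holds in that population.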
