Computerized Adaptive Testing: The Capitalization on Chance Problem

This paper describes several simulation studies that examine the effects of capitalization on chance on item selection and ability estimation in computerized adaptive testing (CAT) under the 3-parameter logistic model. To generate different amounts of estimation error in the item parameters, the calibration sample size was manipulated (N = 500, 1000, and 2000 subjects), as was the ratio of item bank size to test length (banks of 197 and 788 items; test lengths of 20 and 40 items), in both a CAT and a random test. Results show that capitalization on chance is particularly serious in CAT, as revealed by the large positive bias found in the small-sample calibration conditions. For broad ranges of θ, the overestimation of precision (asymptotic Se) reaches 40%, something that does not occur with RMSE(θ). The problem grows as the ratio of item bank size to test length increases. Potential solutions were tested in a second study, in which two exposure control methods were incorporated into the item selection algorithm. Some alternative solutions are also discussed.
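The mechanism behind capitalization on chance can be illustrated with a minimal sketch. It is not the simulation design of the studies summarized here: calibration error is mimicked by adding Gaussian noise whose standard deviation shrinks as 1/sqrt(N) rather than by actually calibrating items, the "adaptive" test simply picks the items with the highest estimated information at one fixed θ, and the parameter distributions, noise levels, scaling constant D, and all function names are illustrative assumptions. The quantity returned is the ratio of estimated to true test information for the selected items; values above 1 mean that precision is overstated.

```python
import math
import random

D = 1.0  # logistic scaling constant (assumption; 1.7 is also common)

def info_3pl(theta, a, b, c):
    """Fisher information of a single 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

def make_bank(rng, n_items):
    """Hypothetical true bank: a ~ lognormal, b ~ N(0, 1), c ~ U(0.1, 0.3)."""
    return [(rng.lognormvariate(0.0, 0.3),
             rng.gauss(0.0, 1.0),
             rng.uniform(0.1, 0.3)) for _ in range(n_items)]

def perturb(rng, bank, n_calib):
    """Crude stand-in for calibration error (assumed noise sd ~ 1/sqrt(N))."""
    scale = math.sqrt(500.0 / n_calib)
    noisy = []
    for a, b, c in bank:
        a_hat = max(0.2, a + rng.gauss(0.0, 0.30 * scale))
        b_hat = b + rng.gauss(0.0, 0.30 * scale)
        c_hat = min(0.5, max(0.0, c + rng.gauss(0.0, 0.05 * scale)))
        noisy.append((a_hat, b_hat, c_hat))
    return noisy

def inflation(rng, n_calib, n_items=197, test_len=20, theta=0.0,
              adaptive=True, reps=200):
    """Ratio of estimated to true test information over `reps` replications."""
    est_total = 0.0
    true_total = 0.0
    for _ in range(reps):
        bank = make_bank(rng, n_items)
        est = perturb(rng, bank, n_calib)
        idx = list(range(n_items))
        if adaptive:
            # Select on ESTIMATED information: favourable errors get picked.
            idx.sort(key=lambda i: info_3pl(theta, *est[i]), reverse=True)
        else:
            # Random test: selection is unrelated to the estimation errors.
            rng.shuffle(idx)
        chosen = idx[:test_len]
        est_total += sum(info_3pl(theta, *est[i]) for i in chosen)
        true_total += sum(info_3pl(theta, *bank[i]) for i in chosen)
    return est_total / true_total

if __name__ == "__main__":
    rng = random.Random(2024)
    for n in (500, 1000, 2000):
        cat = inflation(rng, n, adaptive=True)
        rnd = inflation(rng, n, adaptive=False)
        print(f"N={n}: max-info ratio={cat:.2f}, random-test ratio={rnd:.2f}")
```

Because items are chosen on their estimated information, those whose parameters happen to be overestimated are selected more often, so the ratio for the maximum-information test should exceed that of the random test and shrink as N grows. Changing `n_items` and `test_len` lets the bank-to-test-length ratio be varied in the same way.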
