A General Bayesian Model for Testlets: Theory and Applications

The need for more realistic and richer forms of assessment in educational tests has led many tests to include polytomously scored items, multiple items based on a single stimulus (a “testlet”), and, increasingly, a mixture of binary and polytomous item formats. In this paper, the authors extend earlier work on the modeling of testlet-based response data to the situation in which a test is composed, partially or completely, of polytomously scored items and/or testlets. The model they propose, a modified version of commonly employed item response models, is embedded within a fully Bayesian framework, and inferences under the model are obtained using Markov chain Monte Carlo techniques. The authors demonstrate its use in a designed series of simulations and by analyzing operational data from the North Carolina Test of Computer Skills and the Educational Testing Service’s Test of Spoken English. Their empirical findings suggest that the North Carolina Test of Computer Skills exhibits significant testlet effects, indicating dependence among item scores obtained from common stimuli, whereas the Test of Spoken English does not.
