Random Item IRT Models

It is common practice in IRT to treat items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used, and these are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is a promising way to handle several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to differential item functioning (DIF). The article has three parts, one for each of these issues. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and in terms of its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined; instead, a latent DIF class of items is used, so that anchoring is realized a posteriori, based on the item mixture. It is shown that both approaches are promising for the identification of DIF.
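
As an illustration of the second part, the following is a minimal sketch of how data could be simulated under a linear logistic test model with an error term, assuming the usual Rasch form in which the success probability is the logistic function of person ability minus item difficulty. The function name, the argument names (`thetas`, `q_matrix`, `etas`, `sigma_eps`), and the normal distribution for the item error are assumptions made for this example and are not taken from the article.

```python
import math
import random


def simulate_lltm_with_error(thetas, q_matrix, etas, sigma_eps, seed=0):
    """Simulate binary responses under a Rasch-type model with random item error.

    thetas    : person abilities (treated as given)
    q_matrix  : one row per item, holding that item's property values
    etas      : weights of the item properties (hypothetical values)
    sigma_eps : standard deviation of the item error term (assumed normal)
    """
    rng = random.Random(seed)

    # Item difficulty = part explained by item properties + random item error.
    betas = []
    for q_row in q_matrix:
        explained = sum(q * eta for q, eta in zip(q_row, etas))
        betas.append(explained + rng.gauss(0.0, sigma_eps))

    # Rasch response probability: logistic(theta_p - beta_i).
    responses = []
    for theta in thetas:
        row = []
        for beta in betas:
            p_correct = 1.0 / (1.0 + math.exp(-(theta - beta)))
            row.append(1 if rng.random() < p_correct else 0)
        responses.append(row)
    return betas, responses


if __name__ == "__main__":
    # Three items described by two properties; two persons.
    betas, responses = simulate_lltm_with_error(
        thetas=[0.0, 1.0],
        q_matrix=[[1, 0], [0, 1], [1, 1]],
        etas=[0.5, -1.0],
        sigma_eps=0.3,
    )
    print(betas)
    print(responses)
```

Setting `sigma_eps` to zero in this sketch gives item difficulties that are perfectly explained by the item properties. A positive value lets each difficulty deviate from its prediction, which is the sense in which the explanation "does not need to be perfect".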
