RIM: A random item mixture model to detect Differential Item Functioning

In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (in addition to the usual random person abilities). A mixture model is assumed for the item difficulties, so that each item belongs to one of two classes: a DIF class or a non-DIF class. The crucial difference between the two classes is that the difficulties of items in the DIF class may differ across the observed person groups, whereas the difficulties of items in the non-DIF class are equal across those groups. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated in a simulation study in which it is compared with traditional procedures such as the likelihood ratio test, the Mantel-Haenszel procedure, and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is demonstrated on a real-life data set.
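The data-generating idea described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the parameterization used in the paper: the function name `simulate_rim`, the two-group setup, the standard normal distributions for abilities and difficulties, the DIF-class probability `p_dif`, and the normally distributed group shift for DIF items are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_rim(n_persons=200, n_items=20, p_dif=0.2, seed=0):
    """Simulate binary responses from a RIM-like model (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Observed person groups (assumption: two groups, 0 = reference, 1 = focal).
    group = rng.integers(0, 2, size=n_persons)

    # Random person abilities (assumption: standard normal).
    theta = rng.normal(0.0, 1.0, size=n_persons)

    # Latent item class: True = DIF class, False = non-DIF class.
    is_dif = rng.random(n_items) < p_dif

    # Random item difficulties for the reference group (assumption: standard normal).
    beta_ref = rng.normal(0.0, 1.0, size=n_items)

    # Non-DIF items share one difficulty across groups; DIF items get a
    # group-specific shift (assumption: normally distributed shift).
    shift = np.where(is_dif, rng.normal(0.0, 1.0, size=n_items), 0.0)
    beta = np.stack([beta_ref, beta_ref + shift])  # shape (2, n_items)

    # Rasch model: P(correct) = logistic(ability - group-specific difficulty).
    logits = theta[:, None] - beta[group]
    prob = 1.0 / (1.0 + np.exp(-logits))
    y = (rng.random(prob.shape) < prob).astype(int)
    return y, group, is_dif, beta
```

Calling `simulate_rim()` returns the response matrix, the group labels, the latent class indicator, and the group-specific difficulties. In an actual analysis the class indicator would be unknown and would be inferred, for instance from its posterior distribution in a Bayesian fit.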
