Extensions of Mantel–Haenszel for Multilevel DIF Detection

Multilevel data structures are ubiquitous in the assessment of differential item functioning (DIF), particularly in large-scale testing programs. Researchers can choose from a handful of DIF procedures that appropriately account for multilevel data structures. However, little, if any, work has extended the Mantel–Haenszel (MH) procedure, a popular, flexible, and effective tool for DIF detection, to this case. Thus, the primary goal of this study was to introduce several new MH-based options for DIF assessment with multilevel data and to investigate their effectiveness. A simulation study compared the standard MH test with these multilevel extensions, using data generated in a multilevel framework (for example, examinees nested within schools). Results demonstrated that the multilevel tests were preferable to standard MH in a wide variety of cases where multilevel data were present, particularly when the intraclass correlation was relatively large. Implications of this study for practice and future research are discussed.
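For orientation, the standard (single-level) MH test that serves as the baseline can be sketched as follows. The abstract gives no formulas, so this is the textbook MH common odds ratio and continuity-corrected chi-square, not the study's own code. The function name `mh_dif` and the table layout are assumptions made for illustration. The sketch treats examinees as independent and does not implement any of the multilevel extensions.

```python
from typing import Iterable, Tuple

# One 2x2 table per matching stratum (e.g., per total-score level):
# (a, b, c, d) = (reference correct, reference incorrect,
#                 focal correct,     focal incorrect)
Table = Tuple[int, int, int, int]


def mh_dif(tables: Iterable[Table]) -> Tuple[float, float]:
    """Return (MH common odds ratio, MH chi-square with continuity correction).

    Hypothetical helper: textbook single-level MH, ignoring any clustering.
    """
    num = den = 0.0                # numerator / denominator of the odds ratio
    sum_a = sum_e = sum_v = 0.0    # observed, expected, variance of 'a'
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:                  # stratum too small to contribute
            continue
        n_ref, n_foc = a + b, c + d    # group sizes in this stratum
        m1, m0 = a + c, b + d          # correct / incorrect totals
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_e += n_ref * m1 / n
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    if den == 0 or sum_v == 0:
        raise ValueError("MH statistics are undefined for these tables")
    alpha = num / den
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
    return alpha, chi2


# Illustrative counts only (not data from the study).
alpha, chi2 = mh_dif([(20, 10, 10, 20)])
print(round(alpha, 3), round(chi2, 3))
```

An odds ratio near 1 suggests no uniform DIF, and the chi-square value is referred to a chi-square distribution with one degree of freedom. Because the variance term assumes independent examinees, this baseline can misstate uncertainty when examinees are nested in schools, which is the situation the multilevel extensions are meant to address.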
