A Review of ETS Differential Item Functioning Assessment Procedures: Flagging Rules, Minimum Sample Size Requirements, and Criterion Refinement

Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size requirements that are currently in place for DIF analysis, and (c) the efficacy of criterion refinement. The main findings of the review are as follows:

• The ETS C rule often displays low DIF detection rates even when samples are large.
• With improved flagging rules in place, minimum sample size requirements could probably be relaxed. In addition, updated rules for combining data across administrations could allow DIF analyses to be performed in a broader range of situations.
• Refinement of the matching criterion improves detection rates when DIF is primarily in one direction but can depress detection rates when DIF is balanced. If nothing is known about the likely pattern of DIF, refinement is advisable.

Each of these findings is discussed in detail, focusing on the case of dichotomous items.
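For readers unfamiliar with the flagging rule named in the first finding, the following is a minimal Python sketch of the Mantel-Haenszel D-DIF statistic and the commonly described ETS A/B/C classification. It is an illustration under stated assumptions, not the procedure as specified in the report: the function names `mh_d_dif` and `ets_category` are hypothetical, the standard error is taken as a given input rather than computed, and the critical values 1.96 (two-sided test against 0) and 1.645 (one-sided test against 1) are assumed 5% level cutoffs.

```python
import math

def mh_d_dif(tables):
    """Mantel-Haenszel D-DIF for one dichotomous item.

    tables: iterable of (ref_correct, ref_incorrect,
                         focal_correct, focal_incorrect),
            one tuple per level of the matching criterion.
    Assumes at least one level yields a nonzero denominator term.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den  # MH common odds ratio
    return -2.35 * math.log(alpha_mh)  # rescale to the ETS delta metric

def ets_category(d_dif, se):
    """Classify an item as 'A', 'B', or 'C' (assumed cutoffs, see lead-in)."""
    abs_d = abs(d_dif)
    sig_vs_zero = abs_d / se > 1.96           # differs from 0 at the 5% level
    sig_vs_one = (abs_d - 1.0) / se > 1.645   # exceeds 1 in absolute value
    if abs_d < 1.0 or not sig_vs_zero:
        return "A"
    if abs_d >= 1.5 and sig_vs_one:
        return "C"
    return "B"

# Example: a sizable estimate with a large standard error is not flagged as C.
print(ets_category(2.0, 0.3))  # small SE
print(ets_category(1.6, 0.5))  # larger SE
```

Under these assumptions, an item with an estimate of 1.6 and a standard error of 0.5 falls in category B rather than C, because the estimate is not significantly greater than 1. This shows one way in which a rule combining a magnitude threshold with a significance test can be stringent.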
