Applying the Mantel-Haenszel Procedure to Complex Samples of Items

This Monte Carlo study examined the effect of complex sampling of items on the measurement of differential item functioning (DIF) using the Mantel-Haenszel procedure. Data were generated using a 3-parameter logistic item response theory model according to the balanced incomplete block (BIB) design used in the National Assessment of Educational Progress (NAEP). The length of each block of items and the number of DIF items in the matching variable were varied, as were the difficulty, discrimination, and presence of DIF in the studied item. Block, booklet, pooled booklet, and extra-information analyses were compared to a complete data analysis using the transformed log-odds on the delta scale. The pooled booklet approach is recommended for use when items are selected for examinees according to a BIB design. This study has implications for DIF analyses of other complex samples of items, such as computer-administered testing or other complex assessment designs.

One important issue in educational measurement is the identification of items that function differently for subgroups of examinees. Such items are said to have differential item functioning (DIF). DIF studies compare the performance of the group of interest (the focal group) to that of a comparison or reference group. The Mantel-Haenszel (MH) procedure (Holland & Thayer, 1988; Mantel & Haenszel, 1959) matches the groups on some measure of performance. In usual DIF applications of MH, this matching variable is the total score on the test. For each of the K levels of the matching variable, MH forms a 2 x 2 table, which is shown in Table 1. In that table, T_k is the total number of examinees at level k, n_Rk and n_Fk are the numbers of reference and focal group members, m_1k is the number of examinees who answered the studied item correctly, and m_0k is the number who missed the item.
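The K stratified 2 x 2 tables can be combined into a single DIF index. The sketch below is illustrative only and is not the study's own code. It assumes the standard Holland and Thayer cell labels, which the text above does not spell out: A_k and B_k are the reference group members at level k who answered the studied item correctly and incorrectly, and C_k and D_k are the corresponding focal group counts. It computes the MH common odds ratio, alpha = sum(A_k D_k / T_k) / sum(B_k C_k / T_k), and the delta-scale index, MH D-DIF = -2.35 ln(alpha). The function name `mh_d_dif` and the record layout are hypothetical.

```python
import math
from collections import defaultdict


def mh_d_dif(records):
    """Return (alpha_MH, MH D-DIF) for one studied item.

    records: iterable of (group, matching_score, correct) tuples, where
    group is "R" (reference) or "F" (focal) and correct is 0 or 1.
    This input layout is an assumption made for the example.
    """
    # One 2 x 2 table per level k of the matching variable: [A, B, C, D]
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        if group not in ("R", "F"):
            raise ValueError(f"unknown group: {group!r}")
        row = 0 if group == "R" else 2      # reference row or focal row
        col = 0 if correct else 1           # correct or incorrect column
        cells[score][row + col] += 1

    num = 0.0  # running sum of A_k * D_k / T_k
    den = 0.0  # running sum of B_k * C_k / T_k
    for a, b, c, d in cells.values():
        t = a + b + c + d                   # T_k, always >= 1 here
        num += a * d / t
        den += b * c / t

    if num == 0.0 or den == 0.0:
        raise ValueError("common odds ratio is undefined for these tables")

    alpha = num / den
    # Delta-scale transform: negative values indicate that the item
    # favors the reference group.
    return alpha, -2.35 * math.log(alpha)
```

An alpha of 1 (MH D-DIF of 0) corresponds to no DIF after matching. Levels of the matching variable at which only one group is present add nothing to either sum, so they drop out of the estimate without special handling.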

[1]  W. Haenszel,et al.  Statistical aspects of the analysis of data from retrospective studies of disease. , 1959, Journal of the National Cancer Institute.

[2]  Seock-Ho Kim,et al.  A Comparison of Two Area Measures for Detecting Differential Item Functioning , 1991 .

[3]  R. D. Bock,et al.  An Item Response Curve Model for Matrix-Sampling Data: The California Grade-Three Assessment. , 1981 .

[4]  Melvin R. Novick,et al.  Some latent trait models and their use in inferring an examinee's ability , 1966 .

[5]  F. Lord Applications of Item Response Theory To Practical Testing Problems , 1980 .

[6]  H. Swaminathan,et al.  Detecting Differential Item Functioning Using Logistic Regression Procedures , 1990 .

[7]  Eugene G. Johnson The NAEP 1990 Technical Report. , 1992 .

[8]  K. Ercikan,et al.  Analysis of Differential Item Functioning in the NAEP History Assessment , 1988 .

[9]  C. Lewis,et al.  Using Bayesian Decision Theory to Design a Computerized Mastery Test , 1990 .

[10]  Nambury S. Raju,et al.  The area between two item characteristic curves , 1988 .

[11]  Nancy L. Allen,et al.  Thin Versus Thick Matching in the Mantel-Haenszel Procedure for Detecting DIF , 1993 .

[12]  Dorothy T. Thayer,et al.  A Simulation Study of Methods for Assessing Differential Item Functioning in Computer-Adaptive Tests , 1993 .

[13]  Dorothy T. Thayer,et al.  Differential Item Performance and the Mantel-Haenszel Procedure. , 1986 .

[14]  William Stout,et al.  A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF , 1993 .

[15]  R. Zwick When Do Item Response Function and Mantel-Haenszel Definitions of Differential Item Functioning Coincide? , 1990 .

[16]  H. Wainer,et al.  Toward a Psychometrics for Testlets , 1989 .