Implications of the Multidimensionality-Based DIF Analysis Framework for Selecting a Matching and Studied Subtest

In this paper we describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis framework, with emphasis on its implications for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a failure to display DIF. By contrast, the multidimensional DIF framework emphasizes a substantively informed selection of items for both the matching and studied subtest based on the dimensions suspected of underlying the test data. Using two examples, we demonstrate that these two approaches lead to different interpretations of the occurrence of DIF in a test. It is argued that selecting a valid matching and studied subtest, as implied by the multidimensional framework, can lead to a more informed understanding of why DIF occurs.

Bias occurs when tests yield scores or promote score interpretations that result in different meanings for members of different groups. Bias is often attributed to construct-irrelevant dimensions that differentially affect the test scores for different groups of examinees (Standards for Educational and Psychological Testing, 1999). Group differences can also be attributed to item impact. Impact occurs when construct-relevant dimensions differentially affect the test scores for different groups of examinees. In this case, the item is a relevant measure of the target construct and the difference between the groups reflects a true difference on that construct.

Differential item functioning (DIF) studies are designed to identify and interpret these construct-related dimensions using a combination of statistical and substantive analyses. The statistical analysis involves administering the test, matching members of the reference and focal groups on a measure of ability derived from that test, and using statistical procedures to identify group differences on test items. An item exhibits DIF when examinees from the reference and focal groups differ in the probability of answering that item correctly, after controlling for ability. The substantive analysis builds on the statistical analysis because DIF items are often scrutinized by expert reviewers (e.g., test developers or content specialists) who attempt to identify the construct-related dimensions that produce group differences. A DIF item is considered biased when reviewers identify some dimension, deemed to be irrelevant to the construct measured by the test, that places one group of examinees at a disadvantage. Conversely, a DIF item displays impact when the dimension that differentiates the groups is judged to be relevant to the construct measured by the test. Considerable …
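To make the statistical step concrete, the following is a minimal sketch of one common DIF statistic, the Mantel-Haenszel common odds ratio, computed for a single studied item after stratifying examinees on their matching subtest score. It is an illustration only and is not presented as the procedure used in this paper. The function name `mantel_haenszel_dif`, the group labels `"R"` and `"F"`, and the input layout are assumptions introduced for this example.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(matching_scores, groups, item_responses):
    """Mantel-Haenszel DIF summary for one studied item (illustrative sketch).

    matching_scores: score of each examinee on the matching subtest
    groups:          "R" (reference) or "F" (focal) for each examinee (assumed labels)
    item_responses:  1 (correct) or 0 (incorrect) on the studied item
    Returns (alpha_mh, delta_mh): the common odds ratio and its
    delta-scale transform, -2.35 * ln(alpha_mh).
    """
    # One 2x2 table per matching-score stratum, stored as
    # [ref correct, ref incorrect, focal correct, focal incorrect].
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for score, group, resp in zip(matching_scores, groups, item_responses):
        idx = (0 if group == "R" else 2) + (0 if resp == 1 else 1)
        tables[score][idx] += 1

    num = 0.0  # sum over strata of (ref correct * focal incorrect) / N
    den = 0.0  # sum over strata of (ref incorrect * focal correct) / N
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined: a cross-product sum is zero")

    alpha_mh = num / den
    delta_mh = -2.35 * math.log(alpha_mh)
    return alpha_mh, delta_mh


# Toy data: eight examinees who all share the same matching score.
scores = [5] * 8
groups = ["R"] * 4 + ["F"] * 4
responses = [1, 1, 1, 0, 1, 0, 0, 0]
alpha, delta = mantel_haenszel_dif(scores, groups, responses)
print(alpha, delta)
```

An odds ratio near 1 (delta near 0) suggests little DIF after matching, while a negative delta indicates that the item favors the reference group. The result depends directly on which items supply `matching_scores`, which is why the choice of matching subtest matters.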

[1]  M. Casey,et al.  The Influence of Spatial Ability on Gender Differences in Mathematics College Entrance Test Scores across Diverse Samples. , 1995 .

[2]  A. Coughlan,et al.  An Empirical Investigation * , 2002 .

[3]  Brian E. Clauser,et al.  Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect differential item functioning. , 1995 .

[4]  D. Halpern,et al.  Sex differences in intelligence. Implications for education. , 1997, The American psychologist.

[5]  Daniel Bolt,et al.  DIFFERENTIAL ITEM FUNCTIONING: ITS MULTIDIMENSIONAL MODEL AND RESULTING SIBTEST DETECTION PROCEDURE , 1996 .

[6]  Terry A. Ackerman A Didactic Explanation of Item Bias, Item Impact, and Item Validity from a Multidimensional Perspective , 1992 .

[7]  Kathleen A. O'Neill,et al.  Item and test characteristics that are associated with differential item functioning. , 1993 .

[8]  William Stout,et al.  A Multidimensionality-Based DIF Analysis Paradigm , 1996 .

[9]  Mark J. Gierl,et al.  Illustrating the Utility of Differential Bundle Functioning Analyses To Identify and Interpret Group Differences on Achievement Tests. , 2005 .

[10]  Mark J. Gierl,et al.  Identifying Content and Cognitive Skills that Produce Gender Differences in Mathematics: A Demonstration of the Multidimensionality‐Based DIF Analysis Paradigm , 2003 .

[11]  B. Plake A Comparison of a Statistical and Subjective Procedure to Ascertain Item Validity: One Step in the Test Validation Process , 1980 .

[12]  Jeffrey A Douglas,et al.  Item-Bundle DIF Hypothesis Testing: Identifying Suspect Bundles and Assessing Their Differential Functioning , 1996 .

[13]  F. Kok,et al.  Item Bias and Test Multidimensionality , 1988 .

[14]  Ratna Nandakumar,et al.  MULTISIB: A Procedure to Investigate DIF When a Test is Intentionally Two-Dimensional , 1997 .

[15]  Ronald K. Hambleton,et al.  Identifying the causes of DIF in translated verbal items , 1999 .

[16]  G. Engelhard,et al.  Accuracy of Bias Review Judges in Identifying Differential Item Functioning on Teacher Certification Tests , 1990 .

[17]  Ann V. McGillicuddy-De Lisi,et al.  Gender differences in advanced mathematical problem solving. , 2000, Journal of experimental child psychology.

[18]  P. Holland,et al.  DIF DETECTION AND DESCRIPTION: MANTEL‐HAENSZEL AND STANDARDIZATION , 1992 .

[19]  Brian E. Clauser,et al.  Using Statistical Procedures to Identify Differentially Functioning Test Items , 2005 .

[20]  William Stout,et al.  A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF , 1993 .

[21]  S. Natasha Beretvas,et al.  An empirical investigation demonstrating the multidimensional DIF paradigm: A cognitive explanation for DIF , 2001 .

[22]  Neil J. Dorans,et al.  ASSESSING UNEXPECTED DIFFERENTIAL ITEM PERFORMANCE OF FEMALE CANDIDATES ON SAT AND TSWE FORMS ADMINISTERED IN DECEMBER 1977: AN APPLICATION OF THE STANDARDIZATION APPROACH , 1983 .

[23]  W. Stout Psychometrics: From practice to theory and back , 2002 .

[24]  Hua-Hua Chang,et al.  Detecting DIF for Polytomously Scored Items: An Adaptation of the SIBTEST Procedure , 1995 .

[25]  H. Wainer,et al.  Differential Item Functioning. , 1994 .

[26]  L. Shepard,et al.  Methods for Identifying Biased Test Items , 1994 .

[27]  Mark J. Gierl,et al.  Identifying Sources of Differential Item and Bundle Functioning on Translated Achievement Tests: A Confirmatory Analysis , 2001 .

[28]  Dorothy T. Thayer,et al.  DIFFERENTIAL ITEM FUNCTIONING AND THE MANTEL‐HAENSZEL PROCEDURE , 1986 .

[29]  Richard R. Tolman,et al.  Empirical versus subjective procedures for identifying gender differences in science test items , 1993 .

[30]  Michael J. Zieky,et al.  Practical questions in the use of DIF statistics in test development. , 1993 .

[31]  Ratna Nandakumar,et al.  Simultaneous DIF Amplification and Cancellation: Shealy-Stout's Test for DIF , 1993 .

[32]  Howard T. Everson,et al.  Methodology Review: Statistical Approaches for Assessing Measurement Bias , 1993 .

[33]  F. Lord Applications of Item Response Theory To Practical Testing Problems , 1980 .

[34]  Nambury S. Raju,et al.  Differential Bundle Functioning Using the DFIT Framework: Procedures for Identifying Possible Sources of Differential Functioning , 1998 .

[35]  W. H. Angoff,et al.  Perspectives on differential item functioning methodology. , 1993 .

[36]  R. P. McDonald,et al.  A Basis for Multidimensional Item Response Theory , 2000 .