Modeling differential item functioning with group-specific item parameters: A computerized adaptive testing application

In educational and psychological testing, many important decisions rest on the results of tests administered under different conditions. Inaccurate inferences are often made when measurement invariance (MI) across these conditions is not assessed. MI matters even more when respondents are compared on the basis of their responses to different items, as is the case in computerized adaptive testing (CAT), because items that exhibit differential item functioning (DIF) can produce bias within a group as well as between groups. This article demonstrates a straightforward psychometric method for testing MI and illustrates a method for modeling DIF by assigning group-specific item parameters in the framework of item response theory (IRT). The article presents two applications of the method to a CAT used in a high-stakes international organizational assessment context. These examples pertain to context effects due to the test administration method (computer-based linear test vs. CAT) and context effects due to language in a CAT.
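
The idea of group-specific item parameters can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: it assumes a two-parameter logistic (2PL) IRT model, which the abstract does not specify, and the item names, group labels, and parameter values are all hypothetical. An item without DIF has one parameter set shared by every group; an item flagged for DIF carries an additional parameter set for the affected group, so that all respondents stay on a common ability scale.

```python
import math


def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model.

    theta: respondent ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item bank. Every item has "default" parameters (a, b).
# An item flagged for DIF also has parameters for the affected group.
ITEM_PARAMS = {
    "item_1": {"default": (1.2, 0.0)},                          # no DIF
    "item_2": {"default": (0.9, -0.5), "group_B": (0.9, 0.3)},  # DIF item
}


def item_params(item_id, group):
    """Return the group-specific parameters if they exist, else the defaults."""
    params = ITEM_PARAMS[item_id]
    return params.get(group, params["default"])


def response_probability(item_id, group, theta):
    """Response probability for a respondent of a given group and ability."""
    a, b = item_params(item_id, group)
    return p_correct(theta, a, b)


if __name__ == "__main__":
    for group in ("group_A", "group_B"):
        for item_id in ITEM_PARAMS:
            p = response_probability(item_id, group, theta=0.0)
            print(f"{group} {item_id}: P(correct | theta=0) = {p:.3f}")
```

In this sketch, two respondents with the same ability get the same probability on `item_1` regardless of group, but different probabilities on `item_2`, which is how the group-specific parameters absorb the DIF. In an adaptive test, an ability estimate would then be computed with each respondent's own group's parameters.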
