Score-Based Tests of Differential Item Functioning via Pairwise Maximum Likelihood Estimation

Measurement invariance is a fundamental assumption in item response theory models, where the relationship between a latent construct (ability) and observed item responses is of interest. Violations of this assumption can lead to misinterpretation of the scale or to systematic bias against certain groups of persons. While a number of methods have been proposed to detect measurement invariance violations, they typically require the problematic item parameters and the respondent groups to be specified in advance, and this information is often unknown in practice. As an alternative, this paper focuses on a family of recently proposed tests based on stochastic processes of casewise derivatives of the likelihood function (i.e., scores). These score-based tests require estimation of only the null model (the model in which measurement invariance is assumed to hold), and they have previously been applied in factor-analytic, continuous-data contexts as well as in models of the Rasch family. In this paper, we extend these tests to two-parameter item response models, with strong emphasis on pairwise maximum likelihood estimation. The tests’ theoretical background and implementation are detailed, and the tests’ ability to identify problematic item parameters is studied via simulation. An empirical example illustrating the tests’ use in practice is also provided.
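
As a rough illustration of what "stochastic processes of casewise scores" means, the general construction used in the score-based testing literature can be sketched as follows. The notation is an assumption made for this sketch and is not taken from the paper: $\ell(\theta; x_i)$ is the casewise (pairwise) log-likelihood contribution of person $i$, $\hat\theta$ is the $k$-dimensional estimate under the null model, $x_{(i)}$ denotes the observations ordered by an auxiliary variable such as age, and $\hat V$ is an estimate of the covariance matrix of the scores.

```latex
% Casewise scores, evaluated at the null-model estimate; they sum to zero.
\[
  s(\hat\theta; x_i)
    = \left.\frac{\partial \ell(\theta; x_i)}{\partial \theta}\right|_{\theta = \hat\theta},
  \qquad
  \sum_{i=1}^{n} s(\hat\theta; x_i) = 0 .
\]

% Cumulative score process along the ordering of the auxiliary variable.
\[
  B(t; \hat\theta)
    = \hat V^{-1/2}\, n^{-1/2} \sum_{i=1}^{\lfloor nt \rfloor} s(\hat\theta; x_{(i)}),
  \qquad 0 \le t \le 1 .
\]

% One possible scalar summary: the double-maximum statistic.
\[
  DM = \max_{i = 1, \dots, n} \; \max_{j = 1, \dots, k}
       \left| B(i/n; \hat\theta)_j \right| .
\]
```

Under measurement invariance, the process $B(t; \hat\theta)$ fluctuates randomly around zero and, in the standard theory, converges to a $k$-dimensional Brownian bridge; a systematic drift in component $j$ points to instability in the corresponding parameter. With full maximum likelihood, $\hat V$ is usually an information-matrix estimate. With pairwise likelihood it would instead need to reflect the variability of the pairwise scores, and the details of that are left to the paper itself.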
