Detecting Differential Item and Step Functioning with Rating Scale and Partial Credit Trees

Several statistical procedures have been suggested for detecting differential item functioning (DIF) and differential step functioning (DSF) in polytomous items. However, standard procedures are designed for comparing pre-specified reference and focal groups, such as males and females. Here, we propose a framework for detecting DIF and DSF in polytomous items under the rating scale and partial credit models that employs a model-based recursive partitioning algorithm. In contrast to existing procedures, this approach requires no pre-specification of reference and focal groups, because the groups are detected in a data-driven way. The resulting groups are characterized by covariates or combinations of covariates and are thus directly interpretable. The statistical background and construction of the new procedures are introduced along with an instructive example. Four simulation studies illustrate the statistical properties of the new procedures and compare them with those of the well-established likelihood ratio test (LRT). While both the LRT and the new procedures respect a given significance level, the new procedures are in most cases equally powerful when the DIF groups are simple and more powerful when the DIF groups are complex, and they can also detect DSF. The sensitivity of the new procedures to model misspecification is also investigated. An application example with empirical data illustrates their practical use. A software implementation of the new procedures is freely available in the R system for statistical computing.
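To make the distinction between the two models and the notion of DSF concrete, the following minimal Python sketch computes category probabilities under the textbook form of the partial credit model and shows how a rating scale model constrains the thresholds. It is an illustration only, not the paper's R implementation: the function names `pcm_probs` and `rsm_thresholds`, the threshold values, and the size of the group difference are all assumptions chosen for the example.

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person parameter; thresholds: step parameters delta_1..delta_m.
    Returns probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta_k); category 0 has an empty sum of 0.
    cum = [0.0]
    for delta in thresholds:
        cum.append(cum[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

def rsm_thresholds(location, taus):
    """Rating scale model: item location plus step parameters shared by all items."""
    return [location + tau for tau in taus]

# Hypothetical item with three steps; the two groups differ only in step 2 (DSF).
theta = 0.5
reference = pcm_probs(theta, [-1.0, 0.0, 1.0])
focal = pcm_probs(theta, [-1.0, 0.8, 1.0])

for label, probs in (("reference", reference), ("focal", focal)):
    print(label, [round(p, 3) for p in probs])
```

In this sketch two persons with the same ability but from different groups have different response probabilities, which is what DIF means; because only the second threshold differs, the difference is localized to one step, which is the situation DSF describes. The tree-based procedures search for such parameter differences along covariates rather than between two groups fixed in advance.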
