Tree-Based Global Model Tests for Polytomous Rasch Models

Psychometric measurement models are only valid if measurement invariance holds between test takers of different groups. Global model tests, such as the well-established likelihood ratio (LR) test, are sensitive to violations of measurement invariance, such as differential item functioning and differential step functioning. However, these traditional approaches are only applicable when comparing previously specified reference and focal groups, such as males and females. Here, we propose a new framework for global model tests for polytomous Rasch models based on a model-based recursive partitioning algorithm. With this approach, a priori specification of reference and focal groups is no longer necessary, because the groups are detected automatically in a data-driven way. The statistical background of the new framework is introduced along with an instructive example. A series of simulation studies illustrates its statistical properties and compares them with those of the well-established LR test. Both the LR test and the new framework are sensitive to differential item functioning and differential step functioning, and both respect a given significance level regardless of true differences in the ability distributions. However, the new data-driven approach is more powerful when the group structure is not known a priori, as will usually be the case in practical applications. The usage and interpretation of the new method are illustrated in an empirical application example. A software implementation is freely available in the R system for statistical computing.
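
For orientation, the two quantities the abstract builds on can be written out as follows. This is a sketch under assumptions: it uses the partial credit model as one representative polytomous Rasch model and the classical conditional LR test for G prespecified groups. All symbols (person ability \theta_i, item thresholds \delta_{jk}, number of thresholds p_j, conditional log-likelihood \ell_c, number of free item parameters m) are notation introduced here for illustration and are not taken from the text above.

```latex
% Partial credit model: probability that person i scores x on item j,
% with categories x = 0, ..., p_j and the convention that the empty sum is zero.
P(X_{ij} = x \mid \theta_i, \boldsymbol{\delta}_j)
  = \frac{\exp\left\{ \sum_{k=1}^{x} (\theta_i - \delta_{jk}) \right\}}
         {\sum_{l=0}^{p_j} \exp\left\{ \sum_{k=1}^{l} (\theta_i - \delta_{jk}) \right\}},
  \qquad \sum_{k=1}^{0} (\theta_i - \delta_{jk}) := 0 .

% Conditional LR test for G prespecified groups: compare the joint fit
% (one parameter vector for all persons) with separate fits per group g.
LR = 2 \left\{ \sum_{g=1}^{G} \ell_c\bigl(\hat{\boldsymbol{\delta}}^{(g)}; \mathbf{y}^{(g)}\bigr)
       - \ell_c\bigl(\hat{\boldsymbol{\delta}}; \mathbf{y}\bigr) \right\}
   \;\overset{a}{\sim}\; \chi^2_{(G-1)\, m}
   \quad \text{under measurement invariance.}
```

The LR statistic requires the G groups to be fixed in advance. The tree-based framework described above instead searches the available covariates for groups with differing item parameters, so no such partition has to be supplied.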
