Rater Model Using Signal Detection Theory for Latent Differential Rater Functioning

Abstract: Differential rater functioning (DRF) occurs when raters score examinees in different subgroups with differing severity or leniency. Previous studies of DRF have defined the subgroups using manifest variables (e.g., covariates such as the gender or ethnicity of the examinee). For example, a rater may score males more severely than other examinees. Ideally, each rater’s severity should be invariant across subgroups. This study examines DRF in the context of latent subgroups, which classify possible sources of DRF based on raters’ scoring behavior rather than manifest factors. An extension of the latent class signal detection theory (LC-SDT) model for identifying DRF is proposed and examined using real-world data and simulations. Results from real-world data show that the signal detection approach provides an effective method for identifying latent DRF. In simulations with varying sample sizes and conditions of rater precision, the model recovered parameters at an adequate level, supporting its use to identify latent DRF in large-scale data. These findings suggest that the DRF extension of the LC-SDT model can be useful for examining rater characteristics and can add information that aids rater training.
