Rasch Scale Stability in the Presence of Item Parameter and Trait Drift

Testing programs often rely on common-item equating to maintain a single measurement scale across multiple test administrations and years. Changes over time in the item parameters and in the latent trait underlying the scale can lead to inaccurate score comparisons and misclassification of examinees. This study examined how instability in a scale and in the items composing it affects item parameter recovery and classification accuracy. Results showed that a Rasch item response theory scale can maintain near-baseline recovery properties if changes in the latent trait over time are small. The Rasch scale also maintained good recovery of item and person parameters when item drift was equal in both directions. Under conditions of relatively little item drift and small to moderate periodic changes in the latent trait, a Rasch scale may remain stable for approximately 15 years (±3). Substantial item drift or large changes in the latent trait can dramatically reduce the longevity of the scale.
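
As a rough illustration of the drift conditions described above, the following sketch simulates Rasch responses and compares a mean/mean linking constant under balanced and one-directional item drift. It is not the study's simulation design: the item count, the drift size of 0.3 logits, the sample size, and the helper names (`rasch_prob`, `mean_shift_constant`) are all assumptions chosen for illustration. It shows one plausible intuition for why drift that is equal in both directions may leave the scale largely intact: the shifts cancel in the average anchor difficulty.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model.

    theta: 1-D array of person abilities (logits)
    b:     1-D array of item difficulties (logits)
    Returns a persons-by-items matrix of probabilities.
    """
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def mean_shift_constant(b_old, b_new):
    """Mean/mean linking constant between two sets of anchor difficulties."""
    return float(np.mean(b_new) - np.mean(b_old))

rng = np.random.default_rng(0)

# Hypothetical baseline: 20 anchor items spread evenly over [-2, 2] logits.
b_base = np.linspace(-2.0, 2.0, 20)

# Hypothetical drift patterns (0.3 logits is an assumed magnitude).
balanced_drift = np.tile([0.3, -0.3], 10)   # equal drift in both directions
one_sided_drift = np.full(20, 0.3)          # every item becomes harder

shift_balanced = mean_shift_constant(b_base, b_base + balanced_drift)
shift_one_sided = mean_shift_constant(b_base, b_base + one_sided_drift)

# Simulate dichotomous responses for 1,000 examinees on the drifted items.
theta = rng.normal(0.0, 1.0, size=1000)
responses = (rng.random((1000, 20)) < rasch_prob(theta, b_base + one_sided_drift)).astype(int)

print(f"linking shift, balanced drift:  {shift_balanced:.3f}")
print(f"linking shift, one-sided drift: {shift_one_sided:.3f}")
print(f"mean proportion correct under one-sided drift: {responses.mean():.3f}")
```

In this toy setup the balanced pattern leaves the average anchor difficulty unchanged, while the one-directional pattern moves it by the full drift amount, which a mean-based link would then misattribute to a change in examinee ability.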
