Detection of Unusual Test Administrations Using a Linear Mixed Effects Model

As the number of test form administrations grows, so does the complexity of the quality assurance procedures needed to keep the reported scores stable. Many traditional quality control (QC) methods (Allalouf, Educational Measurement: Issues and Practice 26(1):36–46, 2007) are not sufficient for detecting unusual scores in a rapid flow of administrations, and new QC approaches have been proposed in recent years for that setting. In this study we analyze data from 15 consecutive administrations that follow a specific equating design (or braiding plan) and propose a linear mixed effects model for detecting abnormal results. We investigate the effects of the braiding plans and estimate the effects of several identified factors that influence the means and variances of the scale scores. The analysis is appropriate for tests with a continuous or almost continuous administration mode. The data come from a global standardized assessment of English skills; the Reading and Listening sections of the test were modeled separately. Test-takers’ education level, major, years abroad, years of study, and whether they repeated the test had significant effects on the scores. In addition, we propose a formula for a prediction interval for the scaled mean score of certain subgroups, which can be used to detect unusual administrations.
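
To make the idea of a prediction interval from a linear mixed effects model concrete, the following is a minimal sketch under stated assumptions. It is a generic random-intercept formulation, not the model or the formula proposed in this paper. The symbols (i for test-takers, j for administrations, the covariate vector x, the variance components tau^2 and sigma^2) are illustrative choices, and the interval ignores the estimation uncertainty in the fixed effects.

```latex
% Illustrative random-intercept model (an assumption, not the paper's specification):
% y_{ij} is the scale score of test-taker i in administration j.
\begin{align}
  y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
  \qquad u_j \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
% Mean score of a new administration with n test-takers and covariate mean \bar{\mathbf{x}}:
  \bar{y}_{j} &= \bar{\mathbf{x}}^{\top}\boldsymbol{\beta} + u_j + \bar{\varepsilon}_{j},
  \qquad \operatorname{Var}\!\left(u_j + \bar{\varepsilon}_{j}\right) = \tau^2 + \frac{\sigma^2}{n}, \\
% Approximate (1 - \alpha) prediction interval, with estimates plugged in:
  \bar{\mathbf{x}}^{\top}\hat{\boldsymbol{\beta}}
  &\pm z_{1-\alpha/2}\,\sqrt{\hat{\tau}^2 + \frac{\hat{\sigma}^2}{n}} .
\end{align}
```

Under this sketch, an administration whose observed mean falls outside the interval would be flagged for review; the same reasoning applies to a subgroup mean when n is the subgroup size.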

[1] Yi-Hsuan Lee, et al. Monitoring Scale Scores over Time via Quality Control Charts, Model-Based Approaches, and Time Series Techniques, 2013, Psychometrika.

[2] Andrew Gelman, et al. Data Analysis Using Regression and Multilevel/Hierarchical Models, 2006.

[3] A. Agresti. An Introduction to Categorical Data Analysis, 1997.

[4] Paul W. Holland, et al. Statistical Models for Test Equating, Scaling, and Linking, 2011.

[5] A. Rotnitzky, et al. Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis by Daniels, M. J. and Hogan, J. W., 2009.

[6] G. Casella, et al. Statistical Inference, 2003, Encyclopedia of Social Network Analysis and Mining.

[7] R Core Team, et al. R: A Language and Environment for Statistical Computing, 2014.

[8] Roel Bosker, et al. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, 1999.

[9] N. Draper, et al. Applied Regression Analysis, 1966.

[10] Alina A. von Davier, et al. Applying Time-Series Analysis to Detect Scale Drift, 2009.

[11] Shelby J. Haberman, et al. Harmonic Regression and Scale Stability, 2013, Psychometrika.

[12] N. Draper, et al. Applied Regression Analysis, 1998.

[13] von Davier, et al. The Use of Quality Control and Data Mining Techniques for Monitoring Scaled Scores: An Overview. Research Report ETS RR-12-20, 2012.

[14] Avi Allalouf, et al. Quality Control Procedures in the Scoring, Equating, and Reporting of Test Scores, 2007.