CUSUM-Based Person-Fit Statistics for Adaptive Testing

Item scores that do not fit an assumed item response theory model may lead to inaccurate estimates of the latent trait value. Several person-fit statistics have been proposed for detecting nonfitting score patterns on paper-and-pencil tests, but person-fit analysis has hardly been explored in the context of computerized adaptive tests (CAT). Because it has been shown that the distributions of existing person-fit statistics do not hold in a CAT, this study proposes new person-fit statistics and derives critical values for them from existing statistical theory. The proposed statistics are sensitive to runs of correct or incorrect item scores and are based either on all items administered in a CAT or on subsets of items, using observed and expected item scores and using cumulative sum (CUSUM) procedures. The theoretical and empirical distributions of the statistics are compared, and detection rates are investigated. The nominal and empirical Type I error rates were comparable for the CUSUM procedures when the number of items in each subset and the number of measurement points were not too small. Detection rates of the CUSUM procedures were superior to those of the other fit statistics. Applications of the statistics are discussed.
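
To make the idea of a CUSUM procedure on observed and expected item scores concrete, the following is a minimal sketch and not the statistics defined in the study. It assumes a Rasch model for the expected score, a residual scaled by test length, and the generic two-sided CUSUM recursion; the function names, the scaling, and the bounds passed to `flags_misfit` are hypothetical choices made for illustration only.

```python
import math


def rasch_probability(theta, b):
    """Expected score on a dichotomous item under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def cusum_person_fit(scores, difficulties, theta):
    """Accumulate residuals (observed minus expected score) along the test.

    The upper sum grows during runs of unexpectedly correct answers,
    the lower sum during runs of unexpectedly incorrect answers.
    Dividing by test length n is an illustrative scaling, not a requirement.
    """
    n = len(scores)
    upper, lower = 0.0, 0.0
    upper_path, lower_path = [], []
    for x, b in zip(scores, difficulties):
        t = (x - rasch_probability(theta, b)) / n
        upper = max(0.0, upper + t)   # reset to 0 when the positive run ends
        lower = min(0.0, lower + t)   # reset to 0 when the negative run ends
        upper_path.append(upper)
        lower_path.append(lower)
    return upper_path, lower_path


def flags_misfit(upper_path, lower_path, upper_bound, lower_bound):
    """Flag a pattern if either sum ever crosses its (hypothetical) bound."""
    return max(upper_path) > upper_bound or min(lower_path) < lower_bound


if __name__ == "__main__":
    # Four items with difficulty matched to theta, as a CAT tends to select.
    up, low = cusum_person_fit([1, 1, 1, 1], [0.0] * 4, theta=0.0)
    print(up, low)
```

Because a CAT selects items with difficulty close to the current trait estimate, each expected score is near 0.5, so a long run of correct or incorrect answers drives one of the two sums steadily away from zero, while an alternating pattern keeps both sums near zero. In practice the bounds would have to come from theory or simulation, as the study discusses.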
