The Null Distribution of Person-Fit Statistics for Conventional and Adaptive Tests

Several person-fit statistics have been proposed to detect item score patterns that do not fit an item response theory model. To classify response patterns as misfitting, the distribution of a person-fit statistic is needed. The theoretical null distributions of several fit statistics have been derived for paper-and-pencil (P&P) tests. However, it is unknown whether these distributions also hold for computerized adaptive tests (CAT). A three-part simulation study was conducted. In the first study, the theoretical distribution of the l_z statistic across trait (θ) levels was investigated for CAT and P&P tests. The distribution of the l*_z statistic proposed by Snijders (in press) was also investigated. Results indicated that the distributions of both l_z and l*_z differed from the theoretical distribution in CAT. The second study examined the distributions of l_z and l*_z using simulation. These simulated distributions, when based on the estimated trait level (θ̂), were found to be problematic in CAT. In the third study, the detection rates of l*_z and l_z were compared. The rates for both statistics were found to be similar in most cases.
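
To make the idea of a simulated null distribution concrete, the following is a minimal sketch rather than the study's actual design. It assumes a two-parameter logistic model, a fixed (non-adaptive) set of items, and a known true θ. The function names (p_2pl, lz, simulate_null) and the item parameters are hypothetical and chosen only for illustration. The statistic is computed in its usual standardized log-likelihood form: the log-likelihood of the response pattern minus its expectation, divided by its standard deviation.

```python
import math
import random

def p_2pl(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lz(responses, probs):
    """Standardized log-likelihood person-fit statistic for 0/1 responses."""
    # Observed log-likelihood of the response pattern.
    l0 = sum(x * math.log(p) + (1 - x) * math.log(1 - p)
             for x, p in zip(responses, probs))
    # Expectation and variance of the log-likelihood under the model.
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)

def simulate_null(theta, items, n_rep=2000, seed=0):
    """Simulate l_z for model-fitting response patterns at a known theta."""
    rng = random.Random(seed)
    probs = [p_2pl(theta, a, b) for a, b in items]
    values = []
    for _ in range(n_rep):
        pattern = [1 if rng.random() < p else 0 for p in probs]
        values.append(lz(pattern, probs))
    return values

# Hypothetical 21-item fixed test with difficulties spread from -2 to 2.
items = [(1.0, -2.0 + 0.2 * i) for i in range(21)]
null_values = sorted(simulate_null(0.0, items))
# Empirical 5th percentile: patterns below it would be flagged as misfitting.
cutoff = null_values[int(0.05 * len(null_values))]
```

Because this sketch plugs in the true θ and uses a fixed item set, the simulated values are standardized by construction. The abstract's concern is precisely the cases this sketch leaves out: replacing θ with its estimate, and selecting items adaptively.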
