The many null distributions of person fit indices

This paper deals with the situation of an investigator who has collected the scores of n persons on a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one-parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors, each consisting of k symbols equal to 0 or 1, it is desired to classify each respondent either as anomalous or as conforming to the model. As this model is probabilistic, such a classification may involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to differ considerably across values of the person's latent trait. This value will be called the “ability parameter”, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution.
The paper describes three specifications for this null distribution: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which in the Rasch model is the sufficient statistic for the ability parameter. Classification rules for aberrance are worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate, or a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications.
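The third specification, conditioning on the total score, can be illustrated with a minimal sketch. It is not the paper's own implementation: the function names `conditional_null` and `conditional_p_value` are hypothetical, the item difficulties are assumed to be known, and the exact distribution is obtained by brute-force enumeration, which is only feasible for short tests. The sketch relies on the fact that, in the Rasch model, the probability of a response pattern given its total score does not depend on ability and is proportional to exp(-sum of the difficulties of the items answered correctly). Ranking patterns by this conditional probability is therefore the same as ranking them by likelihood among patterns with the same total score.

```python
from itertools import combinations
from math import exp


def conditional_null(difficulties, total_score):
    """Exact null distribution of response patterns given the total score.

    Returns a dict mapping each pattern (tuple of indices of the items
    answered correctly) to its conditional probability under the Rasch model.
    Item difficulties are assumed known (an assumption of this sketch).
    """
    k = len(difficulties)
    weights = {}
    for ones in combinations(range(k), total_score):
        # Unnormalised weight: exp(-sum of difficulties of correct items).
        weights[ones] = exp(-sum(difficulties[i] for i in ones))
    # The normalising constant is the elementary symmetric function
    # of order total_score, here computed by direct summation.
    norm = sum(weights.values())
    return {ones: w / norm for ones, w in weights.items()}


def conditional_p_value(difficulties, responses):
    """Probability, given the total score, of a pattern at most as likely
    as the observed one. Small values suggest an aberrant pattern."""
    dist = conditional_null(difficulties, sum(responses))
    observed = tuple(i for i, x in enumerate(responses) if x == 1)
    p_obs = dist[observed]
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in dist.values() if p <= p_obs * (1 + 1e-12))


# Hypothetical five-item test, items ordered from easy to hard.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]
p_guttman = conditional_p_value(difficulties, [1, 1, 1, 0, 0])
p_reversed = conditional_p_value(difficulties, [0, 0, 1, 1, 1])
print(p_guttman, p_reversed)
```

In this toy example the pattern that solves the three easiest items is the most likely one given a total score of 3, so it is never flagged, whereas the pattern that solves only the three hardest items is the least likely one and receives a small conditional p-value. For longer tests the enumeration would have to be replaced by a Monte Carlo estimate or a moment-based approximation, as the abstract indicates.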
