A Monte Carlo Investigation of Several Person and Item Fit Statistics for Item Response Models

This study investigated the behavior of several person and item fit statistics commonly used to test and obtain fit to the one-parameter item response model. Using simulated data for 500 persons and 15 items, the sensitivity of the total-t, mean-square residual, and between-t fit statistics to guessing, heterogeneity in discrimination parameters, and multidimensionality was examined. Additionally, 25 misfitting persons and a misfitting item were generated to test the power of the three fit statistics to detect deviations in a subset of observations. Neither the total-t nor the mean-square residual was able to detect deviation from any of the models fitted. Use of these statistics appears to be unwarranted. The between-t was a useful indicator of guessing and heterogeneity in discrimination parameters, but was unable to detect multidimensionality. These results show that use of person and item fit statistics to test and obtain overall fit to the one-parameter model can lead to acceptance of the model even when it is grossly inappropriate. Assessments of model fit based on this strategy are inadequate. Alternative methods must be sought.
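To make the simulation setup concrete, the following is a minimal sketch of generating one-parameter (Rasch) responses for 500 persons and 15 items and computing an item-level mean-square residual. It is not the study's procedure: the ability distribution, the item difficulty range, the random seed, the function names `simulate_rasch` and `item_mean_square`, and the use of the generating parameters in place of estimates are all assumptions made for illustration. The statistic shown is the common unweighted form (the mean of squared standardized residuals), which may differ in detail from the version examined in the study.

```python
import numpy as np

def simulate_rasch(theta, b, rng):
    """Draw a persons x items 0/1 matrix under the one-parameter model.

    theta: person abilities, b: item difficulties (both hypothetical inputs).
    Returns the responses and the model probabilities of a correct response.
    """
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    x = (rng.random(p.shape) < p).astype(int)
    return x, p

def item_mean_square(x, p):
    """Unweighted mean-square residual per item.

    Each residual (x - p) is standardized by its binomial variance p(1 - p),
    squared, and averaged over persons.
    """
    z2 = (x - p) ** 2 / (p * (1.0 - p))
    return z2.mean(axis=0)

# Assumed design values: standard normal abilities, difficulties from -2 to 2.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=500)
b = np.linspace(-2.0, 2.0, 15)

x, p = simulate_rasch(theta, b, rng)
ms = item_mean_square(x, p)
print(np.round(ms, 2))
```

When the data are generated from the model itself, each squared standardized residual has expectation 1, so the item mean-squares should scatter around 1. Introducing guessing, unequal discriminations, or a second dimension into the generating step, while still computing residuals under the one-parameter model, is the kind of manipulation whose detectability the study examines.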