The Effect of Test Length and IRT Model on the Distribution and Stability of Three Appropriateness Indexes

A Monte Carlo study investigated the extent to which three appropriateness indexes, Z3, ECIZ4, and W (a variation of Wright's person-fit statistic), are well standardized. Nonaberrant response patterns were generated to assess the effects of the item response theory (IRT) model and test length on the distribution of the indexes and on their cutoff values at three false positive rates. ECIZ4 most closely approximated a normal distribution, showing less skewness and kurtosis than Z3 and W. The ECIZ4 cutoff values were also less affected by test length and the IRT model than were those of Z3 and W. In contrast, the distribution of W was the least stable over replications, and its cutoff values varied greatly depending on the IRT model and test length.
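The kind of simulation summarized above can be illustrated with a minimal sketch. It generates nonaberrant response patterns under a three-parameter logistic model, computes a generic standardized log-likelihood person-fit index, and reads off empirical cutoffs at three false positive rates. Everything specific here is an assumption made for illustration and is not taken from the study: the item parameter ranges, the standard normal ability distribution, the use of true rather than estimated ability, the rates 0.01, 0.05, and 0.10, the test lengths, and the index itself, which stands in for the three indexes named above rather than reproducing any of them.

```python
import math
import random


def p_correct(theta, a, b, c=0.0):
    """3PL probability of a correct response (c = 0 gives the 2PL)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def standardized_loglik(responses, probs):
    """Standardized log-likelihood index: (l0 - E[l0]) / sqrt(Var[l0])."""
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)


def simulate_index(n_items, n_persons, rng):
    """Index values for simulated nonaberrant examinees on one test."""
    # Hypothetical item parameters (a, b, c); not the values used in the study.
    items = [(rng.uniform(0.5, 2.0), rng.uniform(-2.0, 2.0), 0.2)
             for _ in range(n_items)]
    values = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # assumed ability distribution
        probs = [p_correct(theta, a, b, c) for a, b, c in items]
        # Nonaberrant pattern: each response follows the model exactly.
        responses = [1 if rng.random() < p else 0 for p in probs]
        values.append(standardized_loglik(responses, probs))
    return values


def lower_cutoff(values, false_positive_rate):
    """Empirical lower-tail cutoff: about this share of values fall below it."""
    ordered = sorted(values)
    return ordered[int(false_positive_rate * len(ordered))]


if __name__ == "__main__":
    rng = random.Random(0)
    for n_items in (20, 60):  # assumed test lengths
        values = simulate_index(n_items, 2000, rng)
        cutoffs = [lower_cutoff(values, r) for r in (0.01, 0.05, 0.10)]
        print(n_items, [round(c, 2) for c in cutoffs])
```

Comparing the printed cutoffs across test lengths with the corresponding standard normal quantiles gives a rough sense of how well standardized an index is. A full replication would also estimate ability from the responses and repeat the simulation over replications.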