Detecting Inappropriate Test Scores with Optimal and Practical Appropriateness Indices

Several statistics have been proposed as quantitative indices of the appropriateness of a test score as a measure of ability. Two criteria have been used to evaluate such indices in previous research. The first criterion, standardization, refers to the extent to which the conditional distributions of an index, given ability, are invariant across ability levels. The second criterion, relative power, refers to the indices' relative effectiveness for detecting inappropriate test scores. In this paper the effectiveness of nine appropriateness indices is determined in an absolute sense by comparing them to optimal indices; an optimal index is the most powerful index for a particular form of aberrance that can be computed from item responses. Three indices were found to provide nearly optimal rates of detection of very low ability response patterns modified to simulate cheating, as well as very high ability response patterns modified to simulate spuriously low responding. Optimal indices had detection rates from 50% to 200% higher than any other index when average ability response vectors were manipulated to appear spuriously high and spuriously low.
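By the Neyman-Pearson lemma, the most powerful test of "normal responding" against a fully specified alternative is based on the likelihood ratio, so an optimal index for a given form of aberrance can be written as a likelihood ratio. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a two-parameter logistic (2PL) item response model with known item parameters, a standard normal ability prior integrated by grid quadrature, and a hypothetical "copying" model of cheating in which each item is answered correctly with extra probability `gamma`. All function names and the value of `gamma` are illustrative assumptions.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function (assumed model): P(correct | theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def pattern_likelihood(u, theta, items, gamma=0.0):
    """Likelihood of a 0/1 response pattern u at ability theta.
    gamma > 0 mixes in a hypothetical copying process: with probability
    gamma the examinee supplies the correct answer regardless of ability."""
    like = 1.0
    for ui, (a, b) in zip(u, items):
        p = gamma + (1.0 - gamma) * p_correct(theta, a, b)
        like *= p if ui == 1 else (1.0 - p)
    return like

def marginal_likelihood(u, items, gamma=0.0, n_points=61):
    """Integrate out theta over [-4, 4] with standard normal prior weights."""
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    weights = [math.exp(-0.5 * t * t) for t in grid]
    total_w = sum(weights)
    return sum(w * pattern_likelihood(u, t, items, gamma)
               for w, t in zip(weights, grid)) / total_w

def lr_index(u, items, gamma=0.3):
    """Log likelihood ratio (aberrant vs. normal); larger values flag
    patterns that look more like the assumed copying model."""
    return (math.log(marginal_likelihood(u, items, gamma))
            - math.log(marginal_likelihood(u, items, 0.0)))

# Five items of equal discrimination, difficulties from easy to hard.
items = [(1.0, b) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(lr_index([1, 1, 0, 0, 0], items))  # consistent pattern: easy items right
print(lr_index([0, 0, 0, 1, 1], items))  # hard items right, easy items wrong
```

With the same raw score, the pattern that answers only the two hardest items correctly receives a larger index than the pattern that answers only the two easiest items correctly. In practice a cutoff would be taken from the index distribution of normal examinees, and the detection rate is the proportion of aberrant patterns exceeding it.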