The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

Statistical methods to assess the congruence between an item response pattern and a specified item response theory model have recently proliferated. This "person fit" research has focused on the question: To what extent can person-fit indices identify well-defined forms of aberrant item response? This study extended previous person-fit research in two ways. First, a previously unexplored model for generating aberrant response patterns was explicated. The data-generation model is based on the theory that aberrant item responses result in less psychometric information for the individual than predicted by the parameters of a specified response model. Second, the proposed response aberrancy generation model was implemented to investigate how the aberrancy detection power of a person-fit statistic is influenced by test properties (e.g., the spread of item difficulties). Results indicated that detecting aberrant response patterns was especially problematic for tests with fewer than 20 items and for tests with limited ranges of item difficulty. An applied consequence of these results is that certain types of test designs (e.g., peaked tests) and administration procedures (e.g., adaptive tests) potentially act to limit the detection of aberrant item responses.
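The abstract does not name the person-fit statistic or give the equations of the aberrancy generation model, so the following is only an illustrative sketch under stated assumptions. It assumes a two-parameter logistic (2PL) response model, uses the standardized log-likelihood index l_z as a stand-in person-fit statistic, and mimics "less information than predicted" by multiplying each item's discrimination by a hypothetical `attenuation` factor when generating responses. All function and parameter names are invented for illustration.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed response model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(theta, items, attenuation=1.0, rng=random):
    """Generate 0/1 responses for a list of (a, b) items.

    attenuation < 1 shrinks the effective discrimination, so the responses
    carry less information than the nominal item parameters predict.
    This is an assumed stand-in for the study's generation model.
    """
    return [
        1 if rng.random() < p_correct(theta, a * attenuation, b) else 0
        for a, b in items
    ]


def lz(responses, theta, items):
    """Standardized log-likelihood person-fit index under the nominal model.

    Large negative values suggest misfit. Returns NaN when the variance is
    zero, which happens when every item has P = 0.5 at theta.
    """
    log_lik = expected = variance = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    if variance == 0.0:
        return float("nan")
    return (log_lik - expected) / math.sqrt(variance)


if __name__ == "__main__":
    rng = random.Random(0)
    # 20 items with difficulties spread evenly over [-2, 2], all a = 1.
    spread_items = [(1.0, -2.0 + 4.0 * i / 19) for i in range(20)]
    normal = simulate_responses(0.0, spread_items, 1.0, rng)
    aberrant = simulate_responses(0.0, spread_items, 0.2, rng)
    print("l_z, model-consistent:", lz(normal, 0.0, spread_items))
    print("l_z, attenuated:", lz(aberrant, 0.0, spread_items))
```

In this sketch, a test whose item difficulties all sit at the examinee's trait level gives every item a success probability of 0.5, so every response pattern is equally likely and the index cannot be computed. That is one simple way to see why peaked or adaptively targeted tests could limit detection, although the study's own argument may rest on a different mechanism.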
