The Extent, Causes and Importance of Context Effects on Item Parameters for Two Latent Trait Models.

There is widespread interest in latent trait models: an entire issue of this journal (Volume 14, No. 2) was devoted to them. One of the most important assumptions of latent trait models is that the characteristics of each item can be described by a single set of parameters (see Allen & Yen, 1979, or Lord & Novick, 1968, for further discussion of the models and their assumptions). If item parameters are influenced by the sequencing of the items or by the characteristics of other items in the test, context effects are occurring. Two situations in which context effects are particularly important are (a) the field testing and selection of items and (b) the calibration of items (i.e., the estimation of item parameters) for item pools. If there are context effects, the items that appear best in a field test may not be the best in a final test booklet. With item pools, context effects can influence both the choice of the best items and the parameter values associated with each item.

The item parameter values also influence the trait values obtained subsequently from any given examinee's pass/fail responses to the items, the standard error of the trait value provided by the latent trait model, and the item characteristic curve (ICC) predicted by the model. The values of the ICC can be important: for example, the ICC can be used to estimate the degree to which a rise in trait value, say through average growth from grade 3 to grade 4, will affect an examinee's performance on particular items. The test characteristic curve (TCC), which is the mean of the ICC values for the items in a particular test, can also be influenced by context effects on item parameters. The TCC relates expected proportion-correct scores to trait values and is useful in interpreting score distributions (Lord & Novick, 1968, pp. 386-392).
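The relationship between the ICC, the TCC, and trait values described above can be sketched in a few lines of code. This is a minimal illustration, not the models examined in this article: it assumes a three-parameter logistic ICC (a common latent trait model; see Lord, 1980) with the conventional scaling constant 1.7, and the item parameter values shown are hypothetical.

```python
import math

def icc(theta, a, b, c=0.0):
    """Three-parameter logistic ICC: probability of a correct response
    at trait level theta, for discrimination a, difficulty b, and
    lower asymptote (pseudo-guessing) c."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def tcc(theta, items):
    """TCC: the mean of the ICC values over the items in the test,
    i.e., the expected proportion-correct score at trait level theta."""
    return sum(icc(theta, *item) for item in items) / len(items)

# Hypothetical (a, b, c) parameters for a five-item test.
items = [(1.2, -1.0, 0.2), (0.8, -0.5, 0.2), (1.0, 0.0, 0.2),
         (1.5, 0.5, 0.2), (0.9, 1.0, 0.2)]

# Example use of the ICC: the expected gain in success probability on
# one item when the trait value rises (say, from grade 3 to grade 4),
# modeled here as an increase in theta from -0.5 to 0.0.
gain = icc(0.0, 1.0, 0.0, 0.2) - icc(-0.5, 1.0, 0.0, 0.2)
```

Because every quantity here is computed from the item parameters, a context effect that shifts those parameters propagates directly into the ICC, the TCC, and any trait estimate or growth prediction derived from them.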
Numerous studies have examined the effects of changes in item order or context on test performance. Most of these studies have not reported results for item statistics but rather have reported results for examinees' scores (Brenner, 1964; Cronbach, 1946, 1950; Sax & Carr, 1962; Sax & Cromack, 1966; Sirotnik & Wellington, 1974; Tuck, 1978), particularly as these scores relate to anxiety (Berger, Munz, Smouse, & Angelino, 1969; Hambleton & Traub, 1974; Marso, 1970; Munz & Smouse, 1968; Towle & Merrill, 1975).

Several studies have examined the stability of classical item statistics under changes in context. When administered under power or slightly speeded conditions, items on some tests tended to be somewhat easier when they appeared near the beginning of a test than when they appeared near the end (Flaugher, Melton, & Myers, 1968; Mollenkopf, 1950). Rearrangements of the items in other tests produced little change in item difficulties under slightly speeded or power conditions (Flaugher et al., 1968; Gerow, 1980; Huck & Bowers, 1972; Mollenkopf, 1950; Monk & Stallings, 1970). While item-test biserials were not systematically affected by position under power conditions (Flaugher et al., 1968; Mollenkopf, 1950), under speeded conditions biserials were higher when an item appeared near the end of a test than when it appeared near the beginning (Mollenkopf, 1950).

[1] Dawis, R., et al. (1976). The influence of test context on item difficulty.

[2] Marso, R. N. (1970). Test item arrangement, testing time, and performance.

[3] Myers, C. T., et al. (1968). Item rearrangement under typical test conditions.

[4] Hambleton, R., et al. (1974). The effects of item order on test performance and stress.

[5] Lord, F. (1980). Applications of item response theory to practical testing problems.

[6] Munz, D., et al. (1968). Interaction effects of item-difficulty sequence and achievement-anxiety reaction on academic performance. Journal of Educational Psychology.

[7] Allen, M. J. (1979). Introduction to measurement theory.

[8] Angelino, H., et al. (1969). The effects of item difficulty sequencing and anxiety reaction type on aptitude test performance. The Journal of Psychology.

[9] Brenner (1964). Test difficulty, reliability, and discrimination as functions of item difficulty order.

[10] Cronbach, L. (1946). Response sets and test validity.

[11] Tuck (1978). Examinees' control of item difficulty sequence.

[12] Sirotnik & Wellington (1974). Scrambling content in achievement testing: An application of multiple matrix sampling in experimental design.

[13] Gerow, J. R. (1980). Performance on achievement tests as a function of the order of item difficulty.

[14] Towle, N. J., et al. (1975). Effects of anxiety type and item-difficulty sequencing on mathematics test performance.

[15] (1968). The effects of anxiety and item difficulty sequence on achievement testing scores. The Journal of Psychology.

[16] Cronbach, L. (1950). Further evidence on response sets and test design.

[17] Yen, W. M. (1981). Using simulation results to choose a latent trait model.

[18] Mollenkopf, W. G. (1950). An experimental study of the effects on item-analysis data of changing item placement and test time limit.

[19] Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores.

[20] Sax, G., et al. (1966). The effects of various forms of item arrangements on test performance.

[21] Sax, G., et al. (1962). An investigation of response sets on altered parallel forms.

[22] Huck, S., et al. (1972). Item difficulty level and sequence effects in multiple-choice achievement tests.