Item Pool Maintenance in the Presence of Item Parameter Drift.

Differential linear drift of item location parameters over a 10-year period is demonstrated in data from the College Board Physics Achievement Test. The relative direction of drift is associated with the content of the items and reflects changing emphasis in the physics curricula of American secondary schools. No evidence of drift in the discriminating power parameters was found. Statistical procedures for detecting, estimating, and accounting for item parameter drift in item pools for long-term testing programs are proposed.

Although schemes for maintaining a test scale by classical equating methods have been in use for many years (Angoff, 1971), no similar methodology employing item response theory (IRT) has yet been proposed. The main obstacle appears to be the problem of item parameter drift, that is, differential change in item parameter values over time (Goldstein, 1983). It has not been clear how to take into account the possibility that some items in an IRT scale become more or less difficult over time relative to others that remain stable or change in the opposite direction. Such effects might be expected as a result of educational, technological, or cultural change during the useful life of the scale. An example is an item from a vocational aptitude test that asks for the correct SAE number of motor oil for winter use. Knowledge of SAE numbers for the viscosity of motor oil has become superfluous since the introduction of multiple-viscosity oils.

In the present paper, a method for maintaining and updating an IRT scale over a period of time, while accounting for item parameter drift, is proposed. The method involves fitting what we call a "time-dependent" IRT model to data obtained from the operational use of the test. Although based on IRT and formulated at the item level rather than the test level, the model is employed in the same spirit as classical equating of successive forms of a regularly updated test. It does not depend on calibration of the items in data from a particular year, but attempts to smooth the parameter estimates over a number of years after eliminating cohort main effects.

Item parameter drift is only one of the forms of differential item functioning (DIF) that can affect objective tests. Similar effects, referred to as item bias, can be observed when the respondents are classified with respect to subgroup