Item Clusters and Computerized Adaptive Testing: A Case for Testlets

Many sorts of difficulties may preclude the uneventful construction of tests by a computerized algorithm, such as the algorithms currently in favor in Computerized Adaptive Testing (CAT). In this essay we discuss a number of these problems, as well as some possible avenues of solution. We conclude with the development of the "testlet," a bundle of items that can be arranged either hierarchically or linearly, thus maintaining the efficiency of an adaptive test while keeping the quality control of test construction that is currently possible only with careful expert scrutiny. Performance on the separate testlets is aggregated to yield ability estimates.

In the old days of testing, a wise examiner confronted a nervous examinee and asked questions. After getting some initial idea of the examinee's ability or knowledge level, the examiner would not waste time asking questions that were too difficult or too easy. Instead, he or she would focus questions on the area around the examinee's proficiency level. This made for a challenging and efficient test. It was also subjective and expensive.

This state of affairs changed during the First World War with the Army Alpha. The mass processing of recruits into the armed services demonstrated that standardized tests could make the training and accessioning process more efficient. Substituting for the wise examiner was a broad-range test made up of many multiple-choice questions. This had several advantages, among which were its vastly reduced cost per unit and its objectivity: everyone took the same test, and it was scored in the same way for everyone. It also had some disadvantages. As a written document, a test form could be stolen, and because of its broad audience it had to contain items that would tax the least able of the prospective
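To make the hierarchical arrangement of testlets described above concrete, here is a minimal sketch, assuming a simple proportion-correct branching rule; it is an illustrative construction, not the scoring method developed in the paper. Each testlet is a fixed bundle of items administered as a unit, the score on one testlet determines whether an easier or harder testlet comes next, and the per-testlet scores are recorded separately so they can later be aggregated into an ability estimate. The names `Testlet`, `administer`, and `adaptive_session`, and the cutoff parameter, are assumptions introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# A testlet is a fixed bundle of items that is always administered as a unit.
@dataclass
class Testlet:
    name: str
    items: List[str]                    # item identifiers, in their fixed order
    easier: Optional["Testlet"] = None  # testlet to administer after a low score
    harder: Optional["Testlet"] = None  # testlet to administer after a high score


def administer(testlet: Testlet, answer: Callable[[str], bool]) -> int:
    """Present every item in the testlet and return the number answered correctly."""
    return sum(answer(item) for item in testlet.items)


def adaptive_session(root: Testlet, answer: Callable[[str], bool],
                     cutoff: float = 0.5) -> List[tuple]:
    """Walk the testlet hierarchy: branch to the harder testlet when the
    proportion correct exceeds `cutoff`, otherwise to the easier one.
    Returns (testlet name, score) pairs for later aggregation into an
    ability estimate."""
    record = []
    node = root
    while node is not None:
        score = administer(node, answer)
        record.append((node.name, score))
        node = node.harder if score / len(node.items) > cutoff else node.easier
    return record


# Example: a two-stage hierarchy with a routing testlet at the top.
hard = Testlet("hard", ["H1", "H2", "H3", "H4"])
easy = Testlet("easy", ["E1", "E2", "E3", "E4"])
routing = Testlet("routing", ["R1", "R2", "R3", "R4"], easier=easy, harder=hard)

# A stand-in response model: pretend the examinee answers odd-numbered items correctly.
results = adaptive_session(routing, lambda item: int(item[1:]) % 2 == 1)
print(results)  # e.g. [('routing', 2), ('easy', 2)]
```

Because each testlet is authored and reviewed as a whole, the branching logic above never assembles item sequences that no expert has seen, which is the quality-control point the essay emphasizes.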
