Comparison of the Psychometric Properties of Several Computer-Based Test Designs for Credentialing Exams With Multiple Purposes

Many credentialing agencies today either administer their examinations by computer or are likely to do so in the coming years. Unfortunately, although several promising computer-based test designs are available, little is known about how well they function in examination settings. The goal of this study was to compare fixed-length examinations (both operational forms and newly constructed forms) with several variations of multistage test designs for making pass-fail decisions. Results were produced for 3 passing scores. Four operational 60-item examinations were compared to (a) 3 new 60-item forms, (b) 60-item 3-stage tests, and (c) 40-item 2-stage tests; all were constructed using automated test assembly software. The study was carried out using computer simulation techniques configured to mimic common examination practices. All 60-item tests, regardless of design or passing score, produced accurate ability estimates and acceptable, comparable levels of decision consistency and decision accuracy. As expected, the 40-item test results were poorer than the 60-item results, but they remained within the range of acceptability. This raises the practical policy question of whether content-valid 40-item tests, with their lower item exposure levels and/or savings in item development costs, are an acceptable trade-off for a small loss in decision accuracy and consistency.
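To make the two-stage design concrete, the sketch below simulates the kind of study described above: examinees with known abilities answer a 20-item routing module under a 3PL IRT model, are routed to an easier or harder 20-item second-stage module based on a provisional ability estimate, and a pass-fail decision is made from a final EAP ability estimate. This is a minimal illustration, not the study's method: the item parameters, module sizes, routing rule, and cut scores are all hypothetical assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def make_module(n_items, b_center):
    """Hypothetical 3PL item parameters for one module, centered at b_center."""
    a = rng.lognormal(0.0, 0.2, n_items)        # discriminations
    b = rng.normal(b_center, 0.4, n_items)      # difficulties
    c = rng.uniform(0.10, 0.25, n_items)        # guessing parameters
    return a, b, c

GRID = np.linspace(-4.0, 4.0, 81)
PRIOR = np.exp(-0.5 * GRID**2)                  # N(0,1) prior, unnormalized

def eap(resp, a, b, c):
    """EAP ability estimate on a fixed quadrature grid."""
    p = p3pl(GRID[:, None], a, b, c)            # shape: (grid points, items)
    like = np.prod(np.where(resp, p, 1.0 - p), axis=1)
    post = like * PRIOR
    return float(np.sum(GRID * post) / np.sum(post))

ROUTING = make_module(20, 0.0)    # medium-difficulty routing stage
EASY    = make_module(20, -0.8)   # second-stage module for low provisional scores
HARD    = make_module(20, 0.8)    # second-stage module for high provisional scores

def administer(theta, cut):
    """One 40-item two-stage administration; returns the pass/fail decision."""
    a1, b1, c1 = ROUTING
    r1 = rng.random(20) < p3pl(theta, a1, b1, c1)
    route_hat = eap(r1, a1, b1, c1)             # provisional estimate for routing
    a2, b2, c2 = EASY if route_hat < cut else HARD
    r2 = rng.random(20) < p3pl(theta, a2, b2, c2)
    theta_hat = eap(np.concatenate([r1, r2]),
                    np.concatenate([a1, a2]),
                    np.concatenate([b1, b2]),
                    np.concatenate([c1, c2]))
    return theta_hat >= cut

def study(cut, n_examinees=2000):
    """Decision accuracy (vs. true theta) and consistency (parallel replications)."""
    theta = rng.normal(0.0, 1.0, n_examinees)
    d1 = np.array([administer(t, cut) for t in theta])
    d2 = np.array([administer(t, cut) for t in theta])  # second, parallel run
    accuracy = np.mean(d1 == (theta >= cut))
    consistency = np.mean(d1 == d2)
    return accuracy, consistency

for cut in (-0.5, 0.0, 0.5):   # three illustrative passing scores
    acc, con = study(cut)
    print(f"cut={cut:+.1f}  decision accuracy={acc:.3f}  consistency={con:.3f}")
```

Decision accuracy is estimated here against the true generating abilities and decision consistency from two independent parallel administrations, mirroring the two criteria reported above; an operational study would instead assemble content-balanced modules from a calibrated item bank using automated test assembly software.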
