Efficiency of Targeted Multistage Calibration Designs Under Practical Constraints: A Simulation Study

Calibrating an item bank for computerized adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that uses both ability-related background variables and test performance to assign students to suitable items. Furthermore, we investigated whether uncertainty about item difficulty could impair the assembly of efficient designs. The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. Limited knowledge about item difficulty reduced the efficiency of one of the two investigated targeted multistage calibration designs, whereas targeted designs proved more robust.
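The efficiency argument rests on the Fisher information of the Rasch model, which is largest when item difficulty matches student ability. As a minimal sketch of that mechanism (not the study's actual simulation; the group means, bank size, and booklet length below are illustrative assumptions), the following Python snippet compares the total calibration information of a random design with that of a targeted design that routes students on a background variable alone:

```python
import numpy as np

rng = np.random.default_rng(0)

def rasch_info(theta, b):
    """Fisher information of a Rasch item, p * (1 - p). Under the Rasch model
    this also equals the information a response carries about the item
    difficulty b, and it peaks where theta == b."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

# Two student groups differing in mean ability, standing in for an
# ability-related background variable (e.g., school track).
groups = np.repeat([0, 1], 500)
group_means = np.array([-1.0, 1.0])
thetas = rng.normal(group_means[groups], 1.0)

# A hypothetical item bank with difficulties spread over the ability range.
bank = rng.uniform(-2.5, 2.5, 60)
BOOKLET = 12  # items administered per student

# Random design: every student answers a randomly drawn booklet.
info_random = sum(rasch_info(t, rng.choice(bank, BOOKLET, replace=False)).sum()
                  for t in thetas)

# Targeted design: route each student, based only on the background variable,
# to the booklet of items closest to that group's mean ability.
booklets = {g: bank[np.argsort(np.abs(bank - m))[:BOOKLET]]
            for g, m in enumerate(group_means)}
info_targeted = sum(rasch_info(t, booklets[g]).sum()
                    for t, g in zip(thetas, groups))

print(f"total calibration information, random design:   {info_random:.0f}")
print(f"total calibration information, targeted design: {info_targeted:.0f}")
```

A targeted multistage calibration design, as studied here, would add a performance-based routing stage on top of this background-variable routing, further tightening the match between item difficulty and ability.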
