Acquiring Item Difficulty Estimates: a Collaborative Effort of Data and Judgment. Nominee for Best Paper Award

The evolution from static to dynamic electronic learning environments has stimulated research on adaptive item sequencing. A prerequisite for adaptive item sequencing, in which the difficulty of the item is constantly matched to the knowledge level of the learner, is a set of items with known difficulty levels. The difficulty level can be estimated by means of item response theory (IRT), as is often done prior to computerized adaptive testing. However, the requirements of this calibration method are not easily met in many practical learning situations, for instance because of the cost of prior calibration and the continuous generation of new learning items. The aim of this paper is to search for alternative estimation methods and to review the accuracy of these methods as compared to IRT-based calibration. Using real data, six estimation methods are compared with IRT-based calibration: proportion correct, learner feedback, expert rating, paired comparison (learner), paired comparison (expert) and the Elo rating system. Results indicate that proportion correct has the strongest relation with IRT-based difficulty estimates, followed by learner feedback, the Elo rating system, expert rating and finally paired comparison.
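To make two of the data-driven methods named above concrete, the following is a minimal Python sketch of proportion correct and of an Elo-style difficulty update. It is an illustration under stated assumptions, not the procedure used in the paper: the logistic expected-score formula, the constant step size `k = 0.4`, the starting rating of 0 for every learner and item, and all function names are hypothetical choices made here.

```python
import math


def proportion_correct(responses):
    """Return the share of correct answers per item.

    responses: dict mapping item id -> list of 0/1 scores.
    A higher proportion correct suggests an easier item.
    """
    return {item: sum(scores) / len(scores) for item, scores in responses.items()}


def elo_update(ability, difficulty, correct, k=0.4):
    """One Elo-style update after a learner answers an item.

    Assumption: the expected score follows a logistic curve in
    (ability - difficulty); k is an assumed constant step size.
    """
    expected = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    delta = k * (correct - expected)
    # A correct answer raises the learner's ability and lowers the item's difficulty.
    return ability + delta, difficulty - delta


def elo_difficulties(log, k=0.4):
    """Process (learner, item, correct) records in order; return item difficulties.

    Assumption: every learner and item starts at rating 0.0.
    """
    abilities, difficulties = {}, {}
    for learner, item, correct in log:
        a = abilities.get(learner, 0.0)
        d = difficulties.get(item, 0.0)
        abilities[learner], difficulties[item] = elo_update(a, d, correct, k)
    return difficulties


if __name__ == "__main__":
    # Hypothetical response data, for illustration only.
    print(proportion_correct({"i1": [1, 1, 0, 1], "i2": [0, 0, 1, 0]}))
    print(elo_difficulties([("u1", "i1", 1), ("u1", "i2", 0), ("u2", "i1", 1)]))
```

Unlike proportion correct, the Elo-style update can be applied one response at a time, so a difficulty estimate is available as soon as an item starts receiving answers. Note that the two scales run in opposite directions: a high proportion correct indicates an easy item, whereas a high Elo rating indicates a difficult one.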
