Can automated item generation be used to develop high quality MCQs that assess application of knowledge?

This study compared the quality of multiple-choice questions (MCQs) developed using automated item generation (AIG) with that of MCQs developed using traditional methods. In a blinded study, a panel of content experts rated a total of 102 MCQs on six quality metrics and judged whether each item tested recall or application of knowledge. A Wilcoxon two-sample test was used to evaluate differences between the two development methods on each of the six quality-metric rating scales and on the overall cognitive domain judgment. No significant differences were found in item quality or in the cognitive domain assessed. The vast majority of items (> 90%) developed using either method were deemed to assess higher-order skills. MCQs developed using AIG were therefore of comparable quality to traditionally developed items, and both methods can produce items that assess higher-order cognitive skills.
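
To illustrate the kind of comparison a Wilcoxon two-sample (rank-sum) test makes, here is a minimal sketch in Python. It is not the authors' analysis code. It assumes a two-sided test using the normal approximation with a tie correction and no continuity correction. The function names and the example ratings are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2

    # Tie correction: subtract sum(t^3 - t) / (n(n-1)) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if var_w == 0:  # every observation identical: no evidence of a difference
        return w, 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical ratings on one quality metric (illustration only, not study data).
aig_ratings = [4, 5, 3, 4, 4, 5, 3, 4]
traditional_ratings = [4, 4, 3, 5, 4, 4, 3, 5]
w, z, p = wilcoxon_rank_sum(aig_ratings, traditional_ratings)
print(f"W = {w}, z = {z:.3f}, p = {p:.3f}")
```

In a design like the one described, such a test would be run once per rating scale. With small samples or heavily tied ordinal ratings, an exact or permutation-based p-value may be preferable to the normal approximation used here.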
