A comparative study of methods for a priori prediction of MCQ difficulty

Successful exams require a balance of easy, medium, and difficult questions. Question difficulty is generally either estimated by an expert or determined after an exam has been taken. The latter is of no use for generating new questions, and the former is costly in both time and effort. Moreover, it is not known whether expert prediction is in fact a good proxy for question difficulty. In this paper, we analyse and compare two ontology-based measures for predicting the difficulty of multiple-choice questions, and we evaluate each measure, alongside the predictions of 15 experts, against the exam performance of 12 residents on a corpus of 231 case-based medical multiple-choice questions. We find that one ontology-based measure (relation strength indicativeness) performs comparably (accuracy = 47%) to expert prediction (average accuracy = 49%).
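The evaluation described above amounts to a categorical-accuracy comparison: each question's observed difficulty is derived from examinee performance, binned into a difficulty category, and compared with the category a predictor assigned in advance. The sketch below illustrates this under stated assumptions; the bin thresholds (0.7 and 0.3) and the response counts are illustrative placeholders, not the paper's actual cut-offs or data.

```python
# Minimal sketch of a priori difficulty-prediction evaluation.
# Observed difficulty is the classical difficulty index: the proportion
# of examinees who answered the question correctly. The 0.7/0.3 bin
# thresholds are assumptions for illustration only.

def observed_category(correct_responses: int, num_examinees: int) -> str:
    p = correct_responses / num_examinees  # classical difficulty index
    if p >= 0.7:
        return "easy"
    if p >= 0.3:
        return "medium"
    return "difficult"

def accuracy(predicted: list[str], observed: list[str]) -> float:
    # Proportion of questions whose predicted category matches the
    # category observed from exam performance.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Example: three questions answered by 12 examinees (counts are made up).
observed = [observed_category(c, 12) for c in (11, 6, 2)]
predicted = ["easy", "difficult", "difficult"]
print(accuracy(predicted, observed))  # 2/3, approximately 0.67
```

The same scoring function applies whether the predicted labels come from an ontology-based measure or from expert judgement, which is what allows the 47% and 49% figures to be compared directly.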
