Introducing a Framework to Assess Newly Created Questions with Natural Language Processing

Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which can be useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions themselves must also be assessed. Although IRT can estimate the characteristics of questions that have already been answered by several students, it cannot be applied to newly generated questions, for which no responses exist yet. In this paper, we propose a framework to train and evaluate models for estimating the difficulty and discrimination of newly created Multiple Choice Questions by extracting meaningful features from the text of the question and of the possible choices. We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy, showing that it outperforms previously proposed models, reducing the RMSE by 6.7% for difficulty estimation and by 10.8% for discrimination estimation. We also present the results of an ablation study that supports our choice of features and shows the effects of different characteristics of the questions' text on difficulty and discrimination.
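To make the two estimated quantities concrete, the following is a minimal sketch of how difficulty and discrimination enter an IRT model, together with the RMSE metric used for evaluation. It assumes the two-parameter logistic (2PL) formulation, which is a common choice but is not stated in the abstract; the function names `p_correct` and `rmse` are illustrative and not taken from the paper.

```python
import math


def p_correct(theta, difficulty, discrimination):
    """Probability of a correct answer under the 2PL model (an assumption here).

    theta          -- latent skill level of the student
    difficulty     -- skill level at which the probability of success is 0.5
    discrimination -- how sharply the probability rises around the difficulty
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def rmse(predicted, target):
    """Root mean squared error between predicted and reference parameter values."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # A student whose skill equals the question difficulty succeeds half the time.
    print(p_correct(theta=0.0, difficulty=0.0, discrimination=1.0))
    # A more discriminating question separates skilled students more sharply.
    print(p_correct(theta=1.0, difficulty=0.0, discrimination=2.0))
```

Under this reading, a text-based model would predict `difficulty` and `discrimination` for a new question, and `rmse` would compare those predictions with the values IRT estimates once enough students have answered it.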
