Going deeper: Automatic short-answer grading by combining student and question models

As educational technologies have rapidly become more powerful and more prevalent, especially from the 2010s onward, the demand for automated grading of natural language responses has made it a major area of research. In this work, we apply the classic student and domain/question models that are widely used in the field of intelligent tutoring systems to the task of automatic short-answer grading (ASAG). ASAG is the process of applying natural language processing techniques to assess student-authored short answers. Conventional ASAG systems focus mainly on the student answers themselves, an approach referred to as answer-based. In recent years, deep learning models have gained great popularity in a wide range of domains. While classic machine learning methods have been widely applied to ASAG, as far as we know deep learning models have not, probably because the lexical features of short answers provide limited information. In this work, we explore the effectiveness of a deep learning model, the deep belief network (DBN), on the task of ASAG. Overall, our results on a real-world corpus demonstrate that 1) adding student and question models to the conventional answer-based approach can greatly enhance the performance of ASAG, and 2) deep learning models such as DBN can be productively applied to the task of ASAG.
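
To make the idea of combining answer-based features with student and question models concrete, the following is a minimal sketch, not the method used in this work. It assumes a simple token-overlap score as the answer-based feature and smoothed historical correctness rates as stand-ins for the student and question models. All names (`token_overlap`, `GradedHistory`, `build_features`) are hypothetical, and the resulting feature vector would be passed to whatever classifier is used for grading.

```python
from dataclasses import dataclass
from typing import List


def token_overlap(answer: str, reference: str) -> float:
    """Answer-based feature: fraction of reference tokens found in the answer."""
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0


@dataclass
class GradedHistory:
    """Assumed stand-in for a student or question model: past grading counts."""
    correct: int = 0
    total: int = 0

    def rate(self, prior: float = 0.5) -> float:
        # Smoothed correctness rate; with no history it falls back to the prior.
        return (self.correct + prior) / (self.total + 1.0)


def build_features(answer: str, reference: str,
                   student: GradedHistory,
                   question: GradedHistory) -> List[float]:
    """Concatenate answer-based, student-based, and question-based features."""
    return [
        token_overlap(answer, reference),  # what the answer says
        student.rate(),                    # how this student has done so far
        question.rate(),                   # how often this question is answered correctly
    ]


features = build_features(
    "the bulb is in a closed path",
    "bulb closed path",
    student=GradedHistory(correct=3, total=4),
    question=GradedHistory(correct=1, total=4),
)
```

The design point is only that the grader sees more than the answer text: two answers with identical wording can yield different feature vectors when they come from different students or respond to questions of different difficulty.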
