Creating Scoring Rubric from Representative Student Answers for Improved Short Answer Grading

Automatic short answer grading remains one of the key challenges of any dialog-based tutoring system because of the variability in student answers. Typically, each question has no or only a few expert-authored exemplary answers, which makes it difficult to (1) generalize to all correct ways of answering the question, or (2) represent answers that are partially correct or incorrect. In this paper, we propose an affinity-propagation-based clustering technique to obtain class-specific representative answers from graded student answers. Our novelty lies in formulating the Scoring Rubric by incorporating the class-specific representatives obtained after the proposed clustering, selection, and ranking of graded student answers. We experiment with baseline as well as state-of-the-art sentence-embedding-based features to demonstrate the feature-agnostic utility of class-specific representative answers. Experimental evaluations on our large-scale industry dataset and a benchmarking dataset show that the Scoring Rubric significantly improves the classification performance of short answer grading.
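The abstract does not spell out the implementation, but the core idea, running affinity propagation over the graded answers of each class so that actual student answers emerge as exemplars, can be sketched as follows. This is a minimal, self-contained version of the Frey–Dueck message-passing algorithm; the toy 1-D feature values stand in for the sentence embeddings the paper uses, and the shared-median preference is a common default, not necessarily the paper's choice.

```python
import numpy as np

def affinity_propagation(S, damping=0.9, max_iter=200):
    """Minimal affinity propagation (Frey & Dueck, 2007).

    S is an (n, n) similarity matrix whose diagonal holds each point's
    "preference" to become an exemplar."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibility messages r(i, k)
    A = np.zeros((n, n))  # availability messages a(i, k)
    rows = np.arange(n)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    # points with positive self responsibility + availability are exemplars
    return np.flatnonzero((R + A).diagonal() > 0)

# Toy stand-in for embedded answers of one grade class: two tight groups,
# so affinity propagation should pick one representative answer per group.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = -(x[:, None] - x[None, :]) ** 2            # negative squared distance
pref = np.median(S[~np.eye(len(x), dtype=bool)])
np.fill_diagonal(S, pref)                      # shared preference = median similarity
exemplars = affinity_propagation(S)
```

Unlike k-means, affinity propagation does not require the number of representatives per class to be fixed in advance, and its exemplars are real data points, which is what makes the selected answers usable directly as rubric entries. The `scikit-learn` `AffinityPropagation` estimator provides an equivalent off-the-shelf implementation.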
