Automated Short Answer Grading: A Simple Solution for a Difficult Task

The task of short answer grading is to assess the outcome of an exam by automatically analysing students' answers in natural language and deciding whether they should pass or fail. In this paper, we tackle this task by training an SVM classifier on real data taken from a university statistics exam. We show that simple concatenated sentence embeddings used as features yield results around 0.90 F1, and that adding more complex distance-based features leads to only a slight improvement. We also release the dataset, which to our knowledge is the first freely available dataset of this kind in Italian.
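As a rough illustration of the feature setup described above, the sketch below builds a feature vector by concatenating two sentence embeddings and optionally appending one distance-based feature, then computes F1 for pass/fail predictions. It rests on several assumptions that the abstract does not state: that the concatenation pairs the student's answer with a reference answer, that cosine distance stands in for the "distance-based features", and that embeddings arrive as plain lists of floats. All function names are hypothetical, and the resulting vectors would still need to be passed to an SVM implementation of your choice.

```python
import math
from typing import List, Sequence


def concat_features(answer_vec: Sequence[float],
                    reference_vec: Sequence[float]) -> List[float]:
    """Concatenate two sentence embeddings into one feature vector.

    Assumption: one vector encodes the student's answer and the other
    a reference answer; the abstract only says "concatenated".
    """
    return list(answer_vec) + list(reference_vec)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity); 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def build_features(answer_vec: Sequence[float],
                   reference_vec: Sequence[float],
                   with_distance: bool = False) -> List[float]:
    """Concatenated embeddings, optionally followed by one distance feature."""
    features = concat_features(answer_vec, reference_vec)
    if with_distance:
        # Hypothetical stand-in for the paper's distance-based features.
        features.append(cosine_distance(answer_vec, reference_vec))
    return features


def f1_score(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom
```

Keeping the distance feature behind a flag makes it easy to compare the two configurations the abstract contrasts: embeddings alone versus embeddings plus distance features.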
