Automated Grading of Short Text Answers: Preliminary Results in a Course of Health Informatics

Students of Health Informatics in the Medicine and Surgery degree course at the University of L’Aquila (Italy) must, to pass the exam, submit solutions to assignments on the execution and interpretation of statistical analyses. This paper presents a tool for the automated grading of such solutions, in which the statistical analyses consist of R commands and their outputs, and the interpretations are short text answers. The tool performs a static analysis of the R commands and their respective outputs, and applies Natural Language Processing techniques to the short text answers. The paper summarises the approach taken for the R commands and outputs, then describes in detail the method used for the automated classification of the short text answers and the results obtained. In particular, we show that FastText sentence embeddings combined with a tuned Support Vector Machine classifier achieve an accuracy of 0.89, a Cohen’s K of 0.76, and an F1 score of 0.91 on a binary classification task (i.e. pass or fail). Further experiments with additional linguistically motivated features, intended to capture lexical differences between the students’ answers and the gold standard sentence, did not yield any significant improvement. The paper ends with a discussion of the findings and of the next steps in our research.
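For readers less familiar with the three figures reported above, the following is a minimal sketch of how accuracy, Cohen's K and the F1 score can be computed for a binary pass/fail task. It is not the authors' implementation: the function name `binary_metrics` and the toy label vectors are hypothetical, chosen only for illustration, and the values they produce are unrelated to the results in the paper.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, Cohen's kappa and F1 (positive class = 1, i.e. "pass").

    Hypothetical helper written for illustration only.
    """
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")

    # Cells of the 2x2 confusion matrix.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    n = len(gold)

    # Observed agreement between the grader and the classifier.
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the marginal label frequencies.
    p_pass = ((tp + fn) / n) * ((tp + fp) / n)
    p_fail = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_pass + p_fail
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # F1 is the harmonic mean of precision and recall on the "pass" class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "kappa": kappa, "f1": f1}


# Toy data (assumed, not from the paper): 1 = pass, 0 = fail.
gold_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(gold_labels, predicted)
print(metrics)
```

Cohen's K is reported alongside accuracy because it discounts the agreement that a classifier would reach by chance given the class frequencies, which matters when pass and fail answers are unevenly distributed.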
