Paper Information - Finding a Tradeoff between Accuracy and Rater's Workload in Grading Clustered Short Answers

In this paper, we investigate the potential of answer clustering for semi-automatic scoring of short answer questions for German as a foreign language. We use surface features such as word and character n-grams to cluster answers to listening comprehension exercises per question. We then simulate a setting in which human graders label only one answer per cluster and that label is propagated to all other members of the cluster. We investigate various ways to select this single item to be labeled and find that choosing the item closest to the centroid of a cluster leads to improved (simulated) grading accuracy over random item selection. Averaged over all questions, we can reduce a teacher's workload to labeling only 40% of all distinct answers for a question, while still maintaining a grading accuracy of more than 85%.
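The label-propagation procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm or the distance measure, so the sketch takes cluster assignments as given and assumes cosine similarity over word unigrams plus character bigrams and trigrams. All function names (`ngram_vector`, `centroid`, `cosine`, `propagate_labels`) are hypothetical, and the answers and gold labels are invented toy data.

```python
from collections import Counter, defaultdict
import math


def ngram_vector(text, n_values=(2, 3)):
    """Surface features: word unigrams plus character n-grams (assumed sizes)."""
    text = text.lower()
    feats = Counter("w:" + w for w in text.split())
    for n in n_values:
        feats.update("c:" + text[i:i + n] for i in range(len(text) - n + 1))
    return feats


def centroid(vectors):
    """Mean of a list of sparse count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (an assumed distance measure)."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(answers, cluster_ids, gold_labels):
    """Simulate grading: label the answer nearest each centroid, copy it to the rest."""
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)
    vectors = [ngram_vector(a) for a in answers]
    predicted = [None] * len(answers)
    for members in clusters.values():
        center = centroid([vectors[i] for i in members])
        # The simulated human grader labels only this one item per cluster.
        chosen = max(members, key=lambda i: cosine(vectors[i], center))
        for i in members:
            predicted[i] = gold_labels[chosen]
    accuracy = sum(p == g for p, g in zip(predicted, gold_labels)) / len(answers)
    workload = len(clusters) / len(answers)  # fraction of answers a human labels
    return predicted, accuracy, workload


# Invented toy data: cluster ids are assumed to come from any clustering step.
answers = ["im Kino", "Im Kino", "im Kino abends", "zu Hause", "zuhause"]
cluster_ids = [0, 0, 0, 1, 1]
gold = ["correct", "correct", "incorrect", "incorrect", "incorrect"]

predicted, accuracy, workload = propagate_labels(answers, cluster_ids, gold)
print(predicted, accuracy, workload)
```

In this toy run the grader labels two of five answers. The one answer whose gold label differs from its cluster's representative is graded wrongly, which is the accuracy cost that the paper trades off against reduced workload.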

Magdalena Wolska | Andrea Horbach | Alexis Palmer
