Using and Improving Coding Guides for and by Automatic Coding of PISA Short Text Responses

We propose and empirically evaluate a theoretical framework for how coding guides can be used for automatic coding (scoring) and how, in turn, automatic coding can enhance the use of coding guides. We adopted a recently described baseline approach to classify responses automatically. Well-established coding guides from PISA, which include reference responses, and the German PISA 2012 sample were used for evaluation. Ten items with 41,990 responses in total were analyzed. Results showed that (1) responses close to the cluster centroid constitute prototypes, (2) automatic coding can improve coding guides, and (3) the proposed procedure yields unreliable accuracy for small numbers of clusters but promising agreement with human coding for larger numbers. Further analyses are needed to find the optimal balance between the implied coding effort and model accuracy.
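The idea that responses close to a cluster centroid act as prototypes, and that a new response can receive the code of its nearest cluster, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the study: responses are represented as plain bag-of-words counts rather than in a semantic space, similarity is measured by cosine, and the cluster memberships and codes are given. The function names (`vectorize`, `centroid`, `prototype`, `assign_code`) are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words counts for a lowercased, whitespace-tokenized response."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean term vector of a non-empty list of response vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dict-like)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def prototype(responses):
    """Return the response in a cluster that is most similar to its centroid."""
    c = centroid([vectorize(r) for r in responses])
    return max(responses, key=lambda r: cosine(vectorize(r), c))


def assign_code(response, coded_clusters):
    """Assign the code of the cluster whose centroid is most similar.

    coded_clusters maps a code label to the list of responses in that cluster
    (an assumed, simplified stand-in for clusters labeled via a coding guide).
    """
    v = vectorize(response)
    centroids = {
        code: centroid([vectorize(r) for r in members])
        for code, members in coded_clusters.items()
    }
    return max(centroids, key=lambda code: cosine(v, centroids[code]))
```

In this sketch, `prototype` picks a candidate reference response for a cluster, which is the kind of response that could be reviewed for inclusion in a coding guide, while `assign_code` mimics coding an unseen response by its nearest cluster.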
