Divide and correct: using clusters to grade short answers at scale

Compared with multiple-choice and other recognition-oriented forms of assessment, short-answer questions have been shown to offer greater value for both students and teachers: for students they improve retention of knowledge, while for teachers they provide more insight into student understanding. Unfortunately, the same open-ended nature that makes them so valuable also makes them difficult to grade at scale. To address this, we propose a cluster-based interface that allows teachers to read, grade, and provide feedback on large groups of answers at once. We evaluated this interface against an unclustered baseline in a within-subjects study with 25 teachers, and found that the clustered interface allows teachers to grade substantially faster, give more feedback to students, and develop a high-level view of students' understanding and misconceptions.
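The abstract does not specify the clustering method, but the core idea of grouping similar answers so one grading action covers many students can be sketched with a simple, hypothetical approach: normalize each answer to a bag of words and greedily merge answers whose word sets overlap above a Jaccard-similarity threshold. This is an illustrative stand-in, not the paper's actual algorithm; the `normalize`, `cluster_answers`, and `threshold` names are assumptions.

```python
import re

def normalize(answer: str) -> frozenset:
    """Lowercase, strip punctuation, and treat the answer as a set of words."""
    return frozenset(re.findall(r"[a-z0-9]+", answer.lower()))

def cluster_answers(answers, threshold=0.5):
    """Greedily group answers whose word sets have Jaccard similarity
    >= threshold, so one grade or comment can apply to the whole group."""
    clusters = []  # list of (representative word set, member indices)
    for i, ans in enumerate(answers):
        words = normalize(ans)
        for rep, members in clusters:
            union = rep | words
            if union and len(rep & words) / len(union) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((words, [i]))
    return [members for _, members in clusters]

answers = [
    "The mitochondria is the powerhouse of the cell.",
    "Mitochondria: powerhouse of the cell",
    "It makes proteins for the cell.",
]
print(cluster_answers(answers))  # first two answers land in one cluster
```

In a real system the grouping would likely use richer semantic-similarity features (see the short-answer grading literature cited below), but even this coarse grouping conveys how a teacher could grade and comment on a whole cluster at once.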
