Can Automated Scoring Surpass Hand Grading of Students' Constructed Responses and Error Patterns in Mathematics?

A unique online parsing system that produces partial-credit scoring of students' constructed responses to mathematical questions is presented. The parser is the core of a free college readiness website in mathematics. The software generates immediate error analysis for each student response. The response is scored on a continuous scale, based on its overall correctness and the fraction of correct elements. The parser scoring was validated against human scoring of 207 real-world student responses (r = 0.91). Moreover, the software generates more consistent scores than teachers in some cases. The parser's analysis of students' errors on 124 additional responses showed that the errors factored into two groups: structural (possibly conceptual) and computational (which could result from typographical errors). The two error groups explained 55% of the variance in students' scores (structural errors: 36%; computational errors: 19%). In contrast, these groups explained only 33% of the variance in teacher scores (structural: 18%; computational: 15%). Agreement among teachers on error classification was low, and their classifications correlated only weakly with the parser's error groups. Overall, the parser's total scoring closely matched human scoring, but the machine was found to surpass humans in systematically distinguishing between students' error patterns.
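The scoring rule described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's actual algorithm: it assumes a response has already been parsed into a set of elements, and it combines an overall-correctness check with the fraction of correct elements, using an assumed 50/50 weighting for illustration.

```python
def partial_credit(student_elems, key_elems, weight_exact=0.5):
    """Return a continuous score in [0, 1] for a parsed response.

    Combines overall correctness (exact match with the answer key)
    with the fraction of correct elements the response contains.
    The element representation and weighting are assumptions made
    for illustration only.
    """
    exact = set(student_elems) == set(key_elems)        # overall correctness
    overlap = len(set(student_elems) & set(key_elems))  # correct elements found
    fraction = overlap / len(key_elems) if key_elems else 0.0
    # A fully correct response earns full credit; otherwise credit is
    # split between exactness (here 0) and the element fraction.
    return weight_exact * exact + (1 - weight_exact) * fraction

# e.g. key "2x + 3", student wrote "2x - 3": one of two elements matches
print(partial_credit(["2x", "-3"], ["2x", "3"]))  # 0.25
print(partial_credit(["2x", "3"], ["2x", "3"]))   # 1.0
```

A real parser would operate on expression trees rather than flat element sets (so that, say, structurally equivalent forms of the same expression score identically), but the continuous-scale idea is the same.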
