Characteristics of hand and machine-assigned scores to college students’ answers to open-ended tasks

Assessment of learning in higher education is a critical concern for policy makers, educators, parents, and students, and doing it appropriately is likely to require including constructed-response tests in the assessment system. We examined whether the scoring costs and other concerns with using open-ended measures on a large scale (e.g., turnaround time and inter-reader consistency) could be addressed by machine grading the answers. Analyses with 1,359 students from 14 colleges found that two human readers agreed highly with each other on the scores they assigned to answers to three types of open-ended questions. These reader-assigned scores also agreed highly with those assigned by a computer. The correlations of the machine-assigned scores with SAT scores, college grades, and other measures were comparable to the correlations of these variables with the hand-assigned scores. Machine scoring did not widen differences in mean scores between racial/ethnic or gender groups. Our findings demonstrate that machine scoring can facilitate the use of open-ended questions in large-scale testing programs by providing a fast, accurate, and economical way to grade responses.
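
For illustration only, the sketch below shows one way the reader-reader and reader-machine agreement described above might be quantified as Pearson correlations. The score vectors are hypothetical stand-ins, not data from the study, and averaging the two readers before comparing to the machine is an assumption about the analysis, not the authors' documented procedure.

```python
# Illustrative sketch with hypothetical scores; not the study's actual data or method.
from scipy.stats import pearsonr

reader1 = [4, 3, 5, 2, 4, 3, 5, 1]   # hypothetical scores from the first human reader
reader2 = [4, 3, 4, 2, 5, 3, 5, 2]   # hypothetical scores from the second human reader
machine = [4, 2, 5, 2, 4, 3, 4, 1]   # hypothetical machine-assigned scores

# Inter-reader agreement as a Pearson correlation.
r_readers, _ = pearsonr(reader1, reader2)

# Human-machine agreement: mean of the two readers vs. the machine score.
mean_reader = [(a + b) / 2 for a, b in zip(reader1, reader2)]
r_machine, _ = pearsonr(mean_reader, machine)

print(f"reader 1 vs. reader 2:        r = {r_readers:.2f}")
print(f"mean reader score vs. machine: r = {r_machine:.2f}")
```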
