A Closer Look at the Differences Between Graders in Introductory Computer Science Exams

Contribution: This paper shows how significantly computer science instructors can disagree when grading code-writing questions in introductory computer science (CS1) exams. Certain solutions are considered “very good” by some instructors and “very bad” by others. The study identifies four factors as possible contributors to such differences between graders: 1) disagreement on the role of syntax in CS1; 2) the absence of objective measures for the seriousness of logic errors; 3) variance in how rubrics are constructed; and 4) the use of subjective judgement while grading.

Background: Code-writing questions are widely used in CS1 exams. Understanding differences in how graders approach such questions can help the CS education community design valid assessments for summative and formative purposes.

Research Questions: Do graders differ significantly in how they score student programs? What are the main factors to which differences between graders can be attributed?

Methodology: CS1 instructors and teaching assistants were asked to score pieces of code resembling student solutions to CS1 exam questions and to answer open-ended questions about how they approach grading CS1 exams. Responses were then analyzed qualitatively and quantitatively.

Findings: There is no consensus on how code-writing questions in CS1 exams should be graded. Differences in opinion on how many points to award can be substantial, even for very simple programs and between graders at the same institution. Some graders assign grades that do not necessarily agree with how many points they believe the student solution is worth.
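By way of illustration only (this is not the paper's own analysis), the sketch below shows one standard way the quantitative part of such a study could measure inter-grader agreement: Krippendorff's alpha for interval-level ratings, a reliability coefficient commonly used for rating studies with multiple raters and possible missing ratings. The 0-10 scale, the grader scores, and the function name are hypothetical.

```python
import numpy as np

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha for interval-level ratings.

    `units` is a list of items (e.g. student solutions); each item is the
    list of numeric scores the graders gave it.  Graders who skipped an
    item are simply left out of that item's list.
    """
    # Only items scored by at least two graders contribute pairable values.
    rated = [np.asarray(u, dtype=float) for u in units if len(u) >= 2]
    n = sum(len(u) for u in rated)            # total number of pairable values
    if n <= 1:
        raise ValueError("need at least two pairable values")

    # Observed disagreement: squared score differences within each item.
    d_obs = 0.0
    for u in rated:
        diffs = u[:, None] - u[None, :]       # all ordered pairs of scores
        d_obs += (diffs ** 2).sum() / (len(u) - 1)
    d_obs /= n

    # Expected disagreement: squared differences over all pooled scores.
    pooled = np.concatenate(rated)
    d_exp = ((pooled[:, None] - pooled[None, :]) ** 2).sum() / (n * (n - 1))

    return 1.0 - d_obs / d_exp                # 1 = perfect agreement, ~0 = chance

# Hypothetical scores (out of 10) that four graders gave five solutions.
scores = [
    [10, 6, 9, 3],    # the kind of spread the paper reports
    [8, 8, 7, 9],
    [5, 2, 6, 1],
    [10, 10, 9, 10],
    [4, 7, 3, 8],
]
print(f"Krippendorff's alpha: {krippendorff_alpha_interval(scores):.3f}")
```

With these made-up scores, alpha comes out well below the conventional reliability thresholds (0.8, or 0.667 for tentative conclusions), which is the kind of result that would signal substantial disagreement between graders.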
