Reliability and validity of rubrics for assessment through writing

Abstract: This experimental project investigated the reliability and validity of rubrics in the assessment of students’ written responses to a social science “writing prompt”. Participants were asked to grade one of two writing samples on the assumption that it had been written by a graduate student; in fact, both samples were prepared by the authors. The first sample was well written in terms of sentence structure, spelling, grammar, and punctuation, but it did not fully answer the question. The second sample fully answered each part of the question but contained multiple errors in structure, spelling, grammar, and punctuation. In the first experiment, participants assessed the first sample once without a rubric and once with a rubric. In the second experiment, participants assessed the second sample once without a rubric and once with a rubric. The results showed that raters were significantly influenced by the mechanical characteristics of students’ writing rather than by its content, even when they used a rubric. The results also indicated that using rubrics may not improve the reliability or validity of assessment if raters are not well trained in how to design and employ them effectively.
