The Effect of Rating Augmentation on Inter-Rater Reliability: An Empirical Study of a Holistic Rubric

Abstract: This study defines a two-stage process for applying a holistic rubric to open-ended items such as writing samples. In the first stage, the rater scores a performance by assigning the integer rating that matches the proficiency level the performance exhibits. In the second stage, the rater may assign an augmentation indicating that the writing competency reflected in the paper is slightly higher or lower than that of the benchmark paper for the given proficiency level. If the rater judges that the paper represents benchmark proficiency for that level, no augmentation is assigned. The results indicate that rating augmentation can improve the inter-rater reliability of holistic assessments, as measured by generalizability phi coefficients, correlation coefficients, and percent agreement indices. Implications and suggestions for follow-up research are discussed.
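
As a rough illustration of the two-stage rating described above, the following sketch encodes an augmented rating as an integer level plus an optional "+" or "-" and computes two of the agreement indices named in the abstract (percent agreement and a correlation coefficient). The one-third step size, the function names, and the ratings themselves are assumptions made for illustration; the abstract does not specify how augmentations are scaled, and none of the data come from the study. The generalizability phi coefficient is omitted because it requires a variance-component analysis beyond a short example.

```python
from math import sqrt
from statistics import mean

# Assumption: an augmentation shifts the integer rating by one third of a level.
# The abstract does not state the numeric size of an augmentation.
AUGMENT_STEP = 1 / 3


def augmented_score(level, augmentation):
    """Combine a stage-one integer level with a stage-two augmentation.

    augmentation is "+", "-", or "" (benchmark proficiency, no augmentation).
    """
    offsets = {"+": AUGMENT_STEP, "-": -AUGMENT_STEP, "": 0.0}
    return level + offsets[augmentation]


def percent_agreement(a, b, tolerance=0.0):
    """Percentage of papers on which two raters differ by at most `tolerance`."""
    matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance + 1e-9)
    return 100.0 * matches / len(a)


def pearson_r(a, b):
    """Pearson correlation between two raters' scores."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings of five papers by two raters: (integer level, augmentation).
rater_1 = [(3, "+"), (2, ""), (4, "-"), (1, ""), (3, "")]
rater_2 = [(3, ""), (2, ""), (4, "-"), (2, "-"), (3, "+")]

levels_1 = [level for level, _ in rater_1]
levels_2 = [level for level, _ in rater_2]
scores_1 = [augmented_score(level, aug) for level, aug in rater_1]
scores_2 = [augmented_score(level, aug) for level, aug in rater_2]

print("Exact agreement, integer levels:", percent_agreement(levels_1, levels_2))
print("Exact agreement, augmented scores:", percent_agreement(scores_1, scores_2))
print("Correlation, integer levels:", round(pearson_r(levels_1, levels_2), 3))
print("Correlation, augmented scores:", round(pearson_r(scores_1, scores_2), 3))
```

Comparing the indices on the integer levels with those on the augmented scores mirrors the kind of comparison the study reports, although the direction and size of any difference depend entirely on the data and on how the augmentation is encoded.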
