Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch Model

This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.
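
For orientation, a many-faceted Rasch model for ratings is commonly written in the log-odds form sketched below. This is the general form of the model family, not necessarily the exact parameterization used in the study; the symbols and the choice of facets (student, rated domain, rater, category threshold) are assumptions made for illustration.

```latex
% Sketch of a many-faceted Rasch model in adjacent-category log-odds form.
% All symbols are illustrative assumptions, not taken from the abstract:
%   P_{nijk}   : probability that student n receives category k
%                from rater j on domain i
%   \theta_n   : writing competence of student n
%   \delta_i   : difficulty of domain i
%   \lambda_j  : severity of rater j
%   \tau_k     : difficulty of category k relative to category k-1
\[
  \ln\!\left( \frac{P_{nijk}}{P_{nij(k-1)}} \right)
  = \theta_n - \delta_i - \lambda_j - \tau_k
\]
```

Under this form, a more severe rater (larger \lambda_j) lowers the log-odds of a composition receiving the higher of two adjacent categories, which is why estimates of writing competence adjusted for rater severity can differ from unadjusted raw scores.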
