Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches

Abstract

This study investigates how experienced and inexperienced raters score essays written by ESL students on two different prompts. The quantitative analysis, which used multi-faceted Rasch measurement to obtain measures of rater severity and consistency, showed that the inexperienced raters were more severe than the experienced raters on one prompt but not the other, and that the differences between the two groups of raters were eliminated following rater training. The qualitative analysis, based on raters' think-aloud protocols recorded while they scored essays, provided insight into the reasons for these differences: they were related to the ease with which the scoring rubric could be applied to the two prompts and to differences in how the two groups of raters perceived the appropriateness of the prompts.
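
For readers unfamiliar with multi-faceted Rasch measurement, the model underlying this kind of analysis (as implemented in programs such as FACETS) is conventionally written as below. This is a standard textbook formulation of the many-facet rating scale model, not a formula taken from the study itself:

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where P_{nijk} is the probability that examinee n, writing on prompt i and scored by rater j, receives score category k rather than k-1; B_n is the examinee's ability; D_i is the difficulty of the prompt; C_j is the severity of the rater; and F_k is the difficulty of the step from category k-1 to k. Because a rater's severity C_j shifts expected scores by a fixed amount on the common logit scale, severity can be compared across rater groups and prompts, which is what makes the experienced/inexperienced comparison in the abstract possible.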
