Writing evaluation: rater and task effects on the reliability of writing scores for children in Grades 3 and 4

Abstract We examined how raters and tasks contribute to measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliability levels of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks each in the narrative and expository genres, and their written compositions were evaluated with widely used evaluation methods for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, was attributable to true individual differences in writing. Students' scores varied substantially across tasks (30.44% and 28.61% of the variance), but not across raters. Reaching a reliability of .90 required multiple tasks and multiple raters, whereas a reliability of .80 required a single rater and multiple tasks. These findings have important implications for reliably evaluating children's writing skills, given that writing is typically evaluated with a single task and a single rater in classrooms and even in some state accountability systems.
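To illustrate the decision-study logic behind projections of how many tasks and raters are needed, the sketch below computes the relative generalizability coefficient for a fully crossed person × task × rater design. This is a minimal illustration only: the variance components are hypothetical placeholders, not the estimates reported in this study, and the function name is introduced here for demonstration.

```python
# Minimal sketch of a generalizability-theory decision (D) study for a fully
# crossed person x task x rater (p x t x r) design. The variance components
# below are illustrative placeholders, NOT the estimates from this study;
# substituting a G-study's estimated components yields its D-study projections.

def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Relative generalizability coefficient E(rho^2) given numbers of tasks and raters."""
    relative_error = (var_pt / n_tasks
                      + var_pr / n_raters
                      + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + relative_error)

# Hypothetical variance components (arbitrary scale): person, person x task,
# person x rater, and residual (person x task x rater, error).
var_p, var_pt, var_pr, var_ptr_e = 0.54, 0.20, 0.01, 0.15

for n_tasks in (1, 2, 3):
    for n_raters in (1, 2):
        coef = g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters)
        print(f"tasks={n_tasks}, raters={n_raters}: E(rho^2)={coef:.2f}")
```

With components like these, adding tasks raises the coefficient far more than adding raters, mirroring the pattern summarized in the abstract (large task-related variance, negligible rater-related variance).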
