Writing evaluation: rater and task effects on the reliability of writing scores for children in Grades 3 and 4

Abstract We examined how raters and tasks contribute to measurement error in writing evaluation and how many raters and tasks are needed to reach desirable reliability levels of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks each in the narrative and expository genres, and their written compositions were evaluated with widely used evaluation methods for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, was attributable to true individual differences in writing. Students' scores varied substantially across tasks (30.44% and 28.61% of the variance), but not across raters. Reaching a reliability of .90 required multiple tasks and multiple raters, whereas a reliability of .80 required a single rater and multiple tasks. These findings have important implications for reliably evaluating children's writing skills, given that writing is typically evaluated with a single task and a single rater in classrooms and even in some state accountability systems.
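To illustrate the decision-study logic behind projections of how many tasks and raters are needed, the sketch below computes the relative generalizability coefficient for a fully crossed person × task × rater design. This is a minimal illustration only: the variance components are hypothetical placeholders, not the estimates reported in this study, and the function name is introduced here for demonstration.

```python
# Minimal sketch of a generalizability-theory decision (D) study for a fully
# crossed person x task x rater (p x t x r) design. The variance components
# below are illustrative placeholders, NOT the estimates from this study;
# substituting a G-study's estimated components yields its D-study projections.

def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Relative generalizability coefficient E(rho^2) given numbers of tasks and raters."""
    relative_error = (var_pt / n_tasks
                      + var_pr / n_raters
                      + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + relative_error)

# Hypothetical variance components (arbitrary scale): person, person x task,
# person x rater, and residual (person x task x rater, error).
var_p, var_pt, var_pr, var_ptr_e = 0.54, 0.20, 0.01, 0.15

for n_tasks in (1, 2, 3):
    for n_raters in (1, 2):
        coef = g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters)
        print(f"tasks={n_tasks}, raters={n_raters}: E(rho^2)={coef:.2f}")
```

With components like these, adding tasks raises the coefficient far more than adding raters, mirroring the pattern summarized in the abstract (large task-related variance, negligible rater-related variance).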
