Throw 'em out or make 'em better? State and district high-stakes writing assessments

The writing of school-aged children is assessed for many reasons (Graham, Harris, & Hebert, 2011). Teachers assess writing to monitor students' growth as writers, inform instruction, provide feedback, and evaluate the effectiveness of their teaching. Students assess their own writing to appraise growth, identify strengths, and determine areas in need of further development. Peers assess other students' writing to provide feedback on what works in a paper and what still needs work. States and school districts assess writing to determine how many students meet local or state performance standards, identify youngsters who need extra help, and evaluate the effectiveness of individual teachers and schools. The national government administers the National Assessment of Educational Progress (NAEP) writing test to measure American students' collective writing achievement and to track their writing performance across time.

Given the heavy emphasis now placed on assessment and evaluation as tools for improving and reforming writing and other aspects of education in the United States (Gewertz & Robelen, 2010; National Commission on Writing, 2003), it is important to ask whether the various forms of assessment, ranging from classroom-based writing assessments to state and district evaluations (the focus of this article), do, in fact, make a difference in improving how well students write. For students with disabilities, such questions are especially important, as so many of these students experience difficulty learning to write. On the 2007 NAEP (Salahu-Din, Persky, & Miller, 2008), just 6% of eighth-grade and 5% of twelfth-grade students with disabilities performed at or above the "proficient" level in writing (defined as solid academic performance). Students scoring below this level are classified as having attained only partial mastery of the literacy skills needed at their respective grade. Thus, this assessment indicates that roughly 19 of every 20 students with disabilities do not acquire the writing skills needed for success in school (see the brief calculation below).

Although classroom-based assessments are not the focus of this article, evidence shows that such assessments can make a difference in improving how well students write. A recent meta-analysis of experimental and quasi-experimental studies conducted mostly with typically developing students (Graham, Kiuhara, McKeown, & Harris, 2011) provided empirical evidence that writing assessments that are part of typical classroom practices improve the overall quality of students' writing. When students receive feedback about their writing and learning progress, their writing improves. When students evaluate their own writing, their writing improves as well.
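To make the "19 of every 20" figure explicit, a quick back-of-the-envelope calculation from the NAEP proficiency rates cited above (6% at grade 8, 5% at grade 12) runs as follows:

$$1 - 0.06 = 0.94 \approx \tfrac{19}{20} \qquad \text{and} \qquad 1 - 0.05 = 0.95 = \tfrac{19}{20}.$$

That is, with only 5% to 6% of students with disabilities scoring at or above the proficient level, roughly 94% to 95%, or about 19 of every 20, fall below it.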
