Comparing the accuracy of different scoring methods for identifying sixth graders at risk of failing a state writing assessment

Abstract: Students who fail state writing tests may be subject to a number of negative consequences. Identifying students who are at risk of failure affords educators time to intervene and prevent such outcomes. Yet little research has examined the classification accuracy of predictors used to identify at-risk students in the upper-elementary and middle-school grades. The current study therefore compared multiple scoring methods with regard to their accuracy in identifying students at risk of failing a state writing test. In the fall of 2012, students composed an essay in response to a persuasive prompt on a computer-based benchmark writing test, and in the spring of 2013 they took the state writing assessment. Predictor measures included prior writing achievement, human holistic scoring, automated essay scoring via Project Essay Grade (PEG), total words written, compositional spelling, and sentence accuracy. Classification accuracy was measured using the area under the ROC curve. Results indicated that prior writing achievement and the PEG Overall Score had the highest classification accuracy; a multivariate model combining these two measures yielded only slight improvements over the univariate prediction models. Findings indicate that the choice of scoring method affects classification accuracy and that automated essay scoring can be used to accurately identify at-risk students.
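As a rough illustration of the analytic approach described above (evaluating each predictor's classification accuracy with the area under the ROC curve, then combining the two strongest predictors in a multivariate logistic regression model), the sketch below uses synthetic data; the variable names, data-generating values, and model settings are illustrative assumptions, not the study's code or data.

```python
# Minimal sketch: univariate AUCs for several fall predictors of failing the
# spring state writing test, plus a combined two-predictor logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500

# Hypothetical fall benchmark predictors (values are made up).
prior_achievement = rng.normal(300, 25, n)   # prior state writing score
peg_overall = rng.normal(15, 4, n)           # PEG Overall Score
total_words = rng.poisson(180, n)            # total words written

# Hypothetical outcome: 1 = failed the spring state writing assessment.
risk = 1 / (1 + np.exp(0.05 * (prior_achievement - 300) + 0.3 * (peg_overall - 15)))
failed = rng.binomial(1, risk)

# Univariate classification accuracy: AUC of each predictor alone.
# roc_auc_score expects higher scores to indicate the positive class (failure),
# so predictors on which higher = better writing are negated.
for name, x in [("prior achievement", prior_achievement),
                ("PEG Overall Score", peg_overall),
                ("total words written", total_words)]:
    print(f"{name}: AUC = {roc_auc_score(failed, -x):.3f}")

# Multivariate model: logistic regression combining the two best predictors.
X = np.column_stack([prior_achievement, peg_overall])
model = LogisticRegression().fit(X, failed)
pred_prob = model.predict_proba(X)[:, 1]
print(f"combined model: AUC = {roc_auc_score(failed, pred_prob):.3f}")
```

In this kind of comparison, an AUC of 0.5 reflects chance-level screening and values approaching 1.0 reflect increasingly accurate separation of at-risk from not-at-risk students; the combined model's AUC can then be compared directly against the univariate AUCs.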
