The challenges of emulating human behavior in writing assessment

Abstract: This is a response to Dr. Les Perelman's critique of Phase I of the Hewlett Trials. Perelman argues that the construct validity of the study was undermined because word count correlated highly with the scores predicted by the vendors' scoring engines. The response addresses this argument by showing that correlation does not imply causation. It further illustrates how predictions are actually formulated in automated essay scoring, and it concludes with an appeal for more research on the constructs that underlie writing.
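The claim that a high correlation between word count and predicted scores need not mean that length causes the score can be made concrete with a toy simulation. The sketch below is purely hypothetical: it is not the Hewlett Trials data and not the author's analysis. It assumes a single latent "proficiency" trait that drives both essay length and the score, while the score has no direct dependence on word count. All parameters (means, slopes, noise levels, sample size, seed) are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, xs):
    """Residuals of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=0):
    """Hypothetical data: a latent proficiency trait drives BOTH variables.

    Word count depends on proficiency plus noise; the score depends on
    proficiency plus noise and never on word count itself.
    """
    rng = random.Random(seed)
    proficiency = [rng.gauss(0, 1) for _ in range(n)]
    word_count = [300 + 80 * p + rng.gauss(0, 40) for p in proficiency]
    score = [3 + 1.0 * p + rng.gauss(0, 0.5) for p in proficiency]
    return proficiency, word_count, score


proficiency, word_count, score = simulate()

# Raw correlation: high, even though length has no causal effect on score here.
raw_r = pearson(word_count, score)

# Partial correlation controlling for the latent trait: near zero.
partial_r = pearson(
    residuals(word_count, proficiency), residuals(score, proficiency)
)

print(f"corr(word count, score)        = {raw_r:.2f}")
print(f"partial corr given proficiency = {partial_r:.2f}")
```

With these assumed parameters the population correlation between word count and score is 80 / sqrt(8000 × 1.25) = 0.8, while the partial correlation given proficiency is zero by construction, so the simulated estimates should land close to those values. The point is only that a strong length/score correlation is what one would expect if both reflect an underlying writing ability; it does not by itself show which features a scoring engine relies on.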

References

[1] Paul Deane, et al. On the relation between automated essay scoring and modern views of the writing construct, 2013.

[2] Edward W. Wolfe, et al. Identifying Rater Effects Using Latent Trait Models, 2004.

[3] Andrew Klobucar, et al. Automated Scoring in Context: Rapid Assessment for Placed Students, 2013.

[4] Computer Grading of Essay Traits in Student Writing, 1996.

[5] Andrew Klobucar, et al. Automated Essay Evaluation and the Teaching of Writing, 2013.

[6] Ellis B. Page, et al. Statistical and Linguistic Strategies in the Computer Grading of Essays, 1967, COLING.

[7] Mary L. DeRemer, et al. Writing assessment: Raters' elaboration of the rating task, 1998.

[8] Ben Hamner, et al. Contrasting state-of-the-art automated scoring of essays: analysis, 2012.

[9] Les Perelman, et al. When “the state of the art” is counting words, 2014.

[10] Peter Elbow, et al. Writing Without Teachers, 1973.

[11] William Condon, et al. Large-Scale Assessment, Locally-Developed Measures, and Automated Scoring of Essays: Fishing for Red Herrings?, 2013.

[12] Jill Burstein, et al. Automated Essay Scoring With e-rater® V.2.0, 2004.

[13] Deborah McCutchen, et al. Writing and cognition: Implications of the cognitive architecture for learning to write and writing to learn, 2007.

[14] David M. Williamson, et al. A Framework for Evaluation and Use of Automated Scoring, 2012.

[15] Mark D. Shermis, et al. State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration, 2014.

[16] Michael Ranney, et al. Cognitive Differences in Proficient and Nonproficient Essay Scorers, 1998.

[17] B. Huot, et al. The Literature of Direct Writing Assessment: Major Concerns and Prevailing Trends, 1990.

[18] Jacob Cohen, et al. The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability, 1973.

[19] Joel R. Tetreault, et al. Using Entity-Based Features to Model Coherence in Student Essays, 2010, HLT-NAACL.