Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of WriteToLearn

This study investigated the application of WriteToLearn to Chinese undergraduate English majors’ essays, examining both its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors at a university in Sichuan province, who wrote 326 essays in response to two writing prompts. Each essay was marked by four trained human raters as well as by WriteToLearn. Many-facet Rasch measurement (MFRM) was conducted to calibrate WriteToLearn’s rating performance across the whole set of essays against that of the four human raters, and the accuracy of its feedback on 60 randomly selected essays was compared with the feedback provided by the human raters. Two main findings emerged for scoring: WriteToLearn was more consistent but considerably more stringent than the four human raters, and it failed to score seven essays. For error feedback, WriteToLearn achieved an overall precision of 49% and recall of 18.7%, falling well short of the minimum threshold of 90% precision that Burstein, Chodorow, and Leacock (2003) set for a reliable error-detection tool. It also had particular difficulty identifying errors in article use, preposition use, word choice, and expression.
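For reference, the precision and recall figures above follow the standard definitions for error-detection systems; the notation below is ours rather than the study’s. Taking the human raters’ annotations as the gold standard, let TP be the errors WriteToLearn flagged that the raters confirmed, FP its false alarms, and FN the rater-identified errors it missed:

\[ \text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN} \]

Read against the reported figures, a precision of 49% means roughly half of the system’s flags were genuine errors, and a recall of 18.7% means it detected fewer than one in five of the errors the human raters found. The MFRM calibration, in its standard many-facet formulation (a sketch; the study’s exact facet specification may differ), models the log-odds of adjacent rating categories as

\[ \log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k \]

where B_n is the ability of writer n, D_i the difficulty of prompt i, C_j the severity of rater j (the four humans and WriteToLearn each entering as a rater facet), and F_k the threshold of rating category k. Rater severity and consistency findings such as those reported above derive from the C_j estimates and their associated fit statistics.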

[1] Vahid Aryadoust et al. Predicting EFL writing ability from levels of mental representation measured by Coh-Metrix: A structural equation modeling study. 2015.

[2] Semire Dikli et al. Automated Essay Scoring feedback for second language writers: How does it compare to instructor feedback? 2014.

[3] Martin Chodorow et al. Criterion SM Online Essay Evaluation: An Application for Automated Evaluation of Student Essays. IAAI, 2003.

[4] Claudia Leacock et al. Automated Grammatical Error Correction for Language Learners. COLING, 2010.

[5] Laura K. Allen et al. A Hierarchical Classification Approach to Automated Essay Scoring. 2015.

[6] Dana R. Ferris et al. Written corrective feedback for individual L2 writers. 2013.

[7] Na-Rae Han et al. Detecting errors in English article usage by non-native speakers. Natural Language Engineering, 2006.

[8] Aek Phakiti et al. The effects of computer-generated feedback on the quality of writing. 2014.

[9] Sara Cushing Weigle et al. English language learners and automated scoring of essays: Critical considerations. 2013.

[10] Andrea Everard et al. Does spell-checking software need a warning label? CACM, 2005.

[11] Sara Cushing Weigle. English as a Second Language Writing and Automated Essay Evaluation. 2013.

[12] T. Landauer. Automatic Essay Assessment. 2003.

[13] Martin Chodorow et al. Native Judgments of Non-Native Usage: Experiments in Preposition Error Detection. COLING, 2008.

[14] Chi-Fen Emily Chen et al. Beyond the Design of Automated Writing Evaluation: Pedagogical Practices and Perceived Learning Effectiveness in EFL Writing Classes. 2008.

[15] Volker Hegelheimer et al. Rethinking the role of automated writing evaluation (AWE) feedback in ESL writing instruction. 2015.

[16] Mark Warschauer et al. Automated writing evaluation: defining the classroom research agenda. 2006.

[17] Jill Burstein et al. Automated Essay Scoring: A Cross-disciplinary Perspective. 2003.

[18] Mark D. Shermis et al. State-of-the-art automated essay scoring: Competition, results, and future directions from a United States demonstration. 2014.

[19] Donald E. Powers et al. Stumping E-rater: Challenging the validity of automated essay scoring. 2001.

[20] Brent Bridgeman. Human Ratings and Automated Essay Evaluation. 2013.

[21] Volker Hegelheimer et al. The role of automated writing evaluation holistic scores in the ESL classroom. 2014.

[22] M. Warschauer et al. Automated Writing Assessment in the Classroom. 2008.

[23] Martin Chodorow et al. Automated Scoring Using a Hybrid Feature Identification Technique. ACL, 1998.

[24] Jill Burstein et al. The E-rater® scoring engine: Automated essay scoring with natural language processing. 2003.

[25] Peter W. Foltz et al. The intelligent essay assessor: Applications to educational technology. 1999.

[26] Yigal Attali et al. Validity and Reliability of Automated Essay Scoring. 2013.

[27] Jill Burstein et al. Automated Essay Scoring with E-rater® V.2.0. 2004.

[28] Les Perelman et al. When “the state of the art” is counting words. 2014.

[29] Martin Chodorow et al. The Ups and Downs of Preposition Error Detection in ESL Writing. COLING, 2008.