Automated Assessment of Non-Native Learner Essays: Investigating the Role of Linguistic Features

Automatic essay scoring (AES) refers to the task of scoring free-text responses to given prompts, with human grader scores serving as the gold standard. Writing such essays is an essential component of many language and aptitude exams. Hence, AES has become an active and established area of research, and many proprietary systems are used in real-life applications today. However, little is known about which specific linguistic features are useful for prediction and how consistent their usefulness is across datasets. This article addresses that gap by exploring the role of various linguistic features in automatic essay scoring using two publicly available datasets of non-native English essays written in test-taking scenarios. Linguistic properties are modeled by encoding lexical, syntactic, discourse, and error characteristics of learner language in the feature set. Predictive models are then developed with these features on both datasets, and the most predictive features are compared. While the results show that this feature set yields good predictive models on both datasets, the question "what are the most predictive features?" has a different answer for each dataset.

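As a rough illustration of the feature-based approach described above (not the authors' implementation), the sketch below extracts a few shallow lexical proxies per essay, fits a regressor against human scores, and ranks the features by importance. The specific feature proxies, function names, and the use of scikit-learn's RandomForestRegressor are assumptions for illustration only; the paper's actual feature set additionally covers syntactic, discourse, and error features.

```python
# Minimal sketch of feature-based essay scoring with feature ranking.
# All feature choices and model settings here are illustrative assumptions.
import re
from sklearn.ensemble import RandomForestRegressor

FEATURE_NAMES = ["ttr", "mean_word_len", "mean_sent_len", "n_tokens"]

def extract_features(essay: str) -> list[float]:
    """Compute a few shallow proxies for lexical richness and sentence complexity."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", essay.lower())
    n_tokens = max(len(tokens), 1)
    return [
        len(set(tokens)) / n_tokens,             # type-token ratio
        sum(len(t) for t in tokens) / n_tokens,  # mean word length
        n_tokens / max(len(sentences), 1),       # mean sentence length
        float(len(tokens)),                      # essay length
    ]

def rank_features(essays: list[str], scores: list[float]) -> list[tuple[str, float]]:
    """Fit a regressor on extracted features and rank features by importance."""
    X = [extract_features(e) for e in essays]
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, scores)
    return sorted(zip(FEATURE_NAMES, model.feature_importances_),
                  key=lambda kv: kv[1], reverse=True)
```

Repeating such a ranking on two different corpora is one simple way to check whether the most predictive features are stable across datasets, which is the comparison the abstract describes.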