Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German Essays

Computational linguistic research on the language complexity of student writing typically treats human ratings as a gold standard. However, educational science shows that teachers find it difficult to identify and cleanly separate accuracy, different aspects of complexity, content, and structure. In this paper, we therefore use computational linguistic methods to investigate how task-appropriate complexity and accuracy relate to the overall, content, and language performance grades that teachers assign. Using texts written by students for the official school-leaving state examination (Abitur), we show that teachers successfully assign higher language performance grades to essays with higher task-appropriate language complexity and properly separate this from content scores. Accuracy, however, affects teacher assessment under all grading rubrics, including the content score, which indicates that teachers overemphasize its role. Our analysis is based on broad computational linguistic modeling of German language complexity and an innovative theory- and data-driven feature aggregation method that infers task-appropriate language complexity.