Broad linguistic modeling is beneficial for German L2 proficiency assessment

We investigate the applicability of a broad range of linguistic features to German second language (L2) proficiency assessment. To this end, we compare classification models based on linguistically diverse vs. homogeneous feature groups, both in terms of their overall performance and their success at individual proficiency levels (A1 to C1/C2). We extract 400 measures of linguistic complexity from the domains of syntax, lexicon, morphology, discourse, language use, and human language processing. Overall, our results show that a broad feature set integrating aspects of language as a system, language use, and human sentence processing costs yields higher classification performance on language learner data. At individual proficiency levels, lexical complexity in particular, but also clausal and phrasal complexity as well as discourse measures, successfully distinguish several proficiency levels. Morphological complexity is particularly important for more advanced learners.
