Experiments with Universal CEFR Classification

The Common European Framework of Reference (CEFR) guidelines describe the language proficiency of learners on a scale of six levels. While the CEFR descriptions are generic across languages, automated proficiency classification systems for different languages have been developed using different approaches. In this paper, we explore universal CEFR classification using domain-specific and domain-agnostic features, both theory-guided and data-driven. We report the results of our preliminary experiments in monolingual, cross-lingual, and multilingual classification with three languages: German, Czech, and Italian. Our results show that monolingual and multilingual models achieve similar performance, and that cross-lingual classification yields lower but comparable results to monolingual classification.
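The three settings named above differ mainly in which languages supply the training data. The following is a minimal sketch of that distinction, not the paper's actual experimental setup: the function `build_split`, the `part` field, and the toy records are hypothetical names introduced here for illustration, and the abstract does not specify how the data were partitioned.

```python
# Hypothetical sketch: how training and test data could be selected in
# monolingual, cross-lingual, and multilingual classification settings.

def build_split(samples, setting, target_lang, source_lang=None):
    """Return (train, test) lists for evaluating on target_lang.

    samples: list of dicts with keys "lang", "part" ("train"/"test"), "label".
    setting: "monolingual", "cross-lingual", or "multilingual".
    """
    # The test set is always the held-out part of the target language.
    test = [s for s in samples
            if s["lang"] == target_lang and s["part"] == "test"]

    if setting == "monolingual":
        # Train and test on the same language.
        train = [s for s in samples
                 if s["lang"] == target_lang and s["part"] == "train"]
    elif setting == "cross-lingual":
        # Train on a different language only; since none of its texts are
        # in the target test set, all of its data can be used.
        if source_lang is None or source_lang == target_lang:
            raise ValueError("cross-lingual needs a distinct source_lang")
        train = [s for s in samples if s["lang"] == source_lang]
    elif setting == "multilingual":
        # Pool the training parts of all languages into one model.
        train = [s for s in samples if s["part"] == "train"]
    else:
        raise ValueError(f"unknown setting: {setting}")
    return train, test


# Toy records (invented for illustration, not from any corpus).
toy = [
    {"lang": "de", "part": "train", "label": "A2"},
    {"lang": "de", "part": "test", "label": "B1"},
    {"lang": "cs", "part": "train", "label": "B1"},
    {"lang": "cs", "part": "test", "label": "A2"},
    {"lang": "it", "part": "train", "label": "B2"},
    {"lang": "it", "part": "test", "label": "B1"},
]

mono_train, mono_test = build_split(toy, "monolingual", "it")
cross_train, cross_test = build_split(toy, "cross-lingual", "it", source_lang="de")
multi_train, multi_test = build_split(toy, "multilingual", "it")
```

In all three cases the test set is the same, so any difference in scores reflects the choice of training languages. Cross-lingual and multilingual training only make sense when the features are comparable across languages.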
