Classification of Language Proficiency Levels in Swedish Learners’ Texts

We evaluate a system that automatically classifies texts written by learners of Swedish as a second language into language proficiency levels. Because the amount of annotated learner essay data available for our target language is rather small, we also explore the potential of domain adaptation for this task. The additional domain consists of coursebook texts written by experts for learners. We find that even a relatively small amount of in-domain Swedish learner essay data is enough to obtain results that compare well to state-of-the-art systems for other languages, with domain adaptation methods yielding a slight
