SweLLex: Second language learners’ productive vocabulary

This paper presents SweLLex, a new lexical resource for learners of Swedish as a second language (L2), and the know-how behind its creation. We concentrate on L2 learners’ productive vocabulary, i.e. the words they are able to produce actively, rather than their receptive vocabulary, i.e. the words they comprehend. The list covers the vocabulary that L2 learners used in their essays. Each lexical item is linked to its frequency distribution over the six proficiency levels defined by the Common European Framework of Reference (CEFR) (Council of Europe, 2001). To make the list more reliable, we experiment with normalizing L2 word-level errors by replacing them with their correct equivalents. SweLLex has been tested in a prototype system for automatic CEFR level classification of essays and in a visualization tool for exploring L2 vocabulary, which contrasts receptive and productive vocabulary usage at different proficiency levels.
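As a rough illustration of the idea of linking each item to a distribution over CEFR levels after error normalization, the following minimal sketch counts lemmas per level. It is not the authors' pipeline: the function name `level_distribution`, the `corrections` mapping, the toy essays, and the use of raw counts rather than any normalized frequency measure are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# The six CEFR proficiency levels, in ascending order.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_distribution(essays, corrections=None):
    """Count each lemma per CEFR level (hypothetical helper).

    essays: iterable of (level, list_of_lemmas) pairs.
    corrections: optional dict mapping an erroneous form to its
        correct equivalent (an assumed stand-in for error normalization).
    Returns a dict: lemma -> list of raw counts ordered as CEFR_LEVELS.
    """
    corrections = corrections or {}
    counts = defaultdict(Counter)
    for level, lemmas in essays:
        for lemma in lemmas:
            # Replace a word-level error with its correct equivalent, if known.
            lemma = corrections.get(lemma, lemma)
            counts[lemma][level] += 1
    return {
        lemma: [per_level[level] for level in CEFR_LEVELS]
        for lemma, per_level in counts.items()
    }


# Toy data, invented for illustration only.
essays = [
    ("A1", ["jag", "bor", "i", "sverje"]),
    ("B1", ["jag", "trivs", "i", "Sverige"]),
]
dist = level_distribution(essays, corrections={"sverje": "Sverige"})
```

In this sketch the misspelled form is merged into its corrected lemma before counting, so a single entry accumulates the evidence from both essays instead of splitting it across two forms.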
