SVALex: a CEFR-graded Lexical Resource for Swedish Foreign and Second Language Learners

This paper introduces SVALex, a lexical resource aimed primarily at learners and teachers of Swedish as a foreign and second language. It describes the distribution of 15,681 words and expressions across the levels of the Common European Framework of Reference (CEFR). The resource is based on a corpus of coursebook texts and therefore describes the receptive vocabulary that learners are exposed to during reading activities, as opposed to the productive vocabulary they use when speaking or writing. The paper describes the methodology used to create the list and to estimate the frequency distribution. It also discusses some characteristics of the resulting resource and compares it to other lexical resources for Swedish. A notable feature of the resource is that it makes it possible to separate the wheat from the chaff: the core vocabulary at each level, i.e. vocabulary shared by several coursebook writers at that level, can be distinguished from peripheral vocabulary, which is used by only a minority of coursebook writers.
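The core/peripheral distinction described above can be illustrated with a minimal sketch. It is not the authors' procedure: the input format of (level, coursebook, lemma) tuples, the function name `split_core_peripheral`, the `min_books` threshold, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def split_core_peripheral(observations, min_books=2):
    """Split each level's vocabulary into core and peripheral sets.

    observations: iterable of (level, book_id, lemma) tuples (assumed format).
    min_books: hypothetical threshold; a lemma counts as "core" at a level
    when at least this many distinct coursebooks use it at that level.
    """
    # level -> lemma -> set of coursebooks in which the lemma occurs
    books_per_lemma = defaultdict(lambda: defaultdict(set))
    for level, book_id, lemma in observations:
        books_per_lemma[level][lemma].add(book_id)

    result = {}
    for level, lemmas in books_per_lemma.items():
        core = {lemma for lemma, books in lemmas.items() if len(books) >= min_books}
        result[level] = {"core": core, "peripheral": set(lemmas) - core}
    return result


# Toy data, invented for illustration only (not taken from SVALex).
toy = [
    ("A1", "book1", "hej"),
    ("A1", "book2", "hej"),
    ("A1", "book1", "fika"),
    ("A2", "book1", "fika"),
    ("A2", "book3", "fika"),
    ("A2", "book3", "myndighet"),
]
split = split_core_peripheral(toy)
print(split["A1"])
```

In this toy input, "hej" appears in two A1 coursebooks and is therefore classed as core at A1, while "fika" appears in only one A1 coursebook and is peripheral there. The threshold that best matches "shared by several coursebook writers" is a design choice the abstract does not specify.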
