LEGATO: A flexible lexicographic annotation tool

This article reports on an ongoing project that aims to analyze lexical and grammatical competence in Swedish as a second language (L2). To facilitate lexical analysis, we need access to metalinguistic information about the relevant vocabulary that L2 learners can use and understand. The focus of the current article is on the lexical annotation of the vocabulary scope for a range of lexicographical aspects, such as morphological analysis, valency, and types of multi-word units. We perform parts of the analysis automatically and other parts manually: where information cannot be added automatically, manual effort is required. To facilitate the latter, a tool, LEGATO, has been designed and implemented, and is currently undergoing active testing.
