Corpus-based vocabulary lists for language learners for nine languages

We present the KELLY project and its work on developing monolingual and bilingual word lists for language learning, using corpus methods, for nine languages and thirty-six language pairs. We describe the method and discuss the many challenges encountered. We have loaded the data into an online database so that anyone can explore it, and we present our own first explorations of it. The focus of the paper is thus twofold: the pedagogical and methodological aspects of the lists’ construction, and the linguistic aspects of the project's by-product, the KELLY database.
