Mining Words in the Minds of Second Language Learners: Learner-Specific Word Difficulty

While many studies have measured the size of learners’ vocabulary or identified the vocabulary they should learn, few have examined which words learners actually know. We therefore investigated theoretically and practically important models for predicting second language learners’ vocabulary, and we propose a new model for this vocabulary prediction task. In the existing models, all learners share the same word difficulty measure. This is unrealistic because some learners have special interests: a learner interested in music may know specialized music-related terms regardless of their difficulty. To address this problem, our model can define a learner-specific word difficulty measure. It also extends the existing models, in the sense that they are special cases of our model. For a qualitative evaluation, we defined a measure of how learner-specific a word is. Interestingly, the word with the highest learner-specificity was “twitter”. Although “twitter” is a difficult English word, some low-ability learners presumably knew it through the famous micro-blogging service. Our qualitative evaluation successfully extracted such interesting and suggestive examples. Our model achieved accuracy competitive with that of the existing models.
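
The contrast between a shared and a learner-specific word difficulty can be pictured with a small sketch. The abstract does not state the model's formula, so everything here is an assumption made for illustration: the Rasch-style logistic form, the function names `p_know_shared` and `p_know_specific`, the additive `offset` term, and the numeric values. It is not the paper's formulation; it only shows how a shared-difficulty model can be a special case of a learner-specific one.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_know_shared(ability, difficulty):
    """Assumed Rasch-style model: one difficulty per word, shared by all learners."""
    return sigmoid(ability - difficulty)


def p_know_specific(ability, difficulty, offset=0.0):
    """Hypothetical extension: a per-(learner, word) offset shifts the difficulty.

    With offset == 0.0 this reduces to the shared-difficulty model above.
    """
    return sigmoid(ability - (difficulty + offset))


# Illustrative numbers only (not taken from the paper).
low_ability = -1.0   # a low-ability learner
hard_word = 2.0      # a word that is difficult on the shared scale

shared = p_know_shared(low_ability, hard_word)
# A negative offset makes the word easier for this particular learner,
# e.g. because they met it through a well-known web service.
specific = p_know_specific(low_ability, hard_word, offset=-4.0)

print(f"shared difficulty:   P(know) = {shared:.3f}")
print(f"learner-specific:    P(know) = {specific:.3f}")
```

Under these assumptions, a word with a large negative offset for some low-ability learners would behave like the “twitter” example: hard on the shared scale, yet likely known by those learners.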
