Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English

Word frequency is the most important variable in research on word processing and memory. Yet, the main criterion for selecting word frequency norms has been the availability of the measure, rather than its quality. As a result, much research is still based on the old Kučera and Francis frequency norms. By using the lexical decision times of recently published megastudies, we show how bad this measure is and what must be done to improve it. In particular, we investigated the size of the corpus, the language register on which the corpus is based, and the definition of the frequency measure. We observed that corpus size is of practical importance for small sizes (depending on the frequency of the word), but not for sizes above 16–30 million words. As for the language register, we found that frequencies based on television and film subtitles are better than frequencies based on written sources, certainly for the monosyllabic and bisyllabic words used in psycholinguistic research. Finally, we found that lemma frequencies are not superior to word form frequencies in English and that a measure of contextual diversity is better than a measure based on raw frequency of occurrence. Part of the superiority of contextual diversity is due to the words that are frequently used as names. A new frequency norm assembled on the basis of these considerations predicted word processing times much better than did the existing norms (including Kučera & Francis and Celex). The new SUBTL frequency norms from the SUBTLEX-US corpus are freely available for research purposes from http://brm.psychonomic-journals.org/content/supplemental, as well as from the University of Ghent and Lexique Web sites.
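The distinction between raw frequency of occurrence and contextual diversity can be illustrated with a minimal sketch. It assumes that each document (for instance, one subtitle file) is a plain string, that tokens are split on whitespace, and that contextual diversity is the proportion of documents containing the word at least once. The function name `frequency_norms` and the toy corpus are hypothetical and are not taken from the norms described above.

```python
from collections import Counter


def frequency_norms(documents):
    """Return {word: (occurrences per million tokens, contextual diversity)}.

    Assumption: contextual diversity is the share of documents in which
    the word form appears at least once.
    """
    token_counts = Counter()  # raw occurrences over the whole corpus
    doc_counts = Counter()    # number of documents containing the word
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization
        token_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per document
    total_tokens = sum(token_counts.values())
    n_docs = len(documents)
    return {
        word: (count * 1_000_000 / total_tokens, doc_counts[word] / n_docs)
        for word, count in token_counts.items()
    }


# Toy corpus: "dog" is repeated heavily within a single document.
toy_corpus = [
    "the dog saw the cat",
    "the cat ran",
    "dog dog dog dog",
]
norms = frequency_norms(toy_corpus)
for word in ("dog", "the", "ran"):
    per_million, diversity = norms[word]
    print(f"{word}: {per_million:.0f} per million, diversity {diversity:.2f}")
```

In this toy corpus, "dog" has a higher raw frequency than "the", yet both appear in the same number of documents, so a diversity-based measure ranks them equally. This is the kind of divergence between the two measures that the comparison above concerns.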
