SUBTLEX-UK: A New and Improved Word Frequency Database for British English

We present word frequencies based on subtitles of British television programmes. We show that the SUBTLEX-UK word frequencies explain more of the variance in the lexical decision times of the British Lexicon Project than both the word frequencies based on the British National Corpus and the SUBTLEX-US frequencies. In addition to word form frequencies, we present measures of contextual diversity, part-of-speech-specific word frequencies, word frequencies in children's programmes, and word bigram frequencies. Together these give researchers of British English access to the full range of norms recently made available for other languages. Finally, we introduce a new measure of word frequency, the Zipf scale, which we hope will help end current misunderstandings of the word frequency effect.
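The abstract lists several norms derived from a subtitle corpus. The Python sketch below shows how some of them could be computed from a tokenised corpus:

* raw word counts;
* contextual diversity, assumed here to mean the number of subtitle documents in which a word occurs;
* word bigram counts;
* Zipf values.

The Zipf value is taken as log10 of the frequency per million words plus 3, which equals log10 of the frequency per billion words. The function names and the toy corpus are hypothetical illustrations, not the authors' actual pipeline. Part-of-speech tagging is not shown.

```python
import math
from collections import Counter


def corpus_norms(documents: list[list[str]]):
    """Count words, contextual diversity and bigrams in tokenised documents.

    Hypothetical helper: each document is one programme's subtitles,
    already split into word tokens.
    """
    counts = Counter()   # raw word form frequency
    cd = Counter()       # number of documents containing each word
    bigrams = Counter()  # adjacent word pairs within a document
    for tokens in documents:
        counts.update(tokens)
        cd.update(set(tokens))  # a word counts once per document
        bigrams.update(zip(tokens, tokens[1:]))
    total_tokens = sum(counts.values())
    return counts, cd, bigrams, total_tokens


def zipf(count: int, corpus_size: int) -> float:
    """Zipf value = log10(frequency per million words) + 3.

    Assumes count > 0; words absent from the corpus would need a
    smoothing rule, which this sketch does not implement.
    """
    if count <= 0:
        raise ValueError("zipf() is undefined for zero counts")
    per_million = count * 1_000_000 / corpus_size
    return math.log10(per_million) + 3


# Toy usage with two tiny "programmes".
docs = [["the", "cat", "sat"], ["the", "dog"]]
counts, cd, bigrams, total = corpus_norms(docs)
print(counts["the"], cd["the"], cd["cat"], bigrams[("the", "cat")], total)
print(zipf(1, 1_000_000))  # one occurrence per million words
```

On this scale, a word occurring once per million words scores 3, and each tenfold increase in frequency adds 1. This makes the values easier to read than raw per-million frequencies.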