Identifying Trends in Word Frequency Dynamics

The word-stock of a language is a complex dynamical system in which words can be created, evolve, and become extinct. Even more dynamic are the short-term fluctuations in word usage by individuals in a population. Building on the recent demonstration that word niche is a strong determinant of future rise or fall in word frequency, here we introduce a model that allows us to distinguish persistent from temporary increases in frequency. Our model is illustrated using a 10⁸-word database from an online discussion group and a 10¹¹-word collection of digitized books. The model reveals a strong relation between changes in word dissemination and changes in frequency. Aside from their implications for short-term word frequency dynamics, these observations are potentially important for language evolution, as new words must survive in the short term in order to survive in the long term.
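The abstract does not say how word dissemination is measured. The sketch below is for illustration only and assumes one possible definition: the number of users who use a word, divided by the number of users expected if the word's occurrences were scattered at random across users' texts. Under a Poisson approximation, a user with N tokens uses a word of pooled relative frequency f at least once with probability 1 - exp(-f N). The function name `dissemination` and the input format (one token list per user) are hypothetical. This is not the authors' model.

```python
import math
from collections import Counter


def dissemination(word, user_texts):
    """Observed number of users of `word` divided by the number expected by chance.

    Hypothetical helper. `user_texts` is a list with one token list per user.
    Assumption: under the null model the word's occurrences are spread at
    random with the pooled relative frequency f, and each user's count is
    approximated as Poisson with mean f * N (N = that user's token count),
    so P(user uses the word at least once) = 1 - exp(-f * N).
    """
    counts = [Counter(tokens) for tokens in user_texts]
    total_tokens = sum(len(tokens) for tokens in user_texts)
    total_word = sum(c[word] for c in counts)
    if total_word == 0:
        raise ValueError(f"{word!r} does not occur in the corpus")

    f = total_word / total_tokens  # pooled relative frequency of the word
    observed_users = sum(1 for c in counts if c[word] > 0)
    expected_users = sum(1.0 - math.exp(-f * len(tokens)) for tokens in user_texts)
    return observed_users / expected_users


# Usage: the same word, concentrated in one user vs. spread across all users.
concentrated = [["x"] * 10, ["y"] * 10, ["y"] * 10]
spread = [["x", "y"], ["x", "y"], ["x", "y"]]
print(dissemination("x", concentrated))  # below 1: fewer users than chance predicts
print(dissemination("x", spread))        # above 1: more users than chance predicts
```

A value below 1 marks a word confined to a narrow niche of users. A value above 1 marks a word that is more widely disseminated than its frequency alone would suggest. Under this assumed definition, tracking the ratio over successive time windows gives one way to look at "changes in word dissemination".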
