Diffusion of Lexical Change in Social Media

Language in social media is rich with linguistic innovations, most strikingly in the new words and spellings that constantly enter the lexicon. Despite assertions about the power of social media to connect people across the world, we find that many of these neologisms are restricted to geographically compact areas. Even words that eventually become ubiquitous often grow in popularity geographically, spreading from city to city. Thus, social media text offers a unique opportunity to study the diffusion of lexical change. In this paper, we show how an autoregressive model of word frequencies in social media can be used to induce a network of linguistic influence between American cities. By comparing the induced network with the geographical and demographic characteristics of each city, we can measure the factors that drive the spread of lexical innovation.
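
The abstract does not specify the model, so the following is only a rough illustration of the general idea that autoregressive coefficients can be read as directed influence between cities. It assumes a plain first-order linear vector autoregression (VAR(1)) on one word's per-city frequency series (assumed already centered), fit by ordinary least squares with a tiny ridge term, and it uses a fixed coefficient threshold in place of a proper significance test. The names `solve`, `fit_var1`, and `influence_edges` are hypothetical, and this is not the authors' estimation procedure.

```python
"""Illustrative sketch only: a directed influence network from a linear VAR(1).

ASSUMPTION (not the paper's model): for one word, the centered frequency vector
across cities at time t follows x[t] = A x[t-1] + noise, fit by least squares.
A large off-diagonal coefficient A[i][j] is read as "city j influences city i".
"""
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_var1(series: Matrix, ridge: float = 1e-6) -> Matrix:
    """Least-squares estimate of A in x[t] = A x[t-1] + e (no intercept).

    series[t][i] is the (hypothetically centered) frequency of a word in city i
    at time step t. Each row of A (one target city) is a separate regression.
    """
    n = len(series[0])
    past, future = series[:-1], series[1:]
    # Normal-equation matrix X^T X is shared by all rows; the small ridge keeps it invertible.
    xtx = [[sum(p[j] * p[k] for p in past) + (ridge if j == k else 0.0)
            for k in range(n)] for j in range(n)]
    a = []
    for i in range(n):
        xty = [sum(p[j] * f[i] for p, f in zip(past, future)) for j in range(n)]
        a.append(solve(xtx, xty))
    return a


def influence_edges(a: Matrix, threshold: float) -> List[Tuple[int, int, float]]:
    """Directed edges (source j, target i, coefficient) with |A[i][j]| above threshold."""
    return [(j, i, a[i][j]) for i in range(len(a)) for j in range(len(a))
            if i != j and abs(a[i][j]) > threshold]


if __name__ == "__main__":
    # Toy data generated without noise from A = [[0.9, 0.0], [0.5, 0.3]],
    # so city 0's past frequency drives city 1's current frequency.
    toy = [[1.0, 0.0], [0.9, 0.5], [0.81, 0.6], [0.729, 0.585]]
    for src, dst, coef in influence_edges(fit_var1(toy), threshold=0.1):
        print(f"city {src} -> city {dst}: {coef:.3f}")
```

On this noiseless toy series the fit should recover the generating matrix almost exactly and report a single edge, city 0 -> city 1, with a coefficient close to 0.5. Real word counts are sparse and noisy, so a realistic version would need a proper observation model and a significance test (with multiple-testing correction) rather than a fixed threshold.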