Language Evolution in Social Media: a Preliminary Study

Language, as a social phenomenon, is in constant evolution. New words are added, disused ones are forgotten, and others change their morphology and semantics to adapt to a dynamic world. Today we are living through a new “Social Media” revolution that is changing many languages. The pace at which new words are created in social media is unprecedented. People from different demographic groups are often “speaking different languages”, in that they not only use different sets of words but also assign different meanings to the same words. In this paper, we investigate whether it is possible to lower this “linguistic barrier” by analyzing the phenomenon of language evolution in social media and by evaluating to what extent cooperative on-line dictionaries and natural language processing techniques can help in tracking and regulating the evolution of languages in the social media era. We report a study of language evolution on a specific social media platform, Twitter, and we evaluate whether cooperative dictionaries (specifically Urban Dictionary) can be used to deal with the evolving language. We find that this method partially solves the problem by allowing a better understanding of the behavior of new words and expressions. We then analyze how natural language processing techniques can be used to capture the meaning of new words and expressions.
