TFW, DamnGina, Juvie, and Hotsie-Totsie: On the Linguistic and Social Aspects of Internet Slang

Slang is ubiquitous on the Internet. The emergence of new social contexts such as micro-blogs, question-answering forums, and social networks has allowed slang and non-standard expressions to proliferate on the web. Despite this, slang has traditionally been viewed as a non-standard form of language that lies outside the focus of linguistic analysis, and it has largely been neglected. In this work, we use UrbanDictionary to conduct the first large-scale linguistic analysis of slang and its social aspects on the Internet, yielding insights into a variety of language that is increasingly used online all over the world. We begin by computationally analyzing the phonological, morphological, and syntactic properties of slang. We then study linguistic patterns in four specific categories of slang, namely alphabetisms, blends, clippings, and reduplicatives. Our analysis reveals that slang follows extra-grammatical rules of phonological and morphological formation that markedly distinguish it from the standard variety, shedding light on its generative patterns. Next, we analyze the social aspects of slang by studying subject restriction and stereotyping in slang usage. Analyzing tens of thousands of slang words reveals that the majority of slang on the Internet falls into two major categories: sex and drugs. We also find that slang usage is not only subject to prevalent social biases and prejudices but also reflects such biases and stereotypes more intensely than the standard variety.
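
To make the four formation categories concrete, here is a minimal Python sketch of spelling-based heuristics for recognizing them. It is an illustration under stated assumptions, not the method used in this work: it assumes the source words of each slang term are already known (for example, taken from its definition), it inspects spelling rather than pronunciation, and all function names (classify, is_blend, and so on) are hypothetical. An alphabetism is spelled from the initial letters of its expansion (TFW from "that feeling when"); a blend joins the start of one word to the end of another (brunch from breakfast and lunch); a clipping shortens a longer word, sometimes adding a suffix such as -ie (juvie from juvenile); and a reduplicative repeats a word exactly, with a changed onset, or with a changed vowel (bye-bye, hotsie-totsie, zig-zag).

```python
VOWELS = set("aeiou")


def strip_onset(word: str) -> str:
    """Drop the leading consonant letters (a spelling-based stand-in for the onset)."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[i:]


def is_alphabetism(slang: str, expansion: list[str]) -> bool:
    """True if slang is spelled from the first letter of each expansion word."""
    initials = "".join(w[0].lower() for w in expansion if w)
    return len(expansion) > 1 and slang.lower() == initials


def is_blend(slang: str, first: str, second: str) -> bool:
    """True if slang = prefix of `first` + suffix of `second`, with at least one truncated."""
    s, a, b = slang.lower(), first.lower(), second.lower()
    for i in range(1, len(a) + 1):
        for j in range(len(b)):
            truncated = i < len(a) or j > 0  # exclude plain concatenation (compounds)
            if truncated and a[:i] + b[j:] == s:
                return True
    return False


def is_clipping(slang: str, source: str) -> bool:
    """True if slang, minus an optional -ie/-y suffix, is a shorter prefix of source."""
    s, src = slang.lower(), source.lower()
    for suffix in ("ie", "y", ""):
        if not s.endswith(suffix):
            continue
        stem = s[: len(s) - len(suffix)]
        if 2 <= len(stem) < len(src) and src.startswith(stem):
            return True
    return False


def is_reduplicative(slang: str) -> bool:
    """True for exact (bye-bye), rhyming (hotsie-totsie) or ablaut (zig-zag) pairs."""
    parts = slang.lower().replace(" ", "-").split("-")
    if len(parts) != 2 or not all(parts):
        return False
    a, b = parts
    if a == b:
        return True  # exact reduplication
    rhyme_a, rhyme_b = strip_onset(a), strip_onset(b)
    if rhyme_a and rhyme_a == rhyme_b:
        return True  # only the initial consonants differ
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    # ablaut: same length, exactly one differing letter, and both letters are vowels
    return len(a) == len(b) and len(diffs) == 1 and all(c in VOWELS for c in diffs[0])


def classify(slang: str, sources: list[str]) -> str:
    """Assign one of the four categories, or 'other', given assumed source words."""
    if is_reduplicative(slang):
        return "reduplicative"
    if len(sources) > 1 and is_alphabetism(slang, sources):
        return "alphabetism"
    if len(sources) == 2 and is_blend(slang, *sources):
        return "blend"
    if len(sources) == 1 and is_clipping(slang, sources[0]):
        return "clipping"
    return "other"


print(classify("TFW", ["that", "feeling", "when"]))  # alphabetism
print(classify("brunch", ["breakfast", "lunch"]))    # blend
print(classify("juvie", ["juvenile"]))               # clipping
print(classify("hotsie-totsie", []))                 # reduplicative
```

Real slang is messier than these checks allow. A clipping that keeps the middle of a word (fridge from refrigerator) is labeled "other" by this sketch, and blends with overlapping parts or alphabetisms that keep more than one letter per word would need additional rules.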