Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media

Vulgarity is a common form of linguistic expression and performs several linguistic functions. Understanding how vulgar words are used can shed light on both linguistic and psychological phenomena and can benefit downstream natural language processing applications such as sentiment analysis. This study performs a large-scale, data-driven empirical analysis of vulgar words using social media data. We analyze the socio-cultural and pragmatic aspects of vulgarity using tweets from users with known demographics. Further, we collect sentiment ratings for vulgar tweets to study the relationship between the use of vulgar words and perceived sentiment, and we show that explicitly modeling vulgar words can boost sentiment analysis performance.
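The abstract does not say how vulgar words are "explicitly modeled". One simple possibility is sketched below, assuming a bag-of-words sentiment classifier: alongside ordinary word features, each tweet gets a count of vulgar words, a presence flag, and a feature pairing each vulgar word with the word that follows it, so a downstream model can tell an intensifying use ("damn good") apart from an aggressive one ("damn you"). The lexicon `VULGAR_LEXICON`, the function `featurize`, and all feature names are hypothetical illustrations, not the authors' method.

```python
from collections import Counter

# Hypothetical, tiny vulgarity lexicon for illustration only; a real study
# would rely on a curated list of vulgar words.
VULGAR_LEXICON = {"damn", "hell", "crap"}


def featurize(tweet: str, lexicon: set[str] = VULGAR_LEXICON) -> Counter:
    """Return bag-of-words features plus explicit vulgarity features (a sketch)."""
    # Naive tokenization: lowercase, split on whitespace, trim edge punctuation.
    tokens = [t.strip(".,!?") for t in tweet.lower().split()]
    tokens = [t for t in tokens if t]

    # Ordinary bag-of-words features.
    features = Counter(f"w={t}" for t in tokens)

    # Explicit vulgarity features: how many vulgar words occur, and whether any do.
    vulgar = [t for t in tokens if t in lexicon]
    features["vulgar_count"] = len(vulgar)
    features["has_vulgar"] = int(bool(vulgar))

    # Pair each vulgar word with its right neighbour so a classifier can learn
    # that the same vulgar word may carry different sentiment in context.
    for i, t in enumerate(tokens):
        if t in lexicon and i + 1 < len(tokens):
            features[f"vulgar+next={t}_{tokens[i + 1]}"] += 1
    return features


if __name__ == "__main__":
    print(featurize("That was damn good!"))
```

The resulting `Counter` can be fed to any linear classifier that accepts sparse feature dictionaries; the extra vulgarity features are additive, so they can be ablated to measure whether they help.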