Why Swear? Analyzing and Inferring the Intentions of Vulgar Expressions

Vulgar words serve several distinct functions in language, ranging from expressing aggression to signaling group identity or the informality of the communication. This versatility of a restricted set of words poses a challenge for downstream applications and has yet to be studied quantitatively or with natural language processing techniques. We introduce a novel data set of 7,800 tweets from users with known demographic traits, in which every instance of a vulgar word is annotated with one of six categories of vulgar word use. Using this data set, we present the first analysis of the pragmatic aspects of vulgarity and how they relate to social factors. We build a model that predicts the category of a vulgar word from its immediate context, reaching 67.4 macro F1 across the six classes. Finally, we demonstrate the utility of modeling the type of vulgar word use in context by using this information to achieve state-of-the-art performance in hate speech detection on a benchmark data set.
