Detecting Derogatory Compounds – An Unsupervised Approach

We examine the new task of detecting derogatory compounds (e.g. “curry muncher”). Derogatory compounds are much more difficult to detect than derogatory unigrams (e.g. “idiot”) since they are more sparsely represented in lexical resources previously found effective for this task (e.g. Wiktionary). We propose an unsupervised classification approach that incorporates linguistic properties of compounds. It mostly depends on a simple distributional representation. We compare our approach against previously established methods proposed for extracting derogatory unigrams.

[1]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[2]  Michael Wiegand,et al.  Inducing a Lexicon of Abusive Words – a Feature-Based Approach , 2018, NAACL.

[3]  Henry Lieberman,et al.  Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying , 2012, TIIS.

[4]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[5]  Njagi Dennis Gitari,et al.  A Lexicon-based Approach for Hate Speech Detection , 2015, MUE 2015.

[6]  Michael Wiegand,et al.  A Survey on Hate Speech Detection using Natural Language Processing , 2017, SocialNLP@EACL.

[7]  Roland Schäfer,et al.  Processing and querying large web corpora with the COW14 architecture , 2015 .

[8]  Partha Pratim Talukdar,et al.  Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks , 2008, EMNLP.

[9]  Corina Dima,et al.  Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds , 2015, EMNLP.

[10]  G. Zipf,et al.  The Psycho-Biology of Language , 1936 .

[11]  Eneko Agirre,et al.  Personalizing PageRank for Word Sense Disambiguation , 2009, EACL.

[12]  Stan Matwin,et al.  Offensive Language Detection Using Multi-level Classification , 2010, Canadian Conference on AI.

[13]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[14]  Cornelia Caragea,et al.  Content-Driven Detection of Cyberbullying on the Instagram Social Network , 2016, IJCAI.

[15]  Michael Wiegand,et al.  Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language , 2018 .

[16]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[17]  Manfred Klenner,et al.  PolArt: A Robust Tool for Sentiment Analysis , 2009, NODALIDA.

[18]  Elizabeth F. Churchill,et al.  Automatic identification of personal insults on social news sites , 2012, J. Assoc. Inf. Sci. Technol..