Challenges in discriminating profanity from hate speech

ABSTRACT In this study, we approach the problem of distinguishing general profanity from hate speech in social media, a distinction that has not been widely considered. Using a new dataset annotated specifically for this task, we employ supervised classification with a set of features that includes n-grams, skip-grams and clustering-based word representations. We apply approaches based on single classifiers as well as more advanced ensemble classifiers and stacked generalisation, and report the best accuracy achieved on this 3-class classification task. Analysis of the results shows that discriminating hate speech from profanity is not a simple task, and that it may require features that capture a deeper understanding of the text than surface n-grams can always provide. The variability of gold labels in the annotated data, due to differences in the subjective adjudications of the annotators, is also an issue. Other directions for future work are discussed.
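
As an illustration of the surface features named in the abstract, the following is a minimal sketch of word n-gram and skip-gram extraction. The abstract does not specify tokenisation, n-gram orders or skip distances, so the function names, the whitespace tokenisation and the example sentence are assumptions made only for this illustration and are not the authors' implementation.

```python
def word_ngrams(tokens, n):
    """Return contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip):
    """Return word pairs separated by at most `max_skip` intervening tokens.

    With max_skip=0 this reduces to ordinary bigrams.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # The right-hand word may sit up to max_skip + 1 positions ahead.
        for j in range(i + 1, min(i + max_skip + 2, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


# Hypothetical input; whitespace tokenisation is an assumption.
tokens = "this is a short example".split()
bigrams = word_ngrams(tokens, 2)
one_skip = skip_bigrams(tokens, 1)
```

Skip-grams add pairs such as ("this", "a") that contiguous bigrams miss, which is one way surface features can capture slightly longer-range context.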
