A Unified Deep Learning Architecture for Abuse Detection

Hate speech, offensive language, sexism, racism, and other types of abusive behavior have become common on many online social media platforms. In recent years, such diverse abusive behaviors have been manifesting with increased frequency and intensity. Despite social media platforms' efforts to combat online abuse, the problem persists. In fact, the platforms have entered an arms race with the perpetrators, who constantly change tactics to evade the detection algorithms the platforms deploy. Such algorithms, not disclosed to the public for obvious reasons, are typically custom-designed and tuned to detect only one specific type of abusive behavior, and usually miss other related behaviors. In the present paper, we study this complex problem by following a more holistic approach that considers the various aspects of abusive behavior. We focus on Twitter, due to its popularity, and analyze user and textual properties from different angles of abusive posting behavior. We propose a deep learning architecture that combines a wide variety of available metadata with hidden patterns extracted automatically from the text of the tweets, to detect multiple, highly inter-related abusive behavioral norms. The proposed unified architecture is applied in a seamless and transparent fashion: the architecture itself stays unchanged, and only a separate model is trained for each task (i.e., each type of abusive behavior). We test the proposed approach on multiple datasets addressing different abusive behaviors on Twitter. Our results demonstrate high performance across all datasets, with AUC values ranging from 92% to 98%.
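The abstract does not specify layers or sizes, so the following is only a minimal sketch of the general idea: one branch encodes the tweet text, another encodes metadata, and the two are concatenated before a single output unit. Everything named here is an assumption made for illustration: the dimensions, the random (untrained) weights, the function names, and in particular the text branch, which uses mean-pooled placeholder embeddings as a stand-in for whatever learned text encoder the authors use.

```python
import math
import random

EMBED_DIM = 8    # hypothetical size of the text representation
META_DIM = 3     # hypothetical number of metadata features per tweet
HIDDEN_DIM = 4   # hypothetical width of the metadata branch

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible


def random_matrix(rows, cols):
    """Small random weights; a stand-in for parameters learned in training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense(weights, inputs):
    """Linear layer without bias: (out x in) weight matrix times input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def embed_token(token):
    """Placeholder embedding: a deterministic pseudo-random vector per token."""
    token_rng = random.Random(token)
    return [token_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def text_branch(tweet):
    """Mean-pool token embeddings into one fixed-size vector (assumed encoder)."""
    tokens = tweet.lower().split()
    if not tokens:
        return [0.0] * EMBED_DIM
    vectors = [embed_token(t) for t in tokens]
    return [sum(column) / len(tokens) for column in zip(*vectors)]


W_META = random_matrix(HIDDEN_DIM, META_DIM)
W_OUT = random_matrix(1, EMBED_DIM + HIDDEN_DIM)


def metadata_branch(metadata):
    """One dense layer with ReLU over the numeric metadata features."""
    return relu(dense(W_META, metadata))


def predict_abuse_probability(tweet, metadata):
    """Concatenate both branches and apply a sigmoid output unit."""
    combined = text_branch(tweet) + metadata_branch(metadata)
    logit = dense(W_OUT, combined)[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical usage: three already-normalized metadata features.
probability = predict_abuse_probability("example tweet text", [0.2, 0.5, 0.1])
```

Under this sketch, handling a different type of abusive behavior would mean fitting a new set of weights (`W_META`, `W_OUT`, and the embeddings) on that task's labels while the code structure stays the same, which mirrors the one-architecture, one-model-per-task setup described above.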
