Detecting Offensive Language on Arabic Social Media Using Deep Learning

Offensive content on social media, such as verbal attacks, demeaning comments, or hate speech, has many negative effects on users. Automatically detecting offensive language on Arabic social media is an important step towards regulating such content for Arabic-speaking users. This paper evaluates four neural network architectures for this task: a Convolutional Neural Network (CNN), a Bidirectional Long Short-Term Memory network (Bi-LSTM), a Bi-LSTM with an attention mechanism, and a combined CNN-LSTM architecture. The networks are trained and tested on a labeled dataset of Arabic YouTube comments. We run the dataset through a series of pre-processing steps and represent the comments with Arabic word embeddings. We tune the hyperparameters of each model with Bayesian optimization, and we train and test each network using 5-fold cross-validation. The CNN-LSTM achieves the highest recall (83.46%), followed by the CNN (82.24%), the Bi-LSTM with attention (81.51%), and the Bi-LSTM (80.97%).
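The evaluation protocol described above (5-fold cross-validation scored by recall) can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`k_fold_indices`, `recall`, `cross_validated_recall`, `fit_keyword_baseline`) are hypothetical, the keyword baseline is only a stand-in for the neural networks, and computing recall on the offensive class (label 1) is an assumption, since the abstract does not state how recall is averaged.

```python
import random
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Recall = TP / (TP + FN) for the positive class (assumed: offensive = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def fit_keyword_baseline(texts: Sequence[str], labels: Sequence[int]) -> Callable[[str], int]:
    """Hypothetical stand-in model: flag tokens seen only in offensive comments."""
    offensive, clean = set(), set()
    for text, label in zip(texts, labels):
        (offensive if label == 1 else clean).update(text.split())
    markers = offensive - clean
    return lambda text: 1 if any(tok in markers for tok in text.split()) else 0


def cross_validated_recall(
    texts: Sequence[str],
    labels: Sequence[int],
    fit: Callable[[Sequence[str], Sequence[int]], Callable[[str], int]],
    k: int = 5,
) -> float:
    """Train on k-1 folds, test on the held-out fold, and average recall over folds."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predict = fit([texts[i] for i in train], [labels[i] for i in train])
        preds = [predict(texts[i]) for i in test]
        scores.append(recall([labels[i] for i in test], preds))
    return sum(scores) / len(scores)
```

Any of the four architectures could replace `fit_keyword_baseline`, provided it exposes the same fit-then-predict interface. On an imbalanced dataset, a stratified split would be a safer choice, because a fold with no offensive comments contributes a recall of 0 in this sketch.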
