Detecting Offensive Language on Arabic Social Media Using Deep Learning

Offensive content on social media, such as verbal attacks, demeaning comments, or hate speech, has many negative effects on its users. Automatically detecting offensive language on Arabic social media is an important step towards regulating such content for Arabic-speaking users. This paper evaluates four neural network architectures for this task: a Convolutional Neural Network (CNN), a Bidirectional Long Short-Term Memory network (Bi-LSTM), a Bi-LSTM with an attention mechanism, and a combined CNN-LSTM architecture. The networks are trained and tested on a labeled dataset of Arabic YouTube comments. We run this dataset through a series of pre-processing steps and use Arabic word embeddings to represent the comments. We also apply Bayesian optimization to tune the hyperparameters of the neural network models. We train and test each network using 5-fold cross-validation. The CNN-LSTM achieves the highest recall (83.46%), followed by the CNN (82.24%), the Bi-LSTM with attention (81.51%), and the Bi-LSTM (80.97%).
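The evaluation protocol named in the abstract (5-fold cross-validation scored by recall) can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names (`k_fold_splits`, `recall`, `cross_validated_recall`) and the keyword-based stand-in classifier `fit_keyword_baseline` are assumptions introduced only for illustration, and the stand-in replaces the neural networks the paper actually evaluates.

```python
import random
from statistics import mean


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal, shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (offensive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def cross_validated_recall(texts, labels, fit, k=5, seed=0):
    """Train on k-1 folds, score recall on the held-out fold, average over folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(texts), k, seed):
        predict = fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        y_pred = [predict(texts[i]) for i in test_idx]
        y_true = [labels[i] for i in test_idx]
        scores.append(recall(y_true, y_pred))
    return mean(scores)


def fit_keyword_baseline(train_texts, train_labels):
    """Hypothetical stand-in model (not from the paper): flag a text if it
    contains a token seen only in offensive training texts."""
    offensive, clean = set(), set()
    for text, label in zip(train_texts, train_labels):
        (offensive if label == 1 else clean).update(text.split())
    flagged = offensive - clean
    return lambda text: int(any(tok in flagged for tok in text.split()))
```

Calling `cross_validated_recall(texts, labels, fit_keyword_baseline)` on a list of comments and 0/1 labels returns the mean recall over the five held-out folds. Swapping `fit_keyword_baseline` for a function that trains one of the neural models would follow the same pattern.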

Nikola S. Nikolov | Asmaa Mourhir | Hanane Mohaouchane
