KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media

In this paper, we describe our approach of combining pre-trained BERT models with Convolutional Neural Networks (CNNs) for sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), part of SemEval-2020. We show that combining a CNN with BERT outperforms using BERT on its own, and we emphasize the importance of utilizing pre-trained language models for downstream tasks. Our system ranked 4th in Arabic with a macro-averaged F1-score of 0.897, 4th in Greek with a score of 0.843, and 3rd in Turkish with a score of 0.814. Additionally, we present ArabicBERT, a set of pre-trained transformer language models for Arabic that we share with the community.
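The BERT-CNN combination described above can be sketched as a sentence-CNN (convolution over token windows followed by max-over-time pooling, in the style of Kim 2014) applied to BERT's final hidden states. The sketch below is illustrative only: it substitutes a random tensor for the BERT encoder output, and the filter widths, filter counts, and hidden size are assumed values, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for BERT's final hidden states, shape (seq_len, hidden_size).
# In the real system these would come from a pre-trained BERT encoder.
seq_len, hidden = 32, 768
token_states = rng.standard_normal((seq_len, hidden))

def conv_max_pool(states, width, n_filters, rng):
    """1D convolution over token windows of the given width, ReLU,
    then max-over-time pooling: one feature vector per filter set."""
    W = rng.standard_normal((width * states.shape[1], n_filters)) * 0.01
    b = np.zeros(n_filters)
    feats = []
    for i in range(states.shape[0] - width + 1):
        window = states[i:i + width].reshape(-1)     # flatten the window
        feats.append(np.maximum(window @ W + b, 0))  # ReLU activation
    return np.max(np.stack(feats), axis=0)           # max over positions

# Multiple filter widths, concatenated (widths/counts are illustrative).
pooled = np.concatenate([conv_max_pool(token_states, w, 32, rng)
                         for w in (2, 3, 4)])

# Binary offensive / not-offensive score from a linear layer + sigmoid.
W_out = rng.standard_normal(pooled.shape[0]) * 0.01
p_offensive = 1.0 / (1.0 + np.exp(-(pooled @ W_out)))
print(pooled.shape, float(p_offensive))
```

In practice the convolution and classifier would be trained jointly (typically fine-tuning BERT as well); this sketch only shows the forward pass of the CNN head over contextual embeddings.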