KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media

In this paper, we describe our approach of combining pre-trained BERT models with Convolutional Neural Networks (CNNs) for sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), part of SemEval-2020. We show that combining a CNN with BERT outperforms using BERT on its own, and we emphasize the importance of utilizing pre-trained language models for downstream tasks. Our system ranked 4th in Arabic with a macro-averaged F1-score of 0.897, 4th in Greek with a score of 0.843, and 3rd in Turkish with a score of 0.814. Additionally, we present ArabicBERT, a set of pre-trained transformer language models for Arabic that we share with the community.
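The BERT-CNN combination described above can be sketched as a sentence-CNN (convolution over token windows followed by max-over-time pooling, in the style of Kim 2014) applied to BERT's final hidden states. The sketch below is illustrative only: it substitutes a random tensor for the BERT encoder output, and the filter widths, filter counts, and hidden size are assumed values, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for BERT's final hidden states, shape (seq_len, hidden_size).
# In the real system these would come from a pre-trained BERT encoder.
seq_len, hidden = 32, 768
token_states = rng.standard_normal((seq_len, hidden))

def conv_max_pool(states, width, n_filters, rng):
    """1D convolution over token windows of the given width, ReLU,
    then max-over-time pooling: one feature vector per filter set."""
    W = rng.standard_normal((width * states.shape[1], n_filters)) * 0.01
    b = np.zeros(n_filters)
    feats = []
    for i in range(states.shape[0] - width + 1):
        window = states[i:i + width].reshape(-1)     # flatten the window
        feats.append(np.maximum(window @ W + b, 0))  # ReLU activation
    return np.max(np.stack(feats), axis=0)           # max over positions

# Multiple filter widths, concatenated (widths/counts are illustrative).
pooled = np.concatenate([conv_max_pool(token_states, w, 32, rng)
                         for w in (2, 3, 4)])

# Binary offensive / not-offensive score from a linear layer + sigmoid.
W_out = rng.standard_normal(pooled.shape[0]) * 0.01
p_offensive = 1.0 / (1.0 + np.exp(-(pooled @ W_out)))
print(pooled.shape, float(p_offensive))
```

In practice the convolution and classifier would be trained jointly (typically fine-tuning BERT as well); this sketch only shows the forward pass of the CNN head over contextual embeddings.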