A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Generated hateful and toxic content by a portion of users in social media is a rising phenomenon that motivated researchers to dedicate substantial efforts to the challenging direction of hateful content identification. We not only need an efficient automatic hate speech detection model based on advanced machine learning and natural language processing, but also a sufficiently large amount of annotated data to train a model. The lack of a sufficient amount of labelled hate speech data, along with the existing biases, has been the main issue in this domain of research. To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). More specifically, we investigate the ability of BERT at capturing hateful context within social media content by using new fine-tuning methods based on transfer learning. To evaluate our proposed approach, we use two publicly available datasets that have been annotated for racism, sexism, hate, or offensive content on Twitter. The results show that our solution obtains considerable performance on these datasets in terms of precision and recall in comparison to existing approaches. Consequently, our model can capture some biases in data annotation and collection process and can potentially lead us to a more accurate model.

[1]  David Robinson,et al.  Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network , 2018, ESWC.

[2]  Shervin Malmasi,et al.  Challenges in discriminating profanity from hate speech , 2017, J. Exp. Theor. Artif. Intell..

[3]  Jing Zhou,et al.  Hate Speech Detection with Comment Embeddings , 2015, WWW.

[4]  Joel R. Tetreault,et al.  Do Characters Abuse More Than Words? , 2016, SIGDIAL Conference.

[5]  Ingmar Weber,et al.  Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.

[6]  Sérgio Nunes,et al.  A Survey on Automatic Detection of Hate Speech in Text , 2018, ACM Comput. Surv..

[7]  Ingmar Weber,et al.  Understanding Abuse: A Typology of Abusive Language Detection Subtasks , 2017, ALW@ACL.

[8]  Ingmar Weber,et al.  Racial Bias in Hate Speech and Abusive Language Detection Datasets , 2019, Proceedings of the Third Workshop on Abusive Language Online.

[9]  Yejin Choi,et al.  The Risk of Racial Bias in Hate Speech Detection , 2019, ACL.

[10]  Huai Liu,et al.  Metamorphic Testing , 2018, ACM Comput. Surv..

[11]  Zeerak Waseem,et al.  Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter , 2016, NLP+CSS@EMNLP.

[12]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[13]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[14]  Vasudeva Varma,et al.  Deep Learning for Hate Speech Detection in Tweets , 2017, WWW.

[15]  Sebastian Ruder,et al.  Fine-tuned Language Models for Text Classification , 2018, ArXiv.

[16]  Matthew Leighton Williams,et al.  Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making , 2015 .

[17]  Savvas Zannettou,et al.  "And We Will Fight For Our Race!" A Measurement Study of Genetic Testing Conversations on Reddit and 4chan , 2019, ICWSM.

[18]  Joachim Bingel,et al.  Bridging the Gaps: Multi Task Learning for Domain Transfer of Hate Speech Detection , 2018 .

[19]  Joel R. Tetreault,et al.  Abusive Language Detection in Online User Content , 2016, WWW.

[20]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[21]  Virgílio A. F. Almeida,et al.  Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination , 2018, WebSci.

[22]  Athena Vakali,et al.  A Unified Deep Learning Architecture for Abuse Detection , 2018, WebSci.

[23]  Björn Gambäck,et al.  Using Convolutional Neural Networks to Classify Hate-Speech , 2017, ALW@ACL.

[24]  Michael Wiegand,et al.  Detection of Abusive Language: the Problem of Biased Datasets , 2019, NAACL.

[25]  Kush R. Varshney,et al.  The Effect of Extremist Violence on Hateful Speech Online , 2018, ICWSM.

[26]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.