Leveraging Multilingual Transformers for Hate Speech Detection

Detecting and classifying instances of hate in social media text has been a problem of interest in Natural Language Processing in the recent years. Our work leverages state of the art Transformer language models to identify hate speech in a multilingual setting. Capturing the intent of a post or a comment on social media involves careful evaluation of the language style, semantic content and additional pointers such as hashtags and emojis. In this paper, we look at the problem of identifying whether a Twitter post is hateful and offensive or not. We further discriminate the detected toxic content into one of the following three classes: (a) Hate Speech (HATE), (b) Offensive (OFFN) and (c) Profane (PRFN). With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages. On the provided testing corpora, we achieve Macro F1 scores of 90.29, 81.87 and 75.40 for English, German and Hindi respectively while performing hate speech detection and of 60.70, 53.28 and 49.74 during fine-grained classification. In our experiments, we show the efficacy of Perspective API features for hate speech classification and the effects of exploiting a multilingual training scheme. A feature selection study is provided to illustrate impacts of specific features upon the architecture’s classification head.

[1]  Isabelle Augenstein,et al.  emoji2vec: Learning Emoji Representations from their Description , 2016, SocialNLP@EMNLP.

[2]  Sanjeev Arora,et al.  A Simple but Tough-to-Beat Baseline for Sentence Embeddings , 2017, ICLR.

[4]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[6]  Vasudeva Varma,et al.  Deep Learning for Hate Speech Detection in Tweets , 2017, WWW.

[7]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[8]  Iryna Gurevych,et al.  Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation , 2020, EMNLP.

[9]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[10]  Thomas Wolf,et al.  DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.

[11]  Michael Wiegand,et al.  Overview of GermEval Task 2, 2019 Shared Task on the Identification of Offensive Language , 2019, KONVENS.

[12]  Prasenjit Majumder,et al.  Overview of the HASOC track at FIRE 2019: Hate Speech and Offensive Content Identification in Indo-European Languages , 2019, FIRE.

[13]  Yuzhou Wang,et al.  Locate the Hate: Detecting Tweets against Blacks , 2013, AAAI.

[14]  Joel R. Tetreault,et al.  Do Characters Abuse More Than Words? , 2016, SIGDIAL Conference.

[15]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[16]  Amit P. Sheth,et al.  Cursing in English on twitter , 2014, CSCW.

[17]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[18]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[19]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[20]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[21]  Veselin Stoyanov,et al.  Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.

[22]  Preslav Nakov,et al.  SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval) , 2019, *SEMEVAL.

[23]  Ying Chen,et al.  Detecting Offensive Language in Social Media to Protect Adolescent Online Safety , 2012, 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing.