Methods for Detoxification of Texts for the Russian Language

We introduce the first study of automatic detoxification of Russian texts aimed at combating offensive language. This kind of textual style transfer can be used, for instance, to process toxic content on social media. While much work on this task exists for English, it has not yet been addressed for Russian. We test two types of models: an unsupervised approach based on the BERT architecture that performs local corrections, and a supervised approach based on the pretrained GPT-2 language model. We compare both with several baselines. In addition, we describe an evaluation setup, providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
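The BERT-based approach mentioned above performs local corrections: it locates toxic tokens, masks them, and lets a masked language model infill neutral replacements. The minimal Python sketch below illustrates that pipeline under stated assumptions: a hypothetical toxic-word lexicon stands in for a toxicity classifier, and a toy candidate-scoring table stands in for BERT's masked-language-model probabilities. All names here are illustrative, not from the paper.

```python
# Toy sketch of lexicon-based "local correction" detoxification.
# A real system would score infill candidates with a BERT masked LM;
# here a hypothetical probability table plays that role.

TOXIC_LEXICON = {"idiot", "stupid"}  # hypothetical toxic-word list

# Stand-in for MLM probabilities of neutral infill candidates at a masked position.
NEUTRAL_CANDIDATES = {"person": 0.6, "one": 0.3, "individual": 0.1}

def detoxify(sentence: str) -> str:
    """Replace lexicon-flagged tokens with the best-scoring neutral candidate."""
    out = []
    for tok in sentence.split():
        core = tok.strip(".,!?").lower()
        if core in TOXIC_LEXICON:
            # "Mask" the toxic token and infill with the top-scoring candidate,
            # preserving any trailing punctuation.
            best = max(NEUTRAL_CANDIDATES, key=NEUTRAL_CANDIDATES.get)
            suffix = tok[len(core):] if tok.lower().startswith(core) else ""
            out.append(best + suffix)
        else:
            out.append(tok)
    return " ".join(out)

print(detoxify("That idiot broke it!"))  # -> "That person broke it!"
```

In the actual unsupervised setting, the candidate set and scores would come from a masked language model conditioned on the surrounding context, so replacements stay fluent rather than being drawn from a fixed table.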
