Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer

We introduce a new approach to tackle the problem of offensive language in online social media. Our approach uses unsupervised text style transfer to translate offensive sentences into non-offensive ones. We propose a new method for training encoder-decoders using non-parallel data that combines a collaborative classifier, attention and the cycle consistency loss. Experimental results on data from Twitter and Reddit show that our method outperforms a state-of-the-art text style transfer system in two out of three quantitative metrics and produces reliable non-offensive transferred sentences.

Cícero Nogueira dos Santos | Inkit Padhi | Igor Melnyk

[1] Amit P. Sheth, et al. Cursing in English on Twitter, 2014, CSCW.

[2] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[3] Matthew Leighton Williams, et al. Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making, 2015.

[4] Eneko Agirre, et al. Unsupervised Neural Machine Translation, 2017, ICLR.

[5] Yoav Goldberg, et al. Controlling Linguistic Style Aspects in Neural Language Generation, 2017, ArXiv.

[6] Joel R. Tetreault, et al. Abusive Language Detection in Online User Content, 2016, WWW.

[7] Joelle Pineau, et al. A Deep Reinforcement Learning Chatbot, 2017, ArXiv.

[8] Dongyan Zhao, et al. Style Transfer in Text: Exploration and Evaluation, 2017, AAAI.

[9] Athena Vakali, et al. A Unified Deep Learning Architecture for Abuse Detection, 2018, WebSci.

[10] Yuzhou Wang, et al. Locate the Hate: Detecting Tweets against Blacks, 2013, AAAI.

[11] Guillaume Lample, et al. Unsupervised Machine Translation Using Monolingual Corpora Only, 2017, ICLR.

[12] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[13] Regina Barzilay, et al. Style Transfer from Non-Parallel Text by Cross-Alignment, 2017, NIPS.

[14] Cícero Nogueira dos Santos, et al. Improved Neural Text Attribute Transfer with Non-parallel Data, 2017, ArXiv.

[15] 拓海杉山, et al. Study report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, 2017.

[16] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[17] Julia Hirschberg, et al. Detecting Hate Speech on the World Wide Web, 2012.

[18] Carolyn Penstein Rosé, et al. Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus, 2012, CIKM.

[19] Peter Henderson, et al. Ethical Challenges in Data-Driven Dialogue Systems, 2017, AIES.

[20] Alan Ritter, et al. Unsupervised Modeling of Twitter Conversations, 2010, NAACL.

[21] Ingmar Weber, et al. Automated Hate Speech Detection and the Problem of Offensive Language, 2017, ICWSM.

[22] Eric P. Xing, et al. Toward Controlled Generation of Text, 2017, ICML.