FuzzE: Fuzzy Fairness Evaluation of Offensive Language Classifiers on African-American English

Hate speech and offensive language are rampant on social media, and machine learning offers a way to moderate foul language at scale. However, much of the current research focuses on overall performance, and models may perform poorly on text written in minority dialects. For instance, a hate speech classifier may produce more false positives on tweets written in African-American Vernacular English (AAVE). Measuring these problems requires text written in both AAVE and Standard American English (SAE). Unfortunately, it is challenging to curate data for all linguistic styles in a timely manner, especially when we are constrained to specific problems, specific social media platforms, or limited resources. In this paper, we answer the question, “How can we evaluate the performance of classifiers across minority dialects when they are not present within a particular dataset?” Specifically, we propose an automated fairness fuzzing tool called FuzzE to quantify the fairness of text classifiers applied to AAVE text using a dataset that contains only text written in SAE. Overall, we find that the fairness estimates returned by our technique correlate moderately with those obtained using real ground-truth AAVE text. Warning: Offensive language is displayed in this manuscript.
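The core idea, estimating fairness on AAVE text without an AAVE-labeled dataset, can be sketched as follows. This is a minimal illustrative toy, not the FuzzE implementation: the substitution rules, the `perturb` function, and the toy classifier are all hypothetical placeholders assumed for demonstration, standing in for the paper's learned dialect transformation and a real offensive-language classifier.

```python
# Toy sketch of fairness fuzzing: perturb benign SAE sentences into
# AAVE-flavored variants, then measure how much more often the classifier
# flags the perturbed variants (an estimated false-positive-rate gap).
# All rules and the classifier below are illustrative placeholders.

SUBSTITUTIONS = {"going to": "finna", "isn't": "ain't", "brother": "brotha"}

def perturb(sentence: str) -> str:
    """Apply simple dialect-flavored lexical substitutions (toy stand-in
    for a learned SAE-to-AAVE style transformation)."""
    for sae, aave in SUBSTITUTIONS.items():
        sentence = sentence.replace(sae, aave)
    return sentence

def fairness_gap(classifier, benign_sae_sentences):
    """Estimated false-positive-rate gap: fraction of benign sentences
    newly flagged as offensive after perturbation, minus the baseline
    false-positive rate on the original SAE text."""
    fp_sae = sum(classifier(s) for s in benign_sae_sentences)
    fp_fuzzed = sum(classifier(perturb(s)) for s in benign_sae_sentences)
    n = len(benign_sae_sentences)
    return (fp_fuzzed - fp_sae) / n

# A deliberately biased toy classifier that treats a dialect marker as offensive.
toy_classifier = lambda s: "ain't" in s

sentences = ["he isn't here yet", "my brother is going to call"]
print(fairness_gap(toy_classifier, sentences))  # 0.5: one of two variants newly flagged
```

A gap near zero would suggest the classifier treats both dialects similarly on benign text; a large positive gap flags disparate false positives of the kind the paper measures.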
