Paper Information - Diverse Adversaries for Mitigating Bias in Training

Adversarial learning can learn fairer and less biased models of language than standard methods. However, current adversarial techniques only partially mitigate model bias, and their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on multiple diverse discriminators, in which the discriminators are encouraged to learn hidden representations that are orthogonal to one another. Experimental results show that our method substantially improves over standard adversarial removal methods, both in reducing bias and in the stability of training.
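One way to picture the orthogonality constraint between discriminators is as a pairwise penalty on their hidden representations. The sketch below is illustrative only: the abstract does not give the exact formulation, so the squared Frobenius norm of the cross-product of two hidden-state matrices is an assumption, and the function name `orthogonality_penalty` and the toy inputs are hypothetical.

```python
def orthogonality_penalty(hidden_states):
    """Sum of ||H_i^T H_j||_F^2 over all discriminator pairs i < j.

    hidden_states: one matrix per discriminator, given as a list of rows
    (batch_size x hidden_dim). This penalty form is an assumption, not
    taken from the abstract.
    """
    penalty = 0.0
    for i in range(len(hidden_states)):
        for j in range(i + 1, len(hidden_states)):
            h_i, h_j = hidden_states[i], hidden_states[j]
            # Entry (p, q) of H_i^T H_j is the dot product, over the batch,
            # of hidden unit p of discriminator i and hidden unit q of j.
            for col_p in zip(*h_i):
                for col_q in zip(*h_j):
                    dot = sum(a * b for a, b in zip(col_p, col_q))
                    penalty += dot * dot
    return penalty


# Toy hidden states for two hypothetical discriminators (batch of 2, 2 units).
h_a = [[1.0, 0.0], [0.0, 0.0]]
h_b = [[0.0, 0.0], [0.0, 1.0]]

diverse = orthogonality_penalty([h_a, h_b])    # orthogonal representations
redundant = orthogonality_penalty([h_a, h_a])  # identical representations
```

Under this assumed form, the penalty is zero when the discriminators' hidden representations are orthogonal and grows as they overlap, so adding it to the discriminators' training loss (with some weight, also an assumption) would push them toward diverse features.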
