Debunking Biases in Attention

Despite remarkable performance across a wide range of applications, machine learning (ML) models can discriminate: they may produce biased decisions that negatively affect individuals and society. Various methods have recently been developed to mitigate bias while preserving strong predictive performance. Attention mechanisms are a fundamental component of many state-of-the-art ML models and may influence model fairness, yet how they do so has not been thoroughly explored. In this paper, we investigate how different attention mechanisms affect the fairness of ML models, focusing on models used in Natural Language Processing (NLP). We evaluate the fairness and accuracy of several models, with and without different attention mechanisms, on widely used benchmark datasets. Our results indicate that most of the attention mechanisms assessed improve the fairness of Bidirectional Gated Recurrent Unit (BiGRU) and Bidirectional Long Short-Term Memory (BiLSTM) models on all three datasets with respect to religion- and gender-sensitive groups, albeit with varying trade-offs in accuracy. Our findings highlight that adopting specific attention mechanisms in machine learning models can affect fairness on certain datasets.
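The abstract does not name the specific fairness measures used, but a common group-fairness metric in this line of work is the equality-of-opportunity gap: the difference in true-positive rates between sensitive groups (e.g. religion or gender). A minimal sketch, assuming binary labels, binary predictions, and a binary sensitive attribute (function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rate (recall on the
    positive class) between the two sensitive groups 0 and 1."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for g in (0, 1):
        mask = (group == g) & (y_true == 1)  # positive examples in group g
        tprs.append(y_pred[mask].mean())     # fraction correctly predicted positive
    return abs(tprs[0] - tprs[1])

# Toy example: group 0 positives are caught 2/2, group 1 positives 1/2.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0]
group  = [0, 0, 0, 1, 1, 1]
print(equal_opportunity_gap(y_true, y_pred, group))  # 0.5
```

A gap of 0 means both groups receive correct positive predictions at the same rate; comparing this gap for a model with and without an attention layer is one way to quantify the fairness effect the abstract describes.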
