NeuronFair: Interpretable White-Box Fairness Testing through Biased Neuron Identification

Deep neural networks (DNNs) have demonstrated strong performance in various domains. However, this raises a social concern: can DNNs produce reliable and fair decisions, especially when they are applied to sensitive domains involving the allocation of valuable resources, such as education, loans, and employment? It is crucial to conduct fairness testing before DNNs are deployed to such sensitive domains, i.e., to generate as many instances as possible that uncover fairness violations. However, existing testing methods are still limited in three respects: interpretability, performance, and generalizability. To overcome these challenges, we propose NeuronFair, a new DNN fairness testing framework that differs from previous work in several key aspects: (1) interpretable: it quantitatively interprets DNNs’ fairness violations for the biased decision; (2) effective: it uses the interpretation results to guide the generation of more diverse instances in less time; (3) generic: it can handle both structured and unstructured data. Extensive evaluations across 7 datasets and the corresponding DNNs demonstrate NeuronFair’s superior performance. For instance, on structured datasets, it generates many more instances (∼5.84×) and takes less time (an average speedup of 534.56%) than the state-of-the-art methods. In addition, the instances generated by NeuronFair can be leveraged to improve the fairness of the biased DNNs, which helps build fairer and more trustworthy deep learning systems. The code of NeuronFair is open-sourced at https://github.com/haibinzheng/NeuronFair .
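To make the notion of an instance that "uncovers a fairness violation" concrete, the sketch below checks one common criterion from the fairness testing literature: an instance is flagged when changing only its sensitive attribute changes the model's prediction. This is an illustrative assumption, not NeuronFair's algorithm; the abstract does not state the exact criterion, and the names `is_discriminatory` and `toy_model`, the tuple encoding of instances, and the biased scoring rule are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

Instance = Tuple[int, ...]


def is_discriminatory(
    predict: Callable[[Instance], int],
    x: Instance,
    sensitive_index: int,
    sensitive_values: Sequence[int],
) -> bool:
    """Return True if changing only the sensitive attribute of x
    changes the predicted label (assumed violation criterion)."""
    original = predict(x)
    for value in sensitive_values:
        if value == x[sensitive_index]:
            continue  # skip the instance's own sensitive value
        # Build a counterpart that differs from x only in the sensitive attribute.
        counterpart = x[:sensitive_index] + (value,) + x[sensitive_index + 1:]
        if predict(counterpart) != original:
            return True
    return False


def toy_model(x: Instance) -> int:
    """Hypothetical biased classifier: x[0] is the sensitive attribute,
    x[1] is a score; the sensitive attribute adds an unfair bonus."""
    return int(x[1] + 2 * x[0] >= 5)


if __name__ == "__main__":
    # Near the decision boundary the sensitive attribute flips the outcome.
    print(is_discriminatory(toy_model, (0, 4), 0, [0, 1]))
    # Far from the boundary it does not.
    print(is_discriminatory(toy_model, (0, 9), 0, [0, 1]))
```

A testing framework in this setting would search the input space for as many such flagged instances as possible; the check above only decides whether a single candidate counts.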
