Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure

Recent research has highlighted the vulnerability of modern machine-learning-based systems to bias, especially towards segments of society that are under-represented in training data. In this work, we develop a novel, tunable algorithm for mitigating hidden, and potentially unknown, biases within training data. Our algorithm fuses the original learning task with a variational autoencoder to learn the latent structure within the dataset, and then adaptively uses the learned latent distributions to re-weight the importance of certain data points during training. While our method generalizes across data modalities and learning tasks, in this work we apply it to racial and gender bias in facial detection systems. We evaluate our algorithm on the Pilot Parliaments Benchmark (PPB), a dataset specifically designed to evaluate biases in computer vision systems, and demonstrate increased overall performance as well as decreased categorical bias with our debiasing approach.
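
To make the re-weighting idea concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes each training sample has already been encoded into a latent vector (for example, the mean of the variational autoencoder's posterior), estimates how densely populated each latent dimension is with a simple histogram, and assigns larger sampling weights to samples that fall in sparsely populated regions. The function name `resampling_weights`, the histogram density estimate, and the parameters `num_bins` and `alpha` are illustrative assumptions; `alpha` stands in for the tunable degree of debiasing.

```python
import random
from collections import Counter


def resampling_weights(latents, num_bins=10, alpha=0.01):
    """Return one sampling probability per sample (hypothetical helper).

    latents: list of equal-length lists of floats, one latent vector per sample.
    num_bins: number of histogram bins per latent dimension (assumed).
    alpha: smoothing term; smaller values favor rare samples more strongly.
    """
    n = len(latents)
    dims = len(latents[0])
    weights = [1.0] * n
    for d in range(dims):
        values = [z[d] for z in latents]
        lo, hi = min(values), max(values)
        # Fall back to a unit bin width when every value is identical.
        width = (hi - lo) / num_bins or 1.0
        bins = [min(int((v - lo) / width), num_bins - 1) for v in values]
        counts = Counter(bins)
        for i, b in enumerate(bins):
            density = counts[b] / n  # fraction of samples sharing this bin
            weights[i] *= 1.0 / (density + alpha)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Nine samples clustered near 0 and one outlier at 5 in a 1-D latent space.
    latents = [[0.01 * i] for i in range(9)] + [[5.0]]
    probs = resampling_weights(latents)
    # Draw a mini-batch of indices; the outlier is drawn more often than
    # any single clustered sample.
    batch = random.choices(range(len(latents)), weights=probs, k=4)
    print(probs, batch)
```

In a training loop, weights of this kind would be recomputed as the latent representation changes and used to draw each mini-batch, so that under-represented regions of the latent space are seen more often.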
