Detecting Bias with Generative Counterfactual Face Attribute Augmentation

We introduce a simple framework for identifying biases in a smiling attribute classifier. Our method poses counterfactual questions of the form: how would the prediction change if this face characteristic had been different? We leverage recent advances in generative adversarial networks to build a realistic generative model of face images that affords controlled manipulation of specific image characteristics. We then define a set of metrics that measure how manipulating a specific property of an image affects the output of a trained classifier. Empirically, we identify several factors of variation that affect the predictions of a smiling classifier trained on CelebA.
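The counterfactual question above can be made concrete with a small sketch. It rests on several assumptions that the abstract does not state. First, a face attribute is represented as a direction in the generator's latent space. Second, a counterfactual image is produced by shifting a latent code along that direction and regenerating. Third, the "effect on the classifier" is summarized by two illustrative statistics: the mean change in predicted probability and the fraction of decisions that flip across a threshold. The names `counterfactual_metrics`, `toy_generate`, and `toy_classify`, and the metric names themselves, are hypothetical. The toy identity "generator" and one-feature "classifier" stand in for a real GAN and a real smiling classifier.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def counterfactual_metrics(generate, classify, latents, direction, step, threshold=0.5):
    """Hypothetical sketch: compare classifier outputs on generated images
    before and after shifting each latent code along an attribute direction.

    generate:  maps an (n, k) array of latent codes to a batch of images
    classify:  maps a batch of images to an (n,) array of probabilities
    latents:   (n, k) latent codes
    direction: (k,) assumed attribute direction in latent space
    step:      signed distance to move along the (normalized) direction
    """
    d = direction / np.linalg.norm(direction)            # unit attribute direction
    p_orig = classify(generate(latents))                  # predictions on originals
    p_cf = classify(generate(latents + step * d))         # predictions on counterfactuals
    delta = p_cf - p_orig
    flips = (p_orig >= threshold) != (p_cf >= threshold)  # decision changed?
    return {
        "mean_delta": float(delta.mean()),
        "mean_abs_delta": float(np.abs(delta).mean()),
        "flip_rate": float(flips.mean()),
    }


# Toy stand-ins (hypothetical): the "generator" is the identity on 2-d latents,
# and the "classifier" only looks at the first coordinate.
def toy_generate(z):
    return z


def toy_classify(x):
    return sigmoid(x[:, 0])


z = np.array([[0.5, 0.0], [-0.5, 0.0], [2.0, 1.0]])
# A direction the classifier ignores leaves every prediction unchanged.
print(counterfactual_metrics(toy_generate, toy_classify, z, np.array([0.0, 1.0]), 1.0))
# A direction the classifier depends on shifts predictions and can flip decisions.
print(counterfactual_metrics(toy_generate, toy_classify, z, np.array([-1.0, 0.0]), 1.0))
```

In this toy setting, a classifier that is insensitive to an attribute yields zero change. A nonzero mean change or flip rate signals that the manipulated property influences the prediction, which is the kind of dependence the framework is meant to surface.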
