Counterfactual Generation and Fairness Evaluation Using Adversarially Learned Inference

Recent studies have reported biases in machine learning image classifiers, especially against particular demographic groups. Counterfactual examples for an input---perturbations that change specific features but not others---have been shown to be useful for evaluating explainability and fairness of machine learning models. However, generating counterfactual examples for images is non-trivial due to the underlying causal structure governing the various features of an image. To be meaningful, generated perturbations need to satisfy constraints implied by the causal model. We present a method for generating counterfactuals by incorporating a known causal graph structure in a conditional variant of Adversarially Learned Inference (ALI). The proposed approach learns causal relationships between the specified attributes of an image and generates counterfactuals in accordance with these relationships. On Morpho-MNIST and CelebA datasets, the method generates counterfactuals that can change specified attributes and their causal descendants while keeping other attributes constant. As an application, we apply the generated counterfactuals from CelebA images to evaluate fairness biases in a classifier that predicts attractiveness of a face.

[1]  Ankur Taly,et al.  Axiomatic Attribution for Deep Networks , 2017, ICML.

[2]  Andrew Slavin Ross,et al.  Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations , 2017, IJCAI.

[3]  Kimmo Kärkkäinen,et al.  Gender Slopes: Counterfactual Fairness for Computer Vision Models by Attribute Manipulation , 2020, Proceedings of the 2nd International Workshop on Fairness, Accountability, Transparency and Ethics in Multimedia.

[4]  Emily Denton,et al.  Detecting Bias with Generative Counterfactual Face Attribute Augmentation , 2019, ArXiv.

[5]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[6]  Amit Sharma,et al.  Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers , 2019, ArXiv.

[7]  Adam Tauman Kalai,et al.  Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , 2016, NIPS.

[8]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Shakir Mohamed,et al.  Variational Inference with Normalizing Flows , 2015, ICML.

[10]  Daniel C. Castro,et al.  Morpho-MNIST: Quantitative Assessment and Diagnostics for Representation Learning , 2018, J. Mach. Learn. Res..

[11]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[12]  Matt J. Kusner,et al.  Counterfactual Fairness , 2017, NIPS.

[13]  Amit Sharma,et al.  Explaining machine learning classifiers through diverse counterfactual explanations , 2020, FAT*.

[14]  Daniel McDuff,et al.  Characterizing Bias in Classifiers using Generative Models , 2019, NeurIPS.

[15]  Judea Pearl,et al.  The seven tools of causal inference, with reflections on machine learning , 2019, Commun. ACM.

[16]  Alexandros G. Dimakis,et al.  CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training , 2017, ICLR.

[17]  Judea Pearl,et al.  Bayesian Networks , 1998, Encyclopedia of Social Network Analysis and Mining. 2nd Ed..

[18]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Yong Han,et al.  Generative Counterfactual Introspection for Explainable Deep Learning , 2019, 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[20]  Ole Winther,et al.  Auxiliary Deep Generative Models , 2016, ICML.

[21]  Amit Dhurandhar,et al.  Generating Contrastive Explanations with Monotonic Attribute Functions , 2019, ArXiv.

[22]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[23]  Timnit Gebru,et al.  Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification , 2018, FAT.

[24]  Andrea Vedaldi,et al.  Interpretable Explanations of Black Boxes by Meaningful Perturbation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[26]  Jieyu Zhao,et al.  Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints , 2017, EMNLP.

[27]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[28]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[29]  Guillaume Lample,et al.  Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[30]  Chris Russell,et al.  Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR , 2017, ArXiv.

[31]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[32]  Trevor Darrell,et al.  Women also Snowboard: Overcoming Bias in Captioning Models , 2018, ECCV.

[33]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[34]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Daniel C. Castro,et al.  Deep Structural Causal Models for Tractable Counterfactual Inference , 2020, NeurIPS.

[36]  Jianguo Ding,et al.  Probabilistic Inferences in Bayesian Networks , 2010, ArXiv.