Born Identity Network: Multi-way Counterfactual Map Generation to Explain a Classifier's Decision

There is an apparent trade-off between the performance and the interpretability of deep learning models. To narrow this gap, we propose the Born Identity Network (BIN), a post-hoc approach for producing multi-way counterfactual maps. A counterfactual map transforms an input sample so that it is classified as a target label, which mirrors how humans reason through counterfactual thinking. Producing a better counterfactual map may therefore be a step toward explanations at the level of human knowledge. For example, a counterfactual map can localize hypothetical abnormalities in a normal brain image that would cause it to be diagnosed with a disease. Our proposed BIN consists of two core components: a Counterfactual Map Generator and a Target Attribution Network. The Counterfactual Map Generator is a variation of a conditional GAN that synthesizes a counterfactual map conditioned on an arbitrary target label. The Target Attribution Network works in a complementary manner to enforce that the synthesized map is attributed to the target label. We validate BIN through qualitative and quantitative analyses on the MNIST, 3D Shapes, and ADNI datasets, and demonstrate the comprehensibility and fidelity of our method through various ablation studies.
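To make the two-component design concrete, the following is a minimal PyTorch sketch of the idea described above: a small encoder-decoder generator conditioned on a target label produces an additive counterfactual map, and a frozen pre-trained classifier stands in for the Target Attribution Network by scoring the counterfactually modified input against the target label. This is an illustrative assumption, not the authors' implementation; all module and variable names (`CounterfactualMapGenerator`, `counterfactual_step`, `frozen_classifier`, `lam`) are hypothetical, and the adversarial (discriminator) term of the conditional GAN is omitted for brevity.

```python
# Illustrative sketch only: not the authors' code. A label-conditioned generator
# produces a counterfactual map, and a frozen classifier provides target-label
# attribution feedback on the transformed input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualMapGenerator(nn.Module):
    """Tiny encoder-decoder mapping (image, target label) -> counterfactual map."""
    def __init__(self, in_ch=1, num_classes=10, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(num_classes, hidden)
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, in_ch, 3, padding=1), nn.Tanh(),  # bounded map values
        )

    def forward(self, x, target):
        h = self.enc(x)
        # Broadcast the target-label embedding over the spatial dimensions.
        cond = self.embed(target).unsqueeze(-1).unsqueeze(-1)
        return self.dec(h + cond)

def counterfactual_step(generator, frozen_classifier, x, target, lam=1.0):
    """One training step: the map should flip the frozen classifier to `target`
    while staying small, echoing the generator / attribution-network split."""
    cf_map = generator(x, target)
    x_cf = (x + cf_map).clamp(0.0, 1.0)       # counterfactually transformed input
    logits = frozen_classifier(x_cf)           # target-attribution feedback
    attribution_loss = F.cross_entropy(logits, target)
    sparsity_loss = cf_map.abs().mean()        # keep the counterfactual map minimal
    return attribution_loss + lam * sparsity_loss
```

In such a setup, only the generator's parameters would be updated by the returned loss; the classifier stays fixed so that the map, rather than the classifier, absorbs the change needed to reach the target label.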
