Latent Space Explanation by Intervention

The success of deep neural networks relies heavily on their ability to encode complex relations between their inputs and outputs. While this property lets them fit the training data well, it also obscures the mechanism that drives prediction. This study aims to reveal hidden concepts through an intervention mechanism, based on discrete variational autoencoders, that shifts the predicted class. An explanatory model then visualizes the encoded information of any hidden layer alongside its intervened counterpart. By assessing the differences between the original and the intervened representations, one can determine which concepts alter the class, thereby providing interpretability. We demonstrate the effectiveness of our approach on CelebA, where we show various visualizations of bias in the data and suggest different interventions to reveal and mitigate it.
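To make the mechanism concrete, below is a minimal sketch, not the paper's implementation, of how one might intervene on a discrete latent space learned over a classifier's hidden activations. It assumes a Gumbel-Softmax discrete autoencoder and a hypothetical `classifier_head` that maps the hidden representation to class logits; all names, shapes, and hyperparameters are illustrative.

```python
# Illustrative sketch: intervening on discrete latent codes to shift the predicted class.
# DiscreteVAE, classifier_head, and all dimensions are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVAE(nn.Module):
    """Encodes a hidden-layer activation h into num_codes categorical codes."""
    def __init__(self, hidden_dim=512, num_codes=32, num_categories=10):
        super().__init__()
        self.num_codes, self.num_categories = num_codes, num_categories
        self.encoder = nn.Linear(hidden_dim, num_codes * num_categories)
        self.decoder = nn.Linear(num_codes * num_categories, hidden_dim)

    def encode(self, h, tau=0.5):
        logits = self.encoder(h).view(-1, self.num_codes, self.num_categories)
        # Gumbel-Softmax relaxation keeps discrete sampling differentiable.
        return F.gumbel_softmax(logits, tau=tau, hard=True)

    def decode(self, z):
        return self.decoder(z.flatten(1))


def intervene(dvae, classifier_head, h, target_class, steps=100, lr=0.1, tau=0.5):
    """Optimize the code logits so the decoded representation is classified as target_class."""
    logits = dvae.encoder(h).view(-1, dvae.num_codes, dvae.num_categories)
    logits = logits.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([logits], lr=lr)
    target = torch.full((h.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        z = F.gumbel_softmax(logits, tau=tau, hard=True)
        h_hat = dvae.decode(z)                      # intervened hidden representation
        loss = F.cross_entropy(classifier_head(h_hat), target)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        z_star = F.gumbel_softmax(logits, tau=tau, hard=True)
    return z_star, dvae.decode(z_star)
```

Comparing `dvae.encode(h)` with the intervened codes `z_star` highlights which discrete concepts had to change to flip the prediction; feeding both the original and intervened representations to an explanatory (generative) model then visualizes those concepts.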
