CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines

We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In cognitive psychology, the factors (or semantic-level features) that humans focus on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class cpred, our fault-line based explanation identifies the minimal set of semantic-level features (e.g., the stripes on a zebra, the pointed ears of a dog), referred to as explainable concepts, that need to be added to or deleted from I in order to change M's prediction on I to another specified class calt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and natural for both expert and non-expert users seeking to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms state-of-the-art explainable AI models. Our implementation is available at https://github.com/arjunakula/CoCoX
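To make the fault-line idea concrete, below is a minimal, hypothetical sketch (not the paper's actual algorithm or solver) of a sparse counterfactual search over explainable concepts. It assumes the CNN is split into a feature extractor and a classifier head, and that each explainable concept is represented by a direction (a concept-activation-style vector) in the feature space; the names `fault_line_weights`, `cavs`, and the hinge-plus-L1 objective are illustrative assumptions, with a plain Adam optimizer standing in for whatever solver the paper uses.

```python
# Hypothetical sketch of a fault-line style counterfactual search.
# Assumptions: `feat` is the image's penultimate-layer feature, `cavs` holds one
# concept direction per explainable concept, and `classifier` is the linear head
# of the CNN. We look for a sparse weight vector w such that adding
# sum_k w_k * cavs[k] to the feature flips the prediction from c_pred to c_alt.
import torch
import torch.nn.functional as F

def fault_line_weights(feat, cavs, classifier, c_pred, c_alt,
                       l1=0.05, lr=0.1, steps=300):
    """feat: (d,) image feature; cavs: (K, d) concept directions; returns w: (K,)."""
    w = torch.zeros(cavs.size(0), requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        perturbed = feat + w @ cavs                    # add/delete concepts in feature space
        logits = classifier(perturbed)
        # Hinge loss: push the alternative class score above the original class score.
        flip_loss = F.relu(logits[c_pred] - logits[c_alt] + 1.0)
        loss = flip_loss + l1 * w.abs().sum()          # L1 keeps the fault-line minimal
        loss.backward()
        opt.step()
    return w.detach()   # positive entries ~ concepts to add, negative ~ concepts to delete
```

Under these assumptions, the nonzero entries of the returned weight vector can be read as the fault-line: the few concepts whose addition or deletion would change the model's decision from cpred to calt.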
