CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines

We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In cognitive psychology, the factors (or semantic-level features) that humans focus on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class cpred, our fault-line based explanation identifies the minimal set of semantic-level features (e.g., the stripes on a zebra, the pointed ears of a dog), referred to as explainable concepts, that need to be added to or deleted from I in order to change M's prediction on I to another specified class calt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and natural for both expert and non-expert users seeking to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms state-of-the-art explainable AI models. Our implementation is available at https://github.com/arjunakula/CoCoX
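To make the fault-line idea concrete, below is a minimal, hypothetical sketch (not the paper's actual algorithm or solver) of a sparse counterfactual search over explainable concepts. It assumes the CNN is split into a feature extractor and a classifier head, and that each explainable concept is represented by a direction (a concept-activation-style vector) in the feature space; the names `fault_line_weights`, `cavs`, and the hinge-plus-L1 objective are illustrative assumptions, with a plain Adam optimizer standing in for whatever solver the paper uses.

```python
# Hypothetical sketch of a fault-line style counterfactual search.
# Assumptions: `feat` is the image's penultimate-layer feature, `cavs` holds one
# concept direction per explainable concept, and `classifier` is the linear head
# of the CNN. We look for a sparse weight vector w such that adding
# sum_k w_k * cavs[k] to the feature flips the prediction from c_pred to c_alt.
import torch
import torch.nn.functional as F

def fault_line_weights(feat, cavs, classifier, c_pred, c_alt,
                       l1=0.05, lr=0.1, steps=300):
    """feat: (d,) image feature; cavs: (K, d) concept directions; returns w: (K,)."""
    w = torch.zeros(cavs.size(0), requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        perturbed = feat + w @ cavs                    # add/delete concepts in feature space
        logits = classifier(perturbed)
        # Hinge loss: push the alternative class score above the original class score.
        flip_loss = F.relu(logits[c_pred] - logits[c_alt] + 1.0)
        loss = flip_loss + l1 * w.abs().sum()          # L1 keeps the fault-line minimal
        loss.backward()
        opt.step()
    return w.detach()   # positive entries ~ concepts to add, negative ~ concepts to delete
```

Under these assumptions, the nonzero entries of the returned weight vector can be read as the fault-line: the few concepts whose addition or deletion would change the model's decision from cpred to calt.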
