Debiasing Concept-based Explanations with Causal Analysis

Concept-based explanation is a popular model-interpretability approach because it expresses the reasons for a model's predictions in terms of concepts that are meaningful to domain experts. In this work, we study the problem of concepts being correlated with confounding information in the features. We propose a new causal prior graph that models the impact of unobserved variables, and a method based on instrumental variable techniques that removes the effect of confounding information and noise. We also model the completeness of the concept set and show that our debiasing method works even when the concept set is incomplete. Our synthetic and real-world experiments demonstrate that our method removes biases and improves the ranking of concepts in terms of their contribution to explaining the predictions.
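
To make the instrumental-variable idea concrete, below is a minimal two-stage least squares (2SLS) sketch in Python on a toy linear, synthetic setting. The variable names (Z, C_noisy, y), the linear model, and the coefficients are illustrative assumptions and do not reproduce the paper's formulation or causal prior graph.

```python
# Minimal 2SLS sketch of instrumental-variable debiasing (assumed toy setup,
# not the paper's method). An unobserved confounder U corrupts both the
# observed concept scores and the prediction target; regressing on an
# instrumented version of the concept removes that bias.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

Z = rng.normal(size=(n, 2))             # instruments (independent of the confounder)
U = rng.normal(size=n)                   # unobserved confounder
C_true = Z @ np.array([1.0, -0.5])       # true concept signal driven by the instruments
C_noisy = C_true + 0.8 * U + rng.normal(scale=0.3, size=n)   # observed, confounded concept
y = 2.0 * C_true - 1.5 * U + rng.normal(scale=0.1, size=n)   # target, also confounded

# Naive regression of y on the noisy concept is biased by the shared confounder U.
naive = np.linalg.lstsq(C_noisy[:, None], y, rcond=None)[0]

# Stage 1: project the noisy concept onto the instruments.
stage1 = np.linalg.lstsq(Z, C_noisy, rcond=None)[0]
C_hat = Z @ stage1

# Stage 2: regress y on the instrumented concept; the confounding term drops out.
iv = np.linalg.lstsq(C_hat[:, None], y, rcond=None)[0]

print(f"naive estimate: {naive[0]:.2f}  (biased; true effect is 2.0)")
print(f"2SLS estimate:  {iv[0]:.2f}  (close to the true effect 2.0)")
```

In this toy example the naive coefficient is pulled well below the true effect because the confounder enters the concept and the target with opposite signs; the 2SLS estimate recovers the true effect because the instruments are uncorrelated with the confounder and the noise.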
