Measurable Counterfactual Local Explanations for Any Classifier

We propose a novel method for explaining the predictions of any classifier. In our approach, local explanations are expected to explain both the outcome of a prediction and how that prediction would change if 'things had been different'. Furthermore, we argue that satisfactory explanations cannot be dissociated from a notion and measure of fidelity, as advocated in the early days of knowledge extraction from neural networks. We introduce a definition of fidelity to the underlying classifier for local explanation models, based on distances to a target decision boundary. A system called CLEAR (Counterfactual Local Explanations via Regression) is introduced and evaluated. CLEAR generates w-counterfactual explanations that state the minimum changes necessary to flip a prediction's classification. CLEAR then builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. By contrast, the popular LIME method, which also uses regression to generate local explanations, neither measures its own fidelity nor generates counterfactuals. CLEAR's regressions are found to have significantly higher fidelity than LIME's, averaging over 45% higher across this paper's four case studies.
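The abstract describes a pipeline of three steps: find the minimal change to a feature that flips the classifier's prediction (a w-counterfactual), fit a local surrogate regression around the instance, and score fidelity by how closely the surrogate places the decision boundary at the same point as the classifier. The sketch below is a minimal illustration of that idea only, not the authors' implementation; the synthetic data, the stand-in random-forest classifier, and the single-feature line search are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

# A stand-in "black-box" classifier trained on synthetic data (assumption for the sketch).
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0].copy()   # instance to be explained
feat = 0           # feature whose counterfactual value we search for

def counterfactual_value(model, x, feature, span=5.0, steps=500):
    """Smallest change to one feature that flips the model's class prediction."""
    base = model.predict_proba([x])[0, 1] >= 0.5
    for delta in np.linspace(0.0, span, steps):
        for sign in (1.0, -1.0):
            x_new = x.copy()
            x_new[feature] = x[feature] + sign * delta
            if (model.predict_proba([x_new])[0, 1] >= 0.5) != base:
                return x_new[feature]
    return None

b_value = counterfactual_value(clf, x0, feat)   # classifier's own boundary crossing

# Local surrogate: linear regression of the black-box probability on points
# sampled around x0 (a LIME-style neighbourhood, used here only for illustration).
rng = np.random.default_rng(0)
neighbourhood = x0 + rng.normal(scale=1.0, size=(1000, X.shape[1]))
probs = clf.predict_proba(neighbourhood)[:, 1]
surrogate = LinearRegression().fit(neighbourhood, probs)

# Fidelity in the spirit of the paper: does the surrogate place the decision
# boundary (probability 0.5) at the same feature value as the classifier?
w, b = surrogate.coef_, surrogate.intercept_
others = np.dot(np.delete(w, feat), np.delete(x0, feat))
est_value = (0.5 - b - others) / w[feat]   # solve w.x + b = 0.5 for the chosen feature
if b_value is not None:
    print("counterfactual fidelity error:", abs(est_value - b_value))
```

The printed quantity mirrors the paper's notion of measuring a local explanation against a target decision boundary: a small error means the surrogate's implied counterfactual agrees with the one obtained directly from the classifier.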
