Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties

Counterfactual explanations (CEs) are a practical tool for demonstrating why machine learning classifiers make particular decisions. For CEs to be useful, it is important that they are easy for users to interpret. Existing methods for generating interpretable CEs rely on auxiliary generative models, which may not be suitable for complex datasets and which incur additional engineering overhead. We introduce a simple and fast method for generating interpretable CEs in a white-box setting without an auxiliary model, by using the predictive uncertainty of the classifier. Our experiments show that, as measured by IM1 scores (Van Looveren and Klaise, 2019), the proposed algorithm generates more interpretable CEs than existing methods. Additionally, our approach allows us to estimate the uncertainty of a CE, which may be important in safety-critical applications, such as those in the medical domain.
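To make the idea concrete, below is a minimal sketch of how such a search could be implemented, assuming a white-box deep-ensemble classifier in PyTorch; the function names (`generate_ce`, `predictive_entropy`) and all hyperparameters are illustrative assumptions, not the authors' released code. Requiring every ensemble member to assign high probability to the target class implicitly penalises both epistemic uncertainty (members disagree) and aleatoric uncertainty (members agree but are unconfident).

```python
import torch
import torch.nn.functional as F


def generate_ce(ensemble, x, target_class, steps=500, lr=0.05, dist_weight=0.1):
    # Counterfactual search sketch: optimise x_cf so that every ensemble
    # member confidently predicts `target_class`, while an L1 penalty keeps
    # x_cf close to the original input x. Low mean cross-entropy across
    # members implies low epistemic and aleatoric uncertainty at x_cf.
    x_cf = x.clone().detach().requires_grad_(True)
    target = torch.tensor([target_class])
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        ce = torch.stack(
            [F.cross_entropy(m(x_cf.unsqueeze(0)), target) for m in ensemble]
        ).mean()
        loss = ce + dist_weight * (x_cf - x).abs().sum()
        loss.backward()
        opt.step()
    return x_cf.detach()


def predictive_entropy(ensemble, x):
    # Entropy of the mean ensemble prediction: a simple total-uncertainty
    # score that can be reported alongside the counterfactual.
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(m(x.unsqueeze(0)), dim=-1) for m in ensemble]
        ).mean(0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(-1)
```

Given an `ensemble` of trained classifiers and a feature tensor `x`, `generate_ce(ensemble, x, target_class=1)` would return a candidate counterfactual, and `predictive_entropy` provides the kind of uncertainty score that could flag unreliable CEs in safety-critical settings.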

[1] Joan Bruna et al. Intriguing properties of neural networks, 2013, ICLR.

[2] Somesh Jha et al. Concise Explanations of Neural Networks using Adversarial Training, 2018, ICML.

[3] Jonathon Shlens et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[4] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[5] Charles Blundell et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, 2016, NIPS.

[6] Francesca Lagioia et al. The impact of the General Data Protection Regulation (GDPR) on artificial intelligence, 2020.

[7] Aki Vehtari et al. A decision-theoretic approach for model interpretability in Bayesian framework, 2019, Machine Learning.

[8] Joost R. van Amersfoort et al. Simple and Scalable Epistemic Uncertainty Estimation Using a Single Deep Deterministic Neural Network, 2020, ICML.

[9] Alex Kendall et al. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?, 2017, NIPS.

[10] Marie-Jeanne Lesot et al. The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations, 2019, IJCAI.

[11] Dustin Tran et al. Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness, 2020, NeurIPS.

[12] Bart Baesens et al. An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models, 2011, Decision Support Systems.

[13] Oluwasanmi Koyejo et al. Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems, 2019, arXiv.

[14] Chris Russell et al. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR, 2017, arXiv.

[15] Mihaela van der Schaar et al. What is Interpretable? Using Machine Learning to Design Interpretable Decision-Support Systems, 2018, arXiv.

[16] Brandon M. Greenwell et al. Interpretable Machine Learning, 2019, Hands-On Machine Learning with R.

[17] Yang Liu et al. Actionable Recourse in Linear Classification, 2018, FAT.

[18] Martín Abadi et al. Adversarial Patch, 2017, arXiv.

[19] Zoubin Ghahramani et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.

[20] Yarin Gal et al. Sufficient Conditions for Idealised Models to Have No Adversarial Examples: a Theoretical and Empirical Study with Bayesian Neural Networks, 2018, arXiv:1806.00667.

[21] Lujo Bauer et al. Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition, 2016, CCS.

[22] Ananthram Swami et al. The Limitations of Deep Learning in Adversarial Settings, 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[23] Mohammad Norouzi et al. Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One, 2019, ICLR.

[24] John P. Dickerson et al. Counterfactual Explanations for Machine Learning: A Review, 2020, arXiv.

[25] Yarin Gal et al. Understanding Measures of Uncertainty for Adversarial Example Detection, 2018, UAI.

[26] Amit Dhurandhar et al. Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives, 2018, NeurIPS.

[27] Aleksander Madry et al. Adversarial Examples Are Not Bugs, They Are Features, 2019, NeurIPS.

[28] Arnaud Van Looveren and Janis Klaise. Interpretable Counterfactual Explanations Guided by Prototypes, 2019, ECML/PKDD.

[29] Zachary Chase Lipton. The mythos of model interpretability, 2016, ACM Queue.

[30] Aleksander Madry et al. Robustness May Be at Odds with Accuracy, 2018, ICLR.