Consistent Counterfactuals for Deep Models

Counterfactual examples are one of the most commonly cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis. Counterfactuals are often discussed under the assumption that the model on which they will be used is static, but in deployment models may be periodically retrained or fine-tuned. This paper studies the consistency of model predictions on counterfactual examples in deep networks under small changes to initial training conditions, such as weight initialization and leave-one-out variations in data, as often occur during model deployment. We demonstrate experimentally that counterfactual examples for deep models are often inconsistent across such small changes, and that increasing the cost of the counterfactual, a stability-enhancing mitigation suggested by prior work in the context of simpler models, is not a reliable heuristic in deep networks. Rather, our analysis shows that a model's Lipschitz continuity around the counterfactual, along with the confidence of its prediction, is key to its consistency across related models. To this end, we propose Stable Neighbor Search as a way to generate more consistent counterfactual explanations, and we illustrate the effectiveness of this approach on several benchmark datasets.
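
To make the connection between confidence and consistency concrete, the following is a minimal, illustrative PyTorch sketch of gradient-based counterfactual search that adds a logit-margin (confidence) term to the usual proximity cost. It is a stand-in for the stability intuition described above, not the paper's Stable Neighbor Search algorithm; the network, loss weights, and function name are hypothetical.

    # Illustrative sketch only: search for a nearby input that the model
    # classifies as target_class with a comfortable logit margin. The margin
    # term is a proxy for the confidence / local-Lipschitz criterion described
    # above, not the paper's Stable Neighbor Search algorithm itself.
    import torch
    import torch.nn as nn

    def find_counterfactual(model, x, target_class, steps=500, lr=0.05,
                            dist_weight=1.0, margin_weight=1.0):
        """Return x' near x that the model assigns to target_class."""
        x_cf = x.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x_cf], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            logits = model(x_cf.unsqueeze(0)).squeeze(0)
            target_logit = logits[target_class]
            other_max = logits[torch.arange(logits.numel()) != target_class].max()
            # Hinge on the logit gap: pushes the target class above all others
            # by at least 1.0, encouraging a more confident counterfactual.
            margin_loss = torch.relu(other_max - target_logit + 1.0)
            # Proximity cost keeps the counterfactual close to the original input.
            dist_loss = torch.norm(x_cf - x, p=2)
            loss = margin_weight * margin_loss + dist_weight * dist_loss
            loss.backward()
            opt.step()
        return x_cf.detach()

    # Hypothetical usage with a toy feed-forward network on tabular data.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    x = torch.randn(10)
    x_cf = find_counterfactual(model, x, target_class=1)

Raising margin_weight trades proximity for a larger decision margin, which, per the analysis above, tends to yield counterfactuals that remain valid under retrained or fine-tuned versions of the model.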
