Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

Feature attributions and counterfactual explanations are popular approaches for explaining a machine learning (ML) model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes that alter the model's prediction. To unify these approaches, we provide an interpretation based on the actual causality framework and present two key results on their use. First, we present a method to generate feature attribution explanations from a set of counterfactual examples. These feature attributions convey how important a feature is to changing the classification outcome of a model, and in particular whether a subset of features is necessary and/or sufficient for that change, which attribution-based methods are unable to provide. Second, we show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency. Together, these results highlight the complementarity of the two approaches. Our evaluation on three benchmark datasets (Adult-Income, LendingClub, and German-Credit) confirms this complementarity: feature attribution methods such as LIME and SHAP and counterfactual explanation methods such as Wachter et al. and DiCE often disagree on feature importance rankings. In addition, by restricting the features that can be modified when generating counterfactual examples, we find that the top-k features identified by LIME or SHAP are often neither necessary nor sufficient explanations of a model's prediction. Finally, we present a case study of different explanation methods on a real-world hospital triage problem.
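
To make the first result concrete, the sketch below illustrates one plausible way to turn a set of counterfactual examples into feature attribution scores and to test whether a feature subset (for example, the top-k features reported by LIME or SHAP) is necessary or sufficient for changing the prediction. The scoring rule (the fraction of counterfactuals in which a feature changes) and the finite-sample necessity/sufficiency checks are illustrative assumptions based on the description above, not the paper's exact procedure; the function names and the Adult-Income-style example input are hypothetical.

    # Illustrative sketch (Python), not the paper's exact method.
    # Assumption: a feature's attribution score is the fraction of counterfactual
    # examples in which its value differs from the original input.

    def counterfactual_attributions(original, counterfactuals):
        """original: dict of feature -> value; counterfactuals: list of such dicts,
        each a minimally changed input that flips the model's prediction."""
        n = len(counterfactuals)
        return {f: sum(cf[f] != v for cf in counterfactuals) / n
                for f, v in original.items()}

    def is_sufficient(original, counterfactuals, subset):
        # Subset is (empirically) sufficient for changing the outcome if some
        # counterfactual changes only features inside the subset.
        return any(all(cf[f] == original[f] for f in original if f not in subset)
                   for cf in counterfactuals)

    def is_necessary(original, counterfactuals, subset):
        # Subset is (empirically) necessary if every counterfactual changes
        # at least one feature inside the subset.
        return all(any(cf[f] != original[f] for f in subset)
                   for cf in counterfactuals)

    # Hypothetical Adult-Income-style input and counterfactuals
    x = {"age": 29, "education": "HS-grad", "hours_per_week": 40}
    cfs = [{"age": 29, "education": "Masters", "hours_per_week": 40},
           {"age": 29, "education": "Bachelors", "hours_per_week": 60},
           {"age": 47, "education": "Masters", "hours_per_week": 40}]

    print(counterfactual_attributions(x, cfs))   # education: 1.0; age and hours_per_week: ~0.33
    print(is_sufficient(x, cfs, {"education"}))  # True: one counterfactual changes education alone
    print(is_necessary(x, cfs, {"education"}))   # True: every counterfactual changes education

The same checks suggest how the second result could be operationalized: take the top-k features from an attribution method, generate counterfactuals while restricting which features may change, and ask whether that subset is sufficient (a counterfactual exists that changes only those features) and necessary (no counterfactual avoids changing them).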

[1]  Suresh Venkatasubramanian, et al. Problems with Shapley-value-based explanations as feature importance measures, 2020, ICML.

[2]  Amit Dhurandhar, et al. One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques, 2019, ArXiv.

[3]  Joseph Y. Halpern, et al. Actual Causality, 2016, A Logical Theory of Causality.

[4]  Gayane Yenokyan, et al. An Electronic Emergency Triage System to Improve Patient Distribution by Critical Outcomes, 2016, The Journal of Emergency Medicine.

[5]  Yang Liu, et al. Actionable Recourse in Linear Classification, 2018, FAT.

[6]  Johannes Gehrke, et al. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission, 2015, KDD.

[7]  J. Woodward. Sensitive and Insensitive Causation, 2006.

[8]  Kjersti Aas, et al. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values, 2019, Artif. Intell.

[9]  Carlos Guestrin, et al. Anchors: High-Precision Model-Agnostic Explanations, 2018, AAAI.

[10]  Chenhao Tan, et al. Many Faces of Feature Importance: Comparing Built-in and Post-hoc Feature Importance in Text Classification, 2019, EMNLP/IJCNLP.

[11]  R. Eisenstein, et al. Decreasing length of stay in the emergency department with a split emergency severity index 3 patient flow model, 2013, Academic Emergency Medicine.

[12]  Amit Sharma, et al. Explaining machine learning classifiers through diverse counterfactual explanations, 2020, FAT*.

[13]  Ben Taskar, et al. Determinantal Point Processes for Machine Learning, 2012, Found. Trends Mach. Learn.

[14]  Jette Henderson, et al. CERTIFAI: A Common Framework to Provide Explanations and Analyse the Fairness and Robustness of Black-box Models, 2020, AIES.

[15]  Tommi S. Jaakkola, et al. On the Robustness of Interpretability Methods, 2018, ArXiv.

[16]  M. Woodward, et al. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker, 2012, Heart.

[17]  Steven Horng, et al. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning, 2017, PLoS ONE.

[18]  Chris Russell, et al. Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR, 2017, ArXiv.

[19]  Johannes Gehrke, et al. Intelligible models for classification and regression, 2012, KDD.

[20]  Vivian Lai, et al. On Human Predictions with Explanations and Predictions of Machine Learning Models: A Case Study on Deception Detection, 2018, FAT.

[21]  Sameer Singh, et al. Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods, 2020, AIES.

[22]  Cynthia Rudin, et al. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, 2018, Nature Machine Intelligence.

[23]  Franco Turini, et al. Local Rule-Based Explanations of Black Box Decision Systems, 2018, ArXiv.

[24]  Q. Liao, et al. Questioning the AI: Informing Design Practices for Explainable AI User Experiences, 2020, CHI.

[25]  Uli K. Chettipally, et al. Prediction of Sepsis in the Intensive Care Unit With Minimal Electronic Health Record Data: A Machine Learning Approach, 2016, JMIR Medical Informatics.

[26]  Amit Dhurandhar, et al. Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives, 2018, NeurIPS.

[27]  Mukund Sundararajan, et al. The many Shapley values for model explanation, 2019, ICML.

[28]  Peter A. Flach, et al. Explainability fact sheets: a framework for systematic assessment of explainable approaches, 2019, FAT*.

[29]  Chenhao Tan, et al. Evaluating and Characterizing Human Rationales, 2020, EMNLP.

[30]  Woo Suk Hong, et al. Predicting hospital admission at emergency department triage using machine learning, 2018, PLoS ONE.

[31]  Joydeep Ghosh, et al. CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models, 2019, ArXiv.

[32]  Tommi S. Jaakkola, et al. Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control, 2019, EMNLP.

[33]  Avanti Shrikumar, et al. Learning Important Features Through Propagating Activation Differences, 2017, ICML.

[34]  John P. Dickerson, et al. Counterfactual Explanations for Machine Learning: A Review, 2020, ArXiv.

[35]  Scott Lundberg, et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.

[36]  Peter A. Flach, et al. FACE: Feasible and Actionable Counterfactual Explanations, 2020, AIES.

[37]  Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[38]  H. Krumholz, et al. Discovery of temporal and disease association patterns in condition-specific hospital utilization rates, 2017, PLoS ONE.

[39]  Chris Russell, et al. Efficient Search for Diverse Coherent Explanations, 2019, FAT.

[40]  Amir-Hossein Karimi, et al. Model-Agnostic Counterfactual Explanations for Consequential Decisions, 2019, AISTATS.

[41]  R. Caruana, et al. Detecting Bias in Black-Box Models Using Transparent Model Distillation, 2017.

[42]  Scott Levin, et al. Machine-Learning-Based Electronic Triage More Accurately Differentiates Patients With Respect to Clinical Outcomes Compared With the Emergency Severity Index, 2017, Annals of Emergency Medicine.

[43]  Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[44]  J. Pearl, et al. Causal Inference in Statistics: A Primer, 2016.

[45]  Chih-Kuan Yeh, et al. On the (In)fidelity and Sensitivity for Explanations, 2019, arXiv:1901.09392.

[46]  Himabindu Lakkaraju, et al. Can I Still Trust You?: Understanding the Impact of Distribution Shifts on Algorithmic Recourses, 2020, ArXiv.

[47]  Tim Miller, et al. Explanation in Artificial Intelligence: Insights from the Social Sciences, 2017, Artif. Intell.

[48]  Solon Barocas, et al. The hidden assumptions behind counterfactual explanations and principal reasons, 2019, FAT*.

[49]  Bernhard Schölkopf, et al. Algorithmic Recourse: from Counterfactual Explanations to Interventions, 2020, FAccT.

[50]  Byron C. Wallace, et al. ERASER: A Benchmark to Evaluate Rationalized NLP Models, 2020, ACL.

[51]  Jure Leskovec, et al. Interpretable Decision Sets: A Joint Framework for Description and Prediction, 2016, KDD.

[52]  Z. Obermeyer, et al. Predicting the Future - Big Data, Machine Learning, and Clinical Medicine, 2016, The New England Journal of Medicine.

[53]  Amit Sharma, et al. Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers, 2019, ArXiv.

[54]  Babak Salimi, et al. Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals, 2021, SIGMOD Conference.

[55]  Kuangyan Song, et al. "Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations, 2019.

[56]  Yihong Zhang, et al. GeCo: Quality Counterfactual Explanations in Real Time, 2021, Proc. VLDB Endow.

[57]  Sameer Singh, et al. How can we fool LIME and SHAP? Adversarial Attacks on Post hoc Explanation Methods, 2019, ArXiv.

[58]  Zachary Chase Lipton. The mythos of model interpretability, 2016, ACM Queue.