Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opacity of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context has focused on attributing responsibility for an algorithm's decisions to its inputs, where responsibility is typically approached as a purely associational concept. In this paper, we propose a principled causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lie probabilistic contrastive counterfactuals, a concept that can be traced back to the philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influence of a variable on decisions made by an algorithm, and provide actionable recourse for individuals negatively affected by the algorithm's decision. Unlike prior work, our system, LEWIS: (1) can compute provably effective explanations and recourse at local, global, and contextual levels; (2) is designed to work with users with varying levels of background knowledge of the underlying causal model; and (3) makes no assumptions about the internals of an algorithmic system except for the availability of its input-output data. We empirically evaluate LEWIS on four real-world datasets and show that it generates human-understandable explanations that improve upon state-of-the-art approaches in XAI, including the popular LIME and SHAP. Experiments on synthetic data further demonstrate the correctness of LEWIS's explanations and the scalability of its recourse algorithm.
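To make the central notion concrete, the display below is a minimal sketch of the kind of counterfactual quantity the abstract refers to, stated in Pearl's standard structural-causal-model notation; the labels NEC and SUF are illustrative names introduced here, not terminology taken from the paper. $Y_{x}$ denotes the outcome that would have been observed had the attribute $X$ been set to $x$.

% A probabilistic contrastive counterfactual answers: "had X been x'
% rather than x, would the outcome Y have been y' rather than y?"

% Necessity-style score: among individuals with X = x who received
% outcome y, how likely is it that intervening to set X = x' would
% have flipped the outcome to y'?
\[
  \mathrm{NEC}(x \to x') \;=\; P\bigl(Y_{x'} = y' \mid X = x,\; Y = y\bigr)
\]

% Sufficiency-style score: among individuals with X = x' and outcome
% y', how likely is it that intervening to set X = x would have
% produced outcome y?
\[
  \mathrm{SUF}(x' \to x) \;=\; P\bigl(Y_{x} = y \mid X = x',\; Y = y'\bigr)
\]

Quantities of this form generalize the classical probabilities of causation (probability of necessity and probability of sufficiency), and they are contrastive in that each conditions on the facts of one world while evaluating the outcome under its counterfactual alternative.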
