Counterfactual Explanations in Explainable AI: A Tutorial

Deep learning has achieved strong performance in many fields, but its black-box nature hinders further adoption. In response, explainable artificial intelligence (XAI) has emerged, aiming to explain the predictions and behaviors of deep learning models. Among the many explanation methods, counterfactual explanation stands out because it resembles human cognitive processes: it delivers an explanation by constructing a contrastive situation, so that humans can interpret the underlying mechanism by reasoning about the difference. In this tutorial, we introduce the cognitive concept and characteristics of counterfactual explanation, its computational form, mainstream methods, and its adaptations to different explanation settings. In addition, we demonstrate several typical use cases of counterfactual explanations in popular research areas. Finally, in light of practice, we outline potential applications of counterfactual explanations, such as data augmentation and conversational systems. We hope this tutorial helps participants gain an overview of counterfactual explanations.
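To make the computational form concrete, a common (Wachter-style) formulation seeks a minimal perturbation x' of an input x such that the model's prediction flips: minimize d(x, x') subject to f(x') != f(x). The following is a minimal, illustrative sketch, assuming a toy hand-written linear "loan approval" classifier and a naive coordinate search (all names and thresholds are hypothetical, not from the tutorial):

```python
def predict(x):
    """Toy linear model: approve (1) if 0.6*income + 0.4*credit > 0.5."""
    return 1 if 0.6 * x[0] + 0.4 * x[1] > 0.5 else 0


def counterfactual(x, step=0.01, max_iter=1000):
    """Greedily nudge the highest-weight feature until the prediction flips.

    Returns a counterfactual input x' with predict(x') != predict(x),
    or None if no flip is found within max_iter steps.
    """
    original = predict(x)
    cf = list(x)
    for _ in range(max_iter):
        if predict(cf) != original:
            return cf
        cf[0] += step  # income has the largest weight, so it flips cheapest
    return None


x = [0.3, 0.4]            # rejected applicant
cf = counterfactual(x)
print(predict(x), predict(cf))  # prints: 0 1
```

The returned counterfactual reads as a contrastive statement: "had income been about 0.57 instead of 0.3, the application would have been approved." Real methods replace the greedy search with gradient-based or diverse optimization over a proper distance function.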
