Counterfactual Explanations for Machine Learning: A Review

Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible for human stakeholders to understand. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we review and categorize research on counterfactual explanations, a specific class of explanation that describes how a model's output would have differed had its input been changed in a particular way. Modern approaches to counterfactual explainability draw connections to established legal doctrine in many countries, making them appealing for fielded systems in high-impact areas such as finance and healthcare. We therefore design a rubric of desirable properties for counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against it. The rubric enables easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to the major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.
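To make the notion concrete, below is a minimal sketch of gradient-based counterfactual search in the style popularized by Wachter et al.: find a nearby input x' that changes the model's prediction while staying close to the original input x. The toy logistic-regression weights, the target probability, and the hyperparameters (lam, lr, steps) are illustrative assumptions, not values taken from any particular algorithm covered in this review.

```python
import numpy as np

# Toy differentiable classifier: logistic regression with fixed,
# illustrative weights (an assumption for this sketch).
w = np.array([1.5, -2.0])
b = -0.5

def predict_proba(x):
    """Probability of the positive class for input x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x, target=0.6, lam=0.01, lr=0.1, steps=2000):
    """Gradient descent on a Wachter-style objective,
        (f(x') - target)^2 + lam * ||x' - x||^2,
    trading off prediction validity against proximity to x."""
    x_cf = x.copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        # d/dx' of (f - target)^2, using sigmoid'(z) = f(1 - f)
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w
        # d/dx' of lam * ||x' - x||^2
        grad_prox = 2.0 * lam * (x_cf - x)
        x_cf -= lr * (grad_pred + grad_prox)
    return x_cf

x = np.array([-1.0, 1.0])    # original input: predicted negative
x_cf = counterfactual(x)     # nearby input predicted positive
print(predict_proba(x), predict_proba(x_cf), x_cf - x)
```

In practice the trade-off weight is tuned rather than fixed (in Wachter et al.'s formulation the weight sits on the prediction term and is increased until the target prediction is met), the distance term is often a feature-scaled L1 norm to favor sparse changes, and many of the algorithms surveyed here replace this unconstrained search with constrained or combinatorial optimization so that counterfactuals remain actionable and plausible.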
