FOCUS: Flexible Optimizable Counterfactual Explanations for Tree Ensembles

Model interpretability has become an important problem in machine learning (ML) due to the growing impact that algorithmic decisions have on humans. Counterfactual explanations can help users understand not only why ML models make certain decisions, but also how those decisions can be changed. We frame the problem of finding counterfactual explanations as an optimization task and extend previous work that could only be applied to differentiable models. To accommodate non-differentiable models such as tree ensembles, we propose using probabilistic model approximations in the optimization framework. We introduce a simple approximation technique that is effective for finding counterfactual explanations for predictions of the original model under a range of distance metrics. We show that our counterfactual examples are significantly closer to the original instances than those produced by other methods designed for tree ensembles, across four distance metrics.
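
To make the high-level recipe in the abstract concrete, the sketch below illustrates one way a hard tree split can be relaxed into a differentiable surrogate so that a counterfactual can be found by gradient descent. The single-stump model, the sigmoid sharpness, the squared-L2 distance, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the core idea, not the paper's implementation: soften a tree's
# hard split with a sigmoid so the model becomes differentiable, then gradient-descend
# a perturbation of the input until the (approximate) prediction flips while staying
# close to the original instance.
import torch

SIGMA = 10.0  # assumed sharpness of the sigmoid approximation of the split
BETA = 0.1    # assumed trade-off between flipping the prediction and staying close

def soft_stump(x, feature=0, threshold=0.5):
    # Differentiable stand-in for a decision stump: P(class 1) ≈ sigmoid(SIGMA * (x_j - threshold)).
    return torch.sigmoid(SIGMA * (x[feature] - threshold))

x0 = torch.tensor([0.3, 0.8])            # original instance, predicted class 0 (feature 0 below the threshold)
x_cf = x0.clone().requires_grad_(True)   # counterfactual candidate, initialised at the original instance
opt = torch.optim.Adam([x_cf], lr=0.05)

for _ in range(300):
    opt.zero_grad()
    p1 = soft_stump(x_cf)                     # approximate probability of the target class
    pred_loss = torch.relu(0.5 - p1)          # hinge term: zero once the approximate prediction flips
    dist = ((x_cf - x0) ** 2).sum()           # squared L2 distance to the original instance
    loss = pred_loss + BETA * dist
    loss.backward()
    opt.step()

print(x_cf.detach(), soft_stump(x_cf).item())  # a minimally perturbed input that crosses the split
```

In an actual tree ensemble the same relaxation would be applied to every split of every tree, and the distance term could be replaced with whichever metric the explanation calls for, which is how a range of distance metrics can be accommodated.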
