Explaining machine learning classifiers through diverse counterfactual explanations

Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations is through counterfactuals, hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on four real-world datasets show that our framework can generate a set of counterfactuals that are diverse and that approximate local decision boundaries well, outperforming prior approaches to generating diverse counterfactuals. We provide an implementation of the framework at https://github.com/microsoft/DiCE.
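
To make the determinantal-point-process idea concrete, the sketch below shows one way a diversity score for a candidate set of counterfactuals can be computed: build a kernel matrix whose entries decay with the pairwise distance between counterfactuals and take its determinant, which grows as the candidates spread apart. The inverse-distance kernel and Euclidean distance here are illustrative assumptions, not the released DiCE implementation.

```python
import numpy as np

def dpp_diversity(counterfactuals: np.ndarray) -> float:
    """Diversity score for a set of counterfactual examples.

    Builds a kernel matrix K with K[i, j] = 1 / (1 + dist(c_i, c_j)) and
    returns det(K). Near-duplicate counterfactuals make rows of K nearly
    identical, driving the determinant toward zero, while well-separated
    counterfactuals keep it large.
    """
    k = len(counterfactuals)
    K = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            dist = np.linalg.norm(counterfactuals[i] - counterfactuals[j])
            K[i, j] = 1.0 / (1.0 + dist)
    return float(np.linalg.det(K))

# Example: two candidate sets of three counterfactuals in a 2-D feature space.
spread_out = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
clustered = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, 0.05]])
print(dpp_diversity(spread_out))  # larger score: candidates are far apart
print(dpp_diversity(clustered))   # score close to zero: near-duplicates
```

A score of this form can be added to a counterfactual-generation objective alongside a proximity term, so the optimizer is rewarded both for staying close to the original input and for returning a varied set of alternatives.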
