The Explanation Game: Explaining Machine Learning Models Using Shapley Values

A number of techniques have been proposed to explain a machine learning model’s prediction by attributing it to the corresponding input features. Popular among these are techniques that apply the Shapley value method from cooperative game theory. While existing papers focus on the axiomatic motivation of Shapley values and on efficient techniques for computing them, they offer little justification for the game formulations used and do not address the uncertainty implicit in their methods’ outputs. For instance, the popular SHAP algorithm’s formulation may give substantial attributions to features that play no role in the model. In this work, we illustrate how subtle differences in the underlying game formulations of existing methods can cause large differences in the attributions for a prediction. We then present a general game formulation that unifies existing methods and enables straightforward confidence intervals on their attributions. Furthermore, it allows us to interpret the attributions as contrastive explanations of an input relative to a distribution of reference inputs. We tie this idea to classic research in cognitive psychology on contrastive explanations, and propose a conceptual framework for generating and interpreting explanations for ML models, called formulate, approximate, explain (FAE). We apply this framework to explain black-box models trained on two UCI datasets and a Lending Club dataset.
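
To make the kind of attribution described above concrete, the sketch below (an illustration, not the authors' FAE implementation) estimates Shapley-value feature attributions for a single prediction by Monte Carlo sampling of feature orderings, drawing "absent" feature values from a reference distribution, and attaches simple normal-approximation confidence intervals to each attribution. The model f, the reference sample X_ref, and the sample count are placeholders chosen for the example.

    import numpy as np

    def shapley_attributions(f, x, X_ref, n_samples=2000, seed=0):
        """Monte Carlo estimate of Shapley-value attributions for one input x.

        Illustrative sketch only (not the paper's implementation).
        f      : callable mapping an (n, d) array to n scalar predictions
        x      : 1-D array of length d, the input being explained
        X_ref  : (m, d) array of reference inputs supplying "absent" feature values
        Returns per-feature attribution estimates and ~95% CI half-widths.
        """
        rng = np.random.default_rng(seed)
        d = x.shape[0]
        contribs = np.zeros((n_samples, d))

        for s in range(n_samples):
            perm = rng.permutation(d)                 # random feature ordering
            ref = X_ref[rng.integers(len(X_ref))]     # one draw from the reference distribution
            z = ref.copy()
            prev = f(z[None, :])[0]
            for j in perm:                            # switch features to x's values one at a time
                z[j] = x[j]
                cur = f(z[None, :])[0]
                contribs[s, j] = cur - prev           # marginal contribution of feature j
                prev = cur

        phi = contribs.mean(axis=0)
        half_width = 1.96 * contribs.std(axis=0, ddof=1) / np.sqrt(n_samples)
        return phi, half_width

In each sampled ordering, the per-feature contributions sum to f(x) minus the prediction at the drawn reference, so the averaged attributions can be read contrastively: they explain the prediction for x relative to the reference distribution, and the half-widths quantify the Monte Carlo uncertainty that the abstract argues explanation methods should report.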
