The many Shapley values for model explanation

The Shapley value has become a popular method for attributing a machine-learning model's prediction on an input to its base features. The use of the Shapley value is justified by citing [16], which shows that it is the \emph{unique} method satisfying certain desirable properties (\emph{axioms}). There are, however, multiple ways of operationalizing the Shapley value in the attribution problem; they differ in how they reference the model, the training data, and the explanation context, and they yield very different results, rendering the uniqueness result meaningless. Furthermore, we find that previously proposed approaches can produce counterintuitive attributions in theory and in practice: for instance, they can assign non-zero attributions to features that are not even referenced by the model. In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and we propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. We also contrast BShap with Integrated Gradients, another extension of the Shapley value to the continuous setting.
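
As a concrete illustration of the baseline-style Shapley attribution the abstract refers to, the following is a minimal brute-force sketch: features "absent" from a coalition are imputed with baseline values, and the standard Shapley formula is applied to the resulting set function. The function name `baseline_shapley`, the toy model, and the inputs are illustrative assumptions, not code from the paper.

```python
from itertools import combinations
from math import factorial

def baseline_shapley(f, x, baseline):
    """Exact brute-force Shapley attributions for the game in which features
    in a coalition take their values from x and all other features take their
    values from the baseline. Exponential in the number of features; for
    illustration only. Attributions sum to f(x) - f(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` come from x; the rest come from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    attributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                attributions[i] += weight * (value(s | {i}) - value(s))
    return attributions

# Toy example: the model ignores feature 2, so it receives zero attribution.
f = lambda z: 2.0 * z[0] + z[0] * z[1]
print(baseline_shapley(f, x=[1.0, 3.0, 5.0], baseline=[0.0, 0.0, 0.0]))
# -> [3.5, 1.5, 0.0], which sums to f(x) - f(baseline) = 5.0
```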

[1] Erik Strumbelj, et al. Explaining prediction models and individual predictions with feature contributions, 2014, Knowledge and Information Systems.

[2] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[3] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[4] U. Grömping. Estimators of Relative Importance in Linear Regression Based on Variance Decomposition, 2007.

[5] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[6] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[7] Erik Strumbelj, et al. Explaining instance classifications with interactions of subsets of feature values, 2009, Data Knowl. Eng.

[8] Scott Lundberg, et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.

[9] L. Shapley. A Value for n-person Games, 1988.

[10] Kjersti Aas, et al. Explaining individual predictions when features are dependent: More accurate approximations to Shapley values, 2019, Artif. Intell.

[11] Yi Sun, et al. Axiomatic attribution for multilinear functions, 2011, EC '11.

[12] Art B. Owen, et al. On Shapley Value for Measuring Importance of Dependent Inputs, 2016, SIAM/ASA J. Uncertain. Quantification.

[13] L. Shapley, et al. Values of Non-Atomic Games, 1974.

[14] Art B. Owen, et al. Sobol' Indices and Shapley Value, 2014, SIAM/ASA J. Uncertain. Quantification.

[15] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.

[16] Eric J. Friedman, et al. Three Methods to Share Joint Costs or Surplus, 1999.

[17] Scott M. Lundberg, et al. Consistent Individualized Feature Attribution for Tree Ensembles, 2018, ArXiv.

[18] H. Young. Monotonic solutions of cooperative games, 1985.

[19] Dominik Janzing, et al. Feature relevance quantification in explainable AI: A causality problem, 2019, AISTATS.

[20] Philip Wolfe, et al. Contributions to the theory of games, 1953.

[21] Yair Zick, et al. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems, 2016, IEEE Symposium on Security and Privacy (SP).

[22] P. Sen, et al. Introduction to bivariate and multivariate analysis, 1981.

[23] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[24] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB.