Predictive and Causal Implications of using Shapley Value for Model Interpretation

Shapley value is a concept from game theory. Recently, it has been used for explaining complex models produced by machine learning techniques. Although the mathematical definition of Shapley value is straightforward, the implications of using it as a model interpretation tool have yet to be described. In the current paper, we analyzed Shapley value in the Bayesian network framework. We established the relationship between Shapley value and conditional independence, a key concept in both predictive and causal modeling. Our results indicate that eliminating a variable with a high Shapley value from a model does not necessarily impair predictive performance, whereas eliminating a variable with a low Shapley value could impair performance. Therefore, using Shapley value for feature selection does not result in the most parsimonious and predictively optimal model in the general case. More importantly, the Shapley value of a variable does not reflect its causal relationship with the target of interest.
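
The feature-removal claim can be illustrated with a minimal sketch that computes exact Shapley values by averaging each player's marginal contribution over all orderings. Everything specific here is an assumption made up for illustration and not taken from the paper: the feature names A, B, and C, the value function `toy_value`, and the scores 0.8 and 0.1. B is treated as an exact duplicate of A, and C is treated as carrying a small amount of independent signal.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical performance score of a model restricted to `coalition`.

    Assumption: A and B are redundant (either one alone yields 0.8),
    and C adds an independent 0.1.
    """
    score = 0.0
    if "A" in coalition or "B" in coalition:
        score += 0.8
    if "C" in coalition:
        score += 0.1
    return score


phi = shapley_values(["A", "B", "C"], toy_value)

full = toy_value(frozenset({"A", "B", "C"}))
without_b = toy_value(frozenset({"A", "C"}))  # drop the higher-valued feature B
without_c = toy_value(frozenset({"A", "B"}))  # drop the lowest-valued feature C

print(phi)
print(full, without_b, without_c)
```

In this toy game A and B each receive a Shapley value of about 0.4 and C about 0.1, yet dropping B leaves the score unchanged while dropping C lowers it. The sketch only mirrors the shape of the argument; the paper's own analysis is carried out in the Bayesian network framework.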
