Efficient nonparametric statistical inference on population feature importance using Shapley values

The true population-level importance of a variable in a prediction task provides useful knowledge about the underlying data-generating mechanism and can help in deciding which measurements to collect in subsequent experiments. Valid statistical inference on this importance is a key component in understanding the population of interest. We present a computationally efficient procedure for estimating and obtaining valid statistical inference on the Shapley Population Variable Importance Measure (SPVIM). Although the computational complexity of the true SPVIM scales exponentially with the number of variables, we propose an estimator based on randomly sampling only $\Theta(n)$ feature subsets given $n$ observations. We prove that our estimator converges at an asymptotically optimal rate. Moreover, by deriving the asymptotic distribution of our estimator, we construct valid confidence intervals and hypothesis tests. Our procedure exhibits good finite-sample performance in simulations, and in an in-hospital mortality prediction task it produces similar variable importance estimates across different machine learning algorithms.
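For context, the SPVIM of a feature is the Shapley value of a predictiveness measure evaluated over feature subsets. In the standard Shapley formula (a sketch; the symbols $\psi_j$, $V$, and $p$ are our labels for the importance of feature $j$, the subset predictiveness, and the number of features, and may differ from the paper's notation):

$$ \psi_j \;=\; \sum_{s \subseteq \{1,\dots,p\} \setminus \{j\}} \frac{|s|!\,(p-|s|-1)!}{p!}\,\bigl\{ V(s \cup \{j\}) - V(s) \bigr\} $$

The sum ranges over all $2^{p-1}$ subsets not containing $j$, which is the source of the exponential cost that the paper's subset-sampling estimator avoids.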
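To make the sampling idea concrete, here is a minimal sketch of the classical permutation-based Monte Carlo approximation of Shapley values, assuming an estimated subset predictiveness function is available. This illustrates subset sampling in general, not the paper's specific $\Theta(n)$-subset estimator or its inference procedure; the names `shapley_sampling` and `value_fn` are hypothetical.

```python
import numpy as np

def shapley_sampling(value_fn, p, n_samples, rng=None):
    """Monte Carlo Shapley estimate via random feature permutations.

    value_fn: maps a frozenset of feature indices to a predictiveness
              score (a hypothetical stand-in for an estimated V(s)).
    p: number of features.
    n_samples: number of sampled permutations; each permutation
               evaluates p nested feature subsets.
    """
    rng = np.random.default_rng(rng)
    shapley = np.zeros(p)
    for _ in range(n_samples):
        perm = rng.permutation(p)
        prev = value_fn(frozenset())
        members = set()
        for j in perm:
            members.add(j)
            curr = value_fn(frozenset(members))
            shapley[j] += curr - prev  # marginal contribution of j
            prev = curr
    return shapley / n_samples

# Toy example: an additive "predictiveness" in which feature j
# contributes w[j]; the Shapley values then equal w exactly.
w = np.array([0.5, 0.3, 0.2])
est = shapley_sampling(lambda s: sum(w[j] for j in s), p=3, n_samples=200)
```

Each sampled permutation touches $p$ nested subsets, so the total number of evaluated subsets is controlled by the number of sampled permutations; the paper's estimator instead samples the feature subsets directly, which is what yields the $\Theta(n)$ cost stated in the abstract.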
