Computationally Efficient Feature Significance and Importance for Machine Learning Models

We develop a simple, computationally efficient significance test for the features of a machine learning model. Our forward-selection approach applies to any model specification, learning task, and variable type. The test is non-asymptotic, straightforward to implement, and requires no model refitting. It identifies statistically significant features, as well as feature interactions of any order, in a hierarchical manner, and yields a model-free notion of feature importance. Numerical results illustrate its performance.
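To make the idea of a refit-free feature significance test concrete, the sketch below illustrates one way such a test can work: a fitted model is evaluated on the original data and on data with a candidate feature perturbed (here, by permutation), and an exact sign test is applied to the per-sample loss differences. This is an illustrative assumption of the general mechanism, not the paper's exact procedure; the helper names (`sign_test_pvalue`, `feature_significance`) and the squared-loss/permutation choices are hypothetical.

```python
import math
import numpy as np

def sign_test_pvalue(diffs):
    """Two-sided exact sign test on paired differences (ties dropped)."""
    diffs = diffs[diffs != 0]
    n = diffs.size
    k = int((diffs > 0).sum())
    # Exact binomial tail under H0: P(sign = +) = 1/2.
    m = min(k, n - k)
    tail = sum(math.comb(n, i) for i in range(m + 1)) / 2.0 ** n
    return min(1.0, 2.0 * tail)

def feature_significance(model_predict, X, y, j, rng):
    """P-value for feature j: compare per-sample squared losses of the
    already-fitted model on the original data versus data with feature j
    permuted. No model refitting is required."""
    loss_full = (y - model_predict(X)) ** 2
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    loss_perturbed = (y - model_predict(Xp)) ** 2
    return sign_test_pvalue(loss_perturbed - loss_full)

# Toy usage: y depends only on feature 0, and the "fitted" model knows that.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
model = lambda X: 2.0 * X[:, 0]
p0 = feature_significance(model, X, y, 0, rng)  # relevant feature: small p
p1 = feature_significance(model, X, y, 1, rng)  # irrelevant feature: p = 1
```

Because only forward evaluations of the fitted model are needed, the cost per tested feature is a single prediction pass, which is what makes this style of test computationally cheap.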
