Statistical stability indices for LIME: obtaining reliable explanations for Machine Learning models

Nowadays we are witnessing a transformation of business processes towards a more computation-driven approach, the ever-increasing use of Machine Learning techniques being the clearest example of this trend. This revolution often brings advantages, such as higher prediction accuracy and shorter time to obtain results. However, these methods have a major drawback: it is very difficult to understand on what grounds the algorithm reached its decision. To address this issue we consider the LIME method. We give a general background on LIME and then focus on its stability issue: applying the method repeatedly, under the same conditions, may yield different explanations. We propose two complementary indices to measure LIME stability. It is important for the practitioner both to be aware of the issue and to have a tool for detecting it: stability is what makes LIME explanations reliable, so a stability assessment through the proposed indices is crucial. As a case study, we apply both Machine Learning and classical statistical techniques to Credit Risk data. We test LIME on the Machine Learning algorithm and check its stability. Finally, we examine the quality of the explanations returned.
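The instability described above is easy to reproduce: each LIME run draws fresh random perturbations around the instance, so the fitted local coefficients change from run to run. The sketch below is a minimal, self-contained illustration in pure NumPy; the black-box model, kernel width, and the standard-deviation-based index are illustrative assumptions, not the paper's exact indices or the `lime` package implementation.

```python
import numpy as np

def black_box(X):
    # Illustrative non-linear score standing in for any opaque ML model.
    return 1 / (1 + np.exp(-(X[:, 0] ** 2 + np.sin(3 * X[:, 1]))))

def lime_like_explanation(x0, n_samples=200, kernel_width=0.75, ridge=1e-3, seed=None):
    """One run of a LIME-style local surrogate: perturb, weight, fit ridge."""
    r = np.random.default_rng(seed)
    X = x0 + r.normal(size=(n_samples, x0.size))   # random perturbations (the source of instability)
    y = black_box(X)
    d2 = ((X - x0) ** 2).sum(axis=1)
    w = np.exp(-d2 / kernel_width ** 2)            # proximity kernel: nearer samples weigh more
    Xc = X - x0                                    # centre the design on the explained instance
    A = (Xc.T * w) @ Xc + ridge * np.eye(x0.size)  # weighted ridge normal equations
    b = (Xc.T * w) @ y
    return np.linalg.solve(A, b)                   # local feature coefficients

x0 = np.array([0.5, -0.2])
coefs = np.array([lime_like_explanation(x0, seed=s) for s in range(30)])
# A naive stability measure: per-feature standard deviation across repeated runs.
instability = coefs.std(axis=0)
print(instability)
```

Fixing the seed removes the variability, but in practice the point is the opposite: a practitioner who runs the explainer twice with different seeds may receive visibly different coefficient vectors, which is exactly what a stability index should quantify.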
