Improving the Quality of Explanations with Local Embedding Perturbations

Classifier explanations have been identified as a crucial component of knowledge discovery. Local explanations evaluate the behavior of a classifier in the vicinity of a given instance, and a key step in this approach is generating synthetic neighbors of that instance. This neighbor generation process is challenging, and it has a considerable impact on the quality of the resulting explanations. To assess the quality of generated neighborhoods, we propose a locality constraint based on local intrinsic dimensionality (LID). Building on this constraint, we then propose a new neighborhood generation method. Our method first fits a local embedding (subspace) around a given instance, using the LID of the test instance as the target dimensionality, then generates neighbors in that local embedding and projects them back to the original space. Experimental results show that our method generates more realistic neighborhoods and, consequently, better explanations. It can be used in combination with existing local explanation algorithms.
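As a rough illustration of the pipeline described above, the sketch below estimates LID with a maximum-likelihood estimator over k-nearest-neighbor distances and uses PCA as the local embedding, perturbing the embedded instance with Gaussian noise before projecting back. The helper names (`estimate_lid`, `generate_lid_neighbors`), the choice of PCA, and the noise scale are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch of LID-guided neighbor generation (assumptions: MLE LID
# estimator, PCA as the local embedding, Gaussian perturbations in the
# embedded space). Not the authors' exact method.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def estimate_lid(x, data, k=20):
    """Maximum-likelihood LID estimate of x from its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k).fit(data)
    dists, _ = nn.kneighbors(x.reshape(1, -1))
    d = dists[0]
    d = d[d > 0]                      # drop zero distances (x itself, duplicates)
    return -1.0 / np.mean(np.log(d / d[-1]))

def generate_lid_neighbors(x, data, n_samples=100, k=50, scale=0.1, rng=None):
    """Fit a local embedding of dimension ~LID(x), perturb in it, project back."""
    rng = np.random.default_rng(rng)
    k = min(k, len(data))
    lid = estimate_lid(x, data, k=k)
    target_dim = max(1, min(int(round(lid)), data.shape[1], k))

    # Fit the local subspace on the k nearest neighbors of x.
    nn = NearestNeighbors(n_neighbors=k).fit(data)
    _, idx = nn.kneighbors(x.reshape(1, -1))
    pca = PCA(n_components=target_dim).fit(data[idx[0]])

    # Perturb the embedded instance with Gaussian noise, then map back.
    z = pca.transform(x.reshape(1, -1))
    noise = rng.normal(scale=scale, size=(n_samples, target_dim))
    return pca.inverse_transform(z + noise)
```

The generated neighbors can then be labeled by the black-box classifier and fed to any existing local explanation algorithm in place of its default perturbation scheme.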
