Resisting Out-of-Distribution Data Problem in Perturbation of XAI

With the rapid development of eXplainable Artificial Intelligence (XAI), perturbation-based XAI algorithms have become popular due to their effectiveness and ease of implementation. However, the vast majority of perturbation-based XAI techniques face the challenge of Out-of-Distribution (OoD) data: randomly perturbed inputs drift away from the distribution of the original dataset. OoD data lead to overconfident model predictions, making existing XAI approaches unreliable. To the best of our knowledge, the OoD data problem in perturbation-based XAI algorithms has not been adequately addressed in the literature. In this work, we address it by designing an additional module that quantifies the affinity between the perturbed data and the original dataset distribution and integrates this affinity into the aggregation step. Our solution is compatible with the most popular perturbation-based XAI algorithms, such as RISE, OCCLUSION, and LIME. Experiments confirm that our methods yield significant improvements in general cases under both computational and cognitive metrics. In particular, in degradation cases our approach clearly outperforms the baselines. In addition, our solution resolves a fundamental problem with the faithfulness indicator, a commonly used evaluation metric for XAI algorithms that is itself sensitive to the OoD issue.
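To make the idea concrete, the following is a minimal sketch of how an affinity weight could be folded into a RISE-style saliency aggregation. This is not the paper's actual module: the affinity measure here is a simple Mahalanobis-distance proxy in feature space, and the names affinity_weight, weighted_rise_saliency, model, and feat_extractor are hypothetical placeholders for whatever classifier and density estimate one actually uses.

    import numpy as np

    def affinity_weight(features, mean, cov_inv, temperature=1.0):
        # Map a feature vector to a weight in (0, 1] using a Mahalanobis-distance
        # proxy for how close a perturbed sample lies to the training distribution.
        # (Placeholder affinity measure; the paper's module may differ.)
        diff = features - mean
        d2 = float(diff @ cov_inv @ diff)
        return np.exp(-d2 / temperature)

    def weighted_rise_saliency(model, feat_extractor, image, masks, mean, cov_inv):
        # RISE-style aggregation with per-sample affinity weights.
        #   model(x)          -> class score for a single input (hypothetical API)
        #   feat_extractor(x) -> feature vector used for the affinity estimate
        #   masks             -> array of shape (N, H, W) with values in [0, 1]
        h, w = masks.shape[1:]
        saliency = np.zeros((h, w), dtype=np.float64)
        total_weight = 1e-8  # avoid division by zero
        for m in masks:
            perturbed = image * m[..., None]   # mask out image regions
            score = model(perturbed)           # confidence on the target class
            w_aff = affinity_weight(feat_extractor(perturbed), mean, cov_inv)
            saliency += w_aff * score * m      # down-weight OoD perturbations
            total_weight += w_aff
        return saliency / total_weight

Under these assumptions, down-weighting rather than discarding OoD perturbations keeps every sampled mask in the estimate while letting in-distribution samples dominate the aggregated saliency map.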
