XAI Handbook: Towards a Unified Framework for Explainable AI

The field of explainable AI (XAI) has quickly grown into a thriving and prolific research community. However, a recurrent and widely acknowledged, yet rarely addressed, issue in this area is the lack of consensus regarding its terminology. In particular, each new contribution seems to rely on its own (and often intuitive) version of terms like "explanation" and "interpretation". Such disarray hinders the consolidation of advances in the field and its ability to meet scientific and regulatory demands, e.g., when comparing methods or establishing their compliance with bias and fairness constraints. We propose a theoretical framework that not only provides concrete definitions for these terms, but also outlines all the steps necessary to produce explanations and interpretations. The framework further allows existing contributions to be re-contextualized so that their scope can be measured, making them comparable to other methods. We show that this framework complies with existing desiderata on explanations, interpretability and evaluation metrics. We present a use case showing how the framework can be used to compare LIME, SHAP and MDNet, establishing their advantages and shortcomings. Finally, we discuss relevant trends in XAI as well as recommendations for future work, all from the standpoint of our framework.
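
As a rough illustration of the kind of side-by-side comparison the use case describes, the sketch below applies LIME and SHAP to the same classifier and the same instance, so that their local feature attributions can be inspected together. It is a minimal sketch, not the paper's experimental protocol: the dataset, model, background sample and parameter choices are illustrative assumptions, and only the standard public APIs of the lime and shap packages are used.

    # Minimal sketch: LIME and SHAP attributions for the same model and instance.
    # Assumes scikit-learn plus the "lime" and "shap" packages are installed.
    import numpy as np
    import shap
    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    data = load_breast_cancer()
    X, y = data.data, data.target
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    instance = X[0]

    # LIME: fit a local surrogate around the instance and report feature weights.
    lime_explainer = LimeTabularExplainer(
        X, feature_names=list(data.feature_names),
        class_names=list(data.target_names), mode="classification")
    lime_exp = lime_explainer.explain_instance(
        instance, model.predict_proba, num_features=5)
    print("LIME:", lime_exp.as_list())

    # SHAP: estimate Shapley values for the same instance against a background set.
    background = shap.sample(X, 100)
    shap_explainer = shap.KernelExplainer(model.predict_proba, background)
    shap_values = shap_explainer.shap_values(instance, nsamples=200)
    print("SHAP attribution shape:", np.shape(shap_values))

Both outputs are per-feature attributions for the same prediction, which is what makes a comparison of their scope and faithfulness meaningful; MDNet, by contrast, produces textual and visual justifications and therefore requires a different evaluation setting.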
