Evaluating Explanation Without Ground Truth in Interpretable Machine Learning

Interpretable Machine Learning (IML) has become increasingly important in many real-world applications, such as autonomous driving and medical diagnosis, where explanations are strongly desired to help people better understand how machine learning systems work and to strengthen their trust in those systems. However, owing to the diverse scenarios and the subjective nature of explanations, ground truth for benchmarking the quality of generated explanations is rarely available in IML. Assessing explanation quality matters not only for probing the boundaries of a system, but also for delivering genuine benefits to human users in practical settings. To benchmark evaluation in IML, this article rigorously defines the problem of evaluating explanations and systematically reviews existing state-of-the-art efforts. Specifically, we summarize three general aspects of explanation (i.e., generalizability, fidelity, and persuasibility) with formal definitions, and review representative evaluation methodologies for each aspect across different tasks. Further, we design a unified evaluation framework according to the hierarchical needs of developers and end users, which can be easily adapted to different scenarios in practice. Finally, we discuss open problems and highlight several limitations of current evaluation techniques for future exploration.
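To make the fidelity aspect concrete, one widely used evaluation strategy that needs no ground truth is perturbation based: mask the features an explanation ranks as most important and measure how much the model's prediction changes, with a larger change suggesting a more faithful explanation. The sketch below illustrates this idea for a tabular classifier; the scikit-learn-style `predict_proba` interface, the top-k masking, and the zero baseline are illustrative assumptions rather than the specific protocol surveyed in this article.

```python
import numpy as np

def fidelity_drop(predict_proba, x, attribution, k=5, baseline=0.0):
    """Minimal perturbation-based fidelity check (illustrative sketch).

    Masks the k features with the largest absolute attribution scores and
    returns the drop in the predicted probability of the originally
    predicted class. A larger drop suggests the attribution is more
    faithful to the model's actual behavior.
    """
    x = np.asarray(x, dtype=float)
    attribution = np.asarray(attribution, dtype=float)

    original = predict_proba(x[None, :])[0]      # class probabilities for x
    target = int(np.argmax(original))            # class being explained

    top_k = np.argsort(np.abs(attribution))[::-1][:k]  # most important features
    x_masked = x.copy()
    x_masked[top_k] = baseline                   # replace them with a baseline value

    perturbed = predict_proba(x_masked[None, :])[0]
    return original[target] - perturbed[target]

# Example usage (hypothetical names): score one attribution for one test instance.
# drop = fidelity_drop(model.predict_proba, x_test[0], attribution_scores, k=3)
```

Averaging such drops over a test set, or comparing them against drops from randomly chosen features, gives a simple proxy for fidelity when no annotated explanations exist.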
