One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques

As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360, an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.
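The abstract mentions two evaluation metrics for explanations. One widely used metric of this kind is faithfulness, which correlates each feature's attributed importance with the change in the model's prediction when that feature is replaced by a baseline value. The sketch below is a minimal, self-contained illustration of that idea using NumPy and scikit-learn; it is not the toolkit's own implementation, and the function and variable names (e.g., faithfulness, baseline) are hypothetical.

```python
# Minimal sketch of a faithfulness-style evaluation metric for a feature
# attribution: correlate attributed importances with the drop in the model's
# predicted probability when each feature is individually set to a baseline.
# Illustration only; names and details are assumptions and do not reproduce
# the toolkit's released implementation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

def faithfulness(model, x, attributions, baseline):
    """Correlation between attribution scores and the prediction drop
    observed when each feature is individually set to its baseline value."""
    cls = int(model.predict(x.reshape(1, -1))[0])
    p_full = model.predict_proba(x.reshape(1, -1))[0, cls]
    drops = []
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] = baseline[j]
        drops.append(p_full - model.predict_proba(x_pert.reshape(1, -1))[0, cls])
    return np.corrcoef(attributions, np.array(drops))[0, 1]

# Use the logistic-regression coefficients scaled by the input as a simple
# local attribution, and the per-feature means as the baseline.
x0 = X[0]
attrs = model.coef_[0] * x0
score = faithfulness(model, x0, attrs, baseline=X.mean(axis=0))
print(f"faithfulness correlation: {score:.3f}")
```

A higher correlation indicates that the attribution ranks features in a way that agrees with their actual effect on this model's output, which is what such a metric is intended to capture.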
