Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
[1] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, 2019, Nature Machine Intelligence.
[2] J. Reidenberg et al. Accountable Algorithms, 2017, University of Pennsylvania Law Review.
[3] Chris Russell et al. Explaining Explanations in AI, 2019, FAT*.
[4] Taesup Moon et al. Fooling Neural Network Interpretations via Adversarial Model Manipulation, 2019, NeurIPS.
[5] Scott Lundberg et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.
[6] Alok Baveja et al. A data-driven software tool for enabling cooperative information sharing among police departments, 2002, European Journal of Operational Research.
[7] Rich Caruana et al. Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation, 2018, AIES.
[8] Carlos Guestrin et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, KDD.
[9] Been Kim et al. Towards A Rigorous Science of Interpretable Machine Learning, 2017, arXiv:1702.08608.
[10] Klaus-Robert Müller et al. Explanations can be manipulated and geometry is to blame, 2019, NeurIPS.
[11] Zachary Chase Lipton. The Mythos of Model Interpretability, 2018, ACM Queue.
[12] Sébastien Gambs et al. Fairwashing: the risk of rationalization, 2019, ICML.
[13] Solon Barocas et al. The Intuitive Appeal of Explainable Machines, 2018, Fordham Law Review.
[14] Sherif Sakr et al. On the interpretability of machine learning-based model for predicting hypertension, 2019, BMC Medical Informatics and Decision Making.
[15] Abubakar Abid et al. Interpretation of Neural Networks is Fragile, 2019, AAAI.
[16] John W. Paisley et al. Global Explanations of Neural Networks: Mapping the Landscape of Predictions, 2019, AIES.
[17] Carlos Guestrin et al. Anchors: High-Precision Model-Agnostic Explanations, 2018, AAAI.
[18] Martin Wattenberg et al. TCAV: Relative concept importance testing with Linear Concept Activation Vectors, 2018.
[19] Corey M. Hudson et al. Mapping chemical performance on molecular structures using locally interpretable explanations, 2016, arXiv:1611.07443.
[20] Martin Wattenberg et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), 2018, ICML.