Rethinking AI Explainability and Plausibility

Setting proper evaluation objectives for explainable artificial intelligence (XAI) is vital for making XAI algorithms follow human communication norms, support human reasoning processes, and fulfill human needs for AI explanations. In this article, we examine explanation plausibility, the most pervasive human-grounded concept in XAI evaluation. Plausibility measures how reasonable the machine explanation is compared with the human explanation. Plausibility has conventionally been formulated as an important evaluation objective for AI explainability tasks. We argue against this idea and show that optimizing and evaluating XAI for plausibility is sometimes harmful, and always ineffective at achieving model understandability, transparency, and trustworthiness. Specifically, evaluating XAI algorithms for plausibility regularizes the machine explanation to express exactly the same content as the human explanation, which deviates from the fundamental motivation for humans to explain: expressing similar or alternative reasoning trajectories while conforming to understandable forms or language. Optimizing XAI for plausibility regardless of the correctness of the model's decision also jeopardizes model trustworthiness, because doing so breaks an important assumption in human-human explanation, namely that plausible explanations typically imply correct decisions; violating this assumption eventually leads to either undertrust or overtrust of AI models. Instead of being the end goal of XAI evaluation, plausibility can serve as an intermediate computational proxy for the human process of interpreting explanations to optimize the utility of XAI. We further highlight the importance of explainability-specific evaluation objectives by differentiating the AI explanation task from the object localization task.
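To make the notion of plausibility concrete, the sketch below shows one common way it is operationalized for feature-attribution explanations: a machine-generated saliency map is scored by its overlap with a human-annotated evidence mask. This is a minimal illustration, not the article's proposed method; the intersection-over-union metric, the 0.5 threshold, and the function name are illustrative assumptions. It also makes the article's concern visible: such a score depends only on agreement with the human annotation, not on whether the model's decision was correct.

```python
# Minimal sketch (illustrative, not the article's method) of plausibility
# as overlap between a machine explanation and a human annotation.
import numpy as np

def plausibility_iou(saliency: np.ndarray, human_mask: np.ndarray,
                     threshold: float = 0.5) -> float:
    """Intersection-over-union between a thresholded saliency map and a
    human-annotated evidence mask.

    saliency   -- 2D array of attribution scores scaled to [0, 1]
    human_mask -- 2D boolean array marking human-annotated evidence
    threshold  -- binarization cutoff (an assumed, illustrative value)
    """
    machine_mask = saliency >= threshold          # binarize the explanation
    intersection = np.logical_and(machine_mask, human_mask).sum()
    union = np.logical_or(machine_mask, human_mask).sum()
    return float(intersection) / union if union > 0 else 0.0

# Usage: a high plausibility score can coexist with an incorrect model
# decision, which is the mismatch the article cautions against.
saliency = np.random.rand(224, 224)              # stand-in saliency map
human_mask = np.zeros((224, 224), dtype=bool)
human_mask[80:150, 80:150] = True                # stand-in human annotation
print(f"plausibility (IoU): {plausibility_iou(saliency, human_mask):.3f}")
```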
