Machine Explanations and Human Understanding

Explanations are hypothesized to improve human understanding of machine learning models and achieve a variety of desirable outcomes, ranging from model debugging to enhancing human decision making. However, empirical studies have found mixed and even negative results. An open question, therefore, is under what conditions explanations can improve human understanding and in what way. Using adapted causal diagrams, we provide a formal characterization of the interplay between machine explanations and human understanding, and show how human intuitions play a central role in enabling human understanding. Specifically, we identify three core concepts of interest that cover all existing quantitative measures of understanding in the context of human-AI decision making: task decision boundary, model decision boundary, and model error. Our key result is that without assumptions about task-specific intuitions, explanations can potentially improve human understanding of the model decision boundary, but they cannot improve human understanding of the task decision boundary or of model error. To achieve complementary human-AI performance, we articulate how explanations must work in concert with human intuitions. For instance, human intuitions about the relevance of features (e.g., education is more important than age in predicting a person's income) can be critical in detecting model error. Through empirical human-subject studies, we validate the importance of human intuitions in shaping the outcomes of machine explanations. Overall, our work provides a general framework along with actionable implications for future algorithmic development and empirical studies of machine explanations.
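To make the three concepts concrete, the sketch below simulates them on a synthetic income-style task. It is not the paper's formalism or experimental setup; the data-generating process, the flawed model, the simulated human heuristic, and all variable names are illustrative assumptions. It simply shows how one might operationalize understanding of the task decision boundary, the model decision boundary, and model error, and how a feature-relevance intuition ("education matters more than age") can surface model errors.

```python
# Minimal illustrative sketch (assumptions, not the paper's method):
# a synthetic task, a flawed model, and a simulated human heuristic.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic task: income depends mostly on education, weakly on age.
education = rng.normal(size=n)
age = rng.normal(size=n)
task_label = (2.0 * education + 0.2 * age + rng.normal(scale=0.5, size=n) > 0).astype(int)

# A flawed model that over-relies on age.
model_pred = (0.5 * education + 1.5 * age > 0).astype(int)

# A simulated human who follows the intuition "education matters more than age".
human_pred = (1.5 * education + 0.3 * age > 0).astype(int)

# 1) Understanding of the task decision boundary:
#    agreement between the human's own judgment and the ground-truth label.
task_understanding = np.mean(human_pred == task_label)

# 2) Understanding of the model decision boundary:
#    how well the human anticipates the model's prediction
#    (proxied here by agreement between human judgment and model output).
model_understanding = np.mean(human_pred == model_pred)

# 3) Understanding of model error:
#    when the human disagrees with the model, how often the model is in fact wrong.
disagree = human_pred != model_pred
model_wrong = model_pred != task_label
error_detection = np.mean(model_wrong[disagree]) if disagree.any() else 0.0

print(f"task decision boundary:  {task_understanding:.2f}")
print(f"model decision boundary: {model_understanding:.2f}")
print(f"model error detection:   {error_detection:.2f}")
```

Under these assumptions, the human's feature-relevance intuition aligns with the task but not with the over-weighted model, so disagreements disproportionately flag genuine model errors, which is the intuition-driven error detection the abstract describes.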
