X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust

We present a new explainable AI (XAI) framework aimed at increasing justified human trust in, and reliance on, the AI machine through explanations. We pose explanation as an iterative communication process, i.e., a dialog, between the machine and the human user. More concretely, the machine generates a sequence of explanations in a dialog that takes into account three important aspects at each turn: (a) the human's intention (or curiosity); (b) the human's understanding of the machine; and (c) the machine's understanding of the human user. To do this, we use Theory of Mind (ToM), which lets us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. These explicit mental representations are then used to learn an optimal explanation policy that accounts for the human's perception and beliefs. We further show that ToM enables quantitatively measuring justified human trust in the machine by comparing the three mental representations. We apply our framework to three visual recognition tasks, namely image classification, action recognition, and human body pose estimation. We argue that our ToM-based explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex machine learning models. To the best of our knowledge, this is the first work to derive explanations using ToM. Extensive human-study experiments verify our hypotheses, showing that the proposed explanations significantly outperform state-of-the-art XAI methods on all the standard quantitative and qualitative XAI evaluation metrics, including human trust, reliance, and explanation satisfaction.
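To make the idea of comparing mental representations concrete, here is a minimal Python sketch, assuming each "mind" can be approximated as a set of visual concepts believed relevant to the machine's prediction. The `justified_trust` function, the Jaccard agreement measure, and the greedy choice of the next explanation are illustrative assumptions for this sketch, not the paper's actual trust metric or explanation policy.

```python
# Illustrative sketch only (not the paper's formulation).
# Assumption: each mental representation is a set of visual concepts the
# corresponding agent believes are relevant to the machine's prediction.

from typing import Set


def agreement(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two belief sets (a simple agreement proxy)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def justified_trust(machine_mind: Set[str],
                    human_model_of_machine: Set[str],
                    machine_model_of_human: Set[str]) -> float:
    """Hypothetical trust score: how well the human's picture of the machine
    and the machine's picture of the human align with the machine's actual
    reasoning. The equal-weight average is an assumption for illustration."""
    return 0.5 * (agreement(machine_mind, human_model_of_machine)
                  + agreement(machine_mind, machine_model_of_human))


# Toy dialog turn: explain the concept missing from the human's model of the
# machine, which is what currently limits agreement.
machine_mind = {"beak", "wing", "feather_texture"}
human_model_of_machine = {"wing"}
machine_model_of_human = {"wing", "color"}

missing = machine_mind - human_model_of_machine
next_explanation = sorted(missing)[0] if missing else None

print("trust before:", justified_trust(machine_mind,
                                        human_model_of_machine,
                                        machine_model_of_human))
print("explain next:", next_explanation)
```

In this toy setting, each dialog turn would add the explained concept to the human's model of the machine, and the trust score would rise as the three representations converge; the paper's actual policy is learned rather than greedy.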
