A Human-Centered Interpretability Framework Based on Weight of Evidence

In this paper, we take a human-centered approach to interpretable machine learning. First, drawing inspiration from the study of explanation in philosophy, cognitive science, and the social sciences, we propose a list of design principles for machine-generated explanations that are meaningful to humans. Using the concept of weight of evidence from information theory, we develop a method for producing explanations that adhere to these principles. We show that this method can be adapted to handle high-dimensional, multi-class settings, yielding a flexible meta-algorithm for generating explanations. We demonstrate that these explanations can be estimated accurately from finite samples and are robust to small perturbations of the inputs. We also evaluate our method through a qualitative user study with machine learning practitioners, observing that the resulting explanations are usable even though some participants struggled with background concepts such as prior class probabilities. Finally, we surface design implications for interpretability tools.
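To make the central quantity concrete, the sketch below illustrates the standard (Good-style) weight of evidence for a binary hypothesis: the shift in log-odds that the evidence induces, which equals the log-likelihood ratio of the evidence under the hypothesis versus its complement. This is a minimal illustration under assumed inputs, not the paper's multi-class meta-algorithm; all function and variable names here are illustrative.

```python
# Minimal sketch (illustrative, not the authors' implementation) of the
# basic weight-of-evidence quantity for a binary hypothesis y and
# evidence x:  woe(y : x) = log-odds(y | x) - log-odds(y),
# which by Bayes' rule equals log [ P(x | y) / P(x | not y) ].

import numpy as np

def log_odds(p, eps=1e-12):
    """Log-odds of a probability, clipped for numerical stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p) - np.log(1.0 - p)

def weight_of_evidence(posterior, prior):
    """woe(y : x) = log [P(y|x) / P(not y|x)] - log [P(y) / P(not y)].

    Positive values mean the evidence favors hypothesis y over its
    complement; negative values mean it favors the alternative.
    """
    return log_odds(posterior) - log_odds(prior)

# Example (assumed numbers): the model's posterior for class y given
# input x is 0.9, while the prior (base rate) of y is 0.3.
print(weight_of_evidence(0.9, 0.3))  # ~3.04 nats in favor of y
```

Reporting the prior and posterior in log-odds form is what lets the explanation be decomposed additively over pieces of evidence, which is the property the method builds on; the user-study finding about prior class probabilities concerns exactly the `prior` term above.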
