COGAM: Measuring and Moderating Cognitive Load in Machine Learning Model Explanations

Interpretable machine learning models trade off accuracy for simplicity to make explanations more readable and easier to comprehend. Drawing from cognitive psychology theories of graph comprehension, we formalize readability as visual cognitive chunks to measure and moderate the cognitive load in explanation visualizations. We present Cognitive-GAM (COGAM), which generates explanations with a desired cognitive load and accuracy by combining expressive nonlinear generalized additive models (GAMs) with simpler sparse linear models. We calibrated visual cognitive chunks against reading time in a user study, characterized the trade-off between cognitive load and accuracy on four datasets in simulation studies, and evaluated COGAM against baselines with users. We found that COGAM can decrease cognitive load without decreasing accuracy and/or increase accuracy without increasing cognitive load. Our framework and empirical measurement instruments for cognitive load will enable more rigorous assessment of the human interpretability of explainable AI.
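The core idea can be illustrated with a small, hypothetical sketch rather than the authors' actual COGAM implementation: fit an additive model in which a few features keep nonlinear spline shape functions (more expressive, but each shape chart adds visual cognitive chunks) while the remaining features are restricted to linear terms (readable as single coefficients). The sketch below assumes the pyGAM library; the synthetic data, feature indices, and spline-versus-linear split are made up for illustration.

```python
# Minimal illustrative sketch (assumed pyGAM API, synthetic data); not the
# authors' COGAM implementation. It mixes nonlinear spline terms with plain
# linear terms in one additive model, trading expressiveness for readability.
import numpy as np
from pygam import LinearGAM, s, l

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Hypothetical target: feature 0 acts nonlinearly; features 1 and 2 roughly linearly.
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Spline term for the feature that needs a nonlinear shape function,
# linear terms for the rest -- fewer visual chunks for the reader.
gam = LinearGAM(s(0) + l(1) + l(2)).fit(X, y)

pred = gam.predict(X)
print("training RMSE:", np.sqrt(np.mean((pred - y) ** 2)))

# The fitted shape function for feature 0 can be shown as a line chart,
# while the linear terms can be reported as single weights.
XX = gam.generate_X_grid(term=0)
shape = gam.partial_dependence(term=0, X=XX)
print("shape function evaluated at", XX.shape[0], "grid points")
```

Which features get splines and which get linear terms is the lever COGAM tunes: moving a feature from a spline to a linear term lowers the cognitive load of the explanation at some cost in fit, and the paper characterizes that trade-off empirically.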
