How do visual explanations foster end users' appropriate trust in machine learning?

We investigated how example-based explanations of a machine learning classifier affect end users' appropriate trust. In an in-person user study with 33 participants, we measured participants' appropriate trust in the classifier, quantified the effects of different spatial layouts and visual representations of the explanations, and observed changes in users' trust over time. The results show that each explanation improved users' trust in the classifier, and that the combination of explanation, human, and classification algorithm yielded substantially better decisions than either the human or the classification algorithm alone. Yet the visual explanations led to different levels of trust, and an explanation that is difficult to understand can induce inappropriate trust. Visual representation and performance feedback strongly affected users' trust, while spatial layout had a moderate effect. Our results do not indicate that individual differences (e.g., propensity to trust) affected users' trust in the classifier. This work advances the state of the art in trustworthy machine learning and informs the design and appropriate use of automated systems.
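
The abstract does not specify how "appropriate trust" was operationalized. As a minimal sketch under an assumption commonly made in the trust-calibration literature (not necessarily this paper's measure), appropriate trust can be scored per trial as reliance calibration: the user follows the classifier when it is correct and overrides it when it is wrong. All names below are hypothetical illustrations.

```python
# Minimal sketch (assumed operationalization, not the paper's actual metric):
# "appropriate trust" scored as reliance calibration -- following the
# classifier when it is right and overriding it when it is wrong.
from dataclasses import dataclass

@dataclass
class Trial:
    classifier_correct: bool  # did the classifier predict the true label?
    user_followed: bool       # did the user accept the classifier's prediction?

def appropriate_reliance(trials: list[Trial]) -> float:
    """Fraction of trials where reliance was calibrated: the user
    followed a correct prediction or rejected an incorrect one."""
    calibrated = sum(t.user_followed == t.classifier_correct for t in trials)
    return calibrated / len(trials)

# Hypothetical session: the user follows two correct predictions,
# overrides one wrong prediction, and mistakenly follows another wrong one.
trials = [
    Trial(classifier_correct=True,  user_followed=True),
    Trial(classifier_correct=True,  user_followed=True),
    Trial(classifier_correct=False, user_followed=False),
    Trial(classifier_correct=False, user_followed=True),
]
print(appropriate_reliance(trials))  # 0.75
```

Tracking this score per block of trials would also expose the change in trust over time that the study reports.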
