Human-AI Collaboration for UX Evaluation: Effects of Explanation and Synchronization

Analyzing usability test videos is arduous. Although recent research has shown the promise of AI in assisting with such tasks, it remains largely unknown how AI should be designed to facilitate effective collaboration between user experience (UX) evaluators and AI. Inspired by the concepts of agency and work context in the human-AI collaboration literature, we studied two corresponding design factors for AI-assisted UX evaluation: explanations and synchronization. Explanations allow the AI to inform humans how it identified UX problems in a usability test session; synchronization refers to the two ways humans and AI can collaborate: synchronously or asynchronously. We iteratively designed a tool, AI Assistant, with four versions of its UI corresponding to the two levels of explanations (with/without) and of synchronization (sync/async). Adopting a hybrid wizard-of-oz approach to simulate an AI with reasonable performance, we conducted a mixed-method study in which 24 UX evaluators identified UX problems from usability test videos using AI Assistant. Our quantitative and qualitative results show that AI with explanations, whether presented synchronously or asynchronously, provided better support for UX evaluators' analysis and was perceived more positively; without explanations, synchronous AI improved UX evaluators' performance and engagement more than asynchronous AI did. Lastly, we present design implications for AI-assisted UX evaluation and for facilitating more effective human-AI collaboration.
