Let Me Ask You This: How Can a Voice Assistant Elicit Explicit User Feedback?

Voice assistants offer users access to an increasing variety of personalized functionalities. Researchers and engineers who build these experiences rely on various signals from users to create the machine learning models that power them. One type of signal is explicit feedback. Collecting explicit user feedback in situ via voice assistants would help improve and inspect the underlying models, but from a user's perspective it can disrupt the overall experience, and the user might not feel compelled to respond. Careful design, however, can help reduce this friction. In this paper, we explore the opportunities and the design space for eliciting explicit feedback through voice assistants. First, we present four usage categories of in situ explicit feedback for model evaluation and improvement, derived from interviews with machine learning practitioners. Then, using realistic scenarios generated for each category, we conduct an online study to evaluate multiple voice assistant designs. Our results reveal that when the voice assistant was introduced as a learner or a collaborator, users were more willing to respond to its requests for feedback and found those requests less disruptive. In addition, giving users instructions on how to initiate feedback themselves can reduce perceived disruptiveness compared with asking users for feedback directly. Based on our findings, we discuss the implications and potential future directions for designing voice assistants that elicit user feedback for personalized voice experiences.
