Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges

This work investigates the practices and challenges of voice user interface (VUI) designers. Existing VUI design guidelines recommend that designers strive for natural human-agent conversation. However, the literature leaves a critical gap regarding how designers pursue naturalness in VUIs and what struggles they face in doing so. Bridging this gap is necessary for identifying designers’ needs and supporting them. Our interviews with 20 VUI designers identified 12 ways that designers characterize and approach naturalness in VUIs. We categorized these characteristics into three groupings based on the type of conversational context that each characteristic contributes to: Social, Transactional, and Core. Our results contribute new findings on designers’ challenges, such as a design dilemma in augmenting task-oriented VUIs with social conversations, difficulties in writing for spoken language, and a lack of proper tool support for imbuing synthesized voice with expressivity. We also discuss implications for developing design tools and guidelines.
