One Voice Fits All?

When a smart device talks, what should its voice sound like? Voice-enabled devices are becoming a ubiquitous presence in everyday life. At the same time, speech synthesis technology is rapidly improving, making it possible to generate increasingly varied and realistic computerized voices. Despite the flexibility and richness of expression that the technology now affords, today's most common voice assistants often default to female-sounding, polite, and playful voices. In this paper, we examine the social consequences of voice design and introduce a simple research framework for understanding how voice shapes the way we perceive and interact with smart devices. Grounded in the foundational paradigm of computers as social actors, and informed by research in human-robot interaction, the framework shows how effective voice design depends on a complex interplay among characteristics of the user, the device, and the context of use. Through this framework, we propose a set of guiding questions to inform future research on voice design for smart devices.
