The Human Takes It All: Humanlike Synthesized Voices Are Perceived as Less Eerie and More Likable. Evidence From a Subjective Ratings Study

Background: The increasing involvement of social robots in human lives raises the question of how humans perceive them. Little is known about human perception of synthesized voices. Aim: To investigate which synthesized voice parameters predict the speaker's eeriness and voice likability; to determine whether individual listener characteristics (e.g., personality, attitude toward robots, age) influence synthesized voice evaluations; and to explore which paralinguistic features subjectively distinguish humans from robots/artificial agents. Methods: Ninety-five adults (62 female) listened to randomly presented audio clips from three categories: synthesized (Watson, IBM), humanoid (robot Sophia, Hanson Robotics), and human voices (five clips per category). Voices were rated on intelligibility, prosody, trustworthiness, confidence, enthusiasm, pleasantness, human-likeness, likability, and naturalness. Speakers were rated on appeal, credibility, human-likeness, and eeriness. Participants' personality traits, attitudes toward robots, and demographics were also recorded. Results: The human voice and human speaker characteristics received reliably higher scores on all dimensions except eeriness. Synthesized voice ratings were positively related to participants' agreeableness and neuroticism. Female participants rated synthesized voices more positively on most dimensions. Surprisingly, interest in social robots and attitudes toward robots played almost no role in voice evaluation. Contrary to the expectations of an uncanny valley, higher human-likeness ratings for both the voice and the speaker characteristics were associated with lower perceived eeriness. Moreover, the more humanlike the speaker's voice was rated, the more participants liked it, although this held for only one of the synthesized voices. Finally, pleasantness and trustworthiness of the synthesized voice predicted the likability of the speaker's voice. Qualitative content analysis identified intonation, sound, emotion, and imageability/embodiment as diagnostic features. Discussion: Humans clearly prefer human voices, but manipulating diagnostic speech features might increase acceptance of synthesized voices and thereby support human-robot interaction. There is limited evidence that the human-likeness of a voice is negatively linked to the perceived eeriness of the speaker.
