Expressive synthetic voices: Considerations for human robot interaction

As speech synthesis technology develops more advanced paralinguistic capabilities, open questions emerge about how humans perceive robots that use such vocal capabilities. Perceptions of spoken interaction are complex and are influenced by multiple factors, including the linguistic content of a message, the social context, the perceived intelligence of the agent, and the form factor of its embodiment. This paper reports results from a study that controlled for these factors in order to investigate how a male synthetic voice with an expressive range affects human listeners. Participants were randomly assigned to one of three conditions, counterbalanced for gender and language background, that varied how paralinguistic cues were applied. As the voice became more expressive and more appropriate for the context, observers were more likely to describe the communication as effective, but less likely to refer to the unseen agent as a person. Possible effects of listener gender and cultural-linguistic background are examined, and implications for future methodologies in this field are discussed.
