The Role of Speech Technology in User Perception and Context Acquisition in HRI

This paper addresses the role and relevance of speech synthesis and speech recognition in social robotics. To increase the generality of the study, we considered the interaction of a human with one robot and with two robots executing tasks. Using these scenarios, we evaluated (1) user preference for a state-of-the-art speech synthesizer versus non-linguistic utterances, (2) the effect of each option on the perceived capability of the robots, (3) user preference for speech recognition versus typed text for entering commands, (4) the importance of knowing the robots’ context, and (5) the role of synthetic speech in acquiring that context. Although speech synthesis and speech recognition are distinct technologies, generating and understanding speech are best viewed as complementary dimensions of the same spoken-language phenomenon. Throughout, robot context denotes all information about a robot’s operating conditions and the completion status of the task it is executing. We built two robotic setups for online experiments. With the first setup, which employed a single robot, our findings indicate that highly natural synthetic speech is preferred over beep-like audio; that users prefer to enter commands by voice rather than by typing; and that the robot’s voice affects its perceived capability more strongly than the option to enter commands by voice. Our analysis suggests that when users interacted with a single robot, its voice, as a social cue and a driver of anthropomorphization, lost relevance as the interaction progressed, since users became better able to judge the robot’s capability from its performance on the task. The second setup was a two-robot collaborative testbed. When the robots communicated with each other to resolve problems while trying to accomplish a mission, the user observed the situation from a more distanced position, and a “reflective” perspective dominated. Our results indicate that acquiring the robots’ context was perceived as essential for successful human–robot collaboration toward a given objective, and that synthesized speech was preferred over on-screen text for conveying this context.
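To make the notion of robot context concrete, the sketch below models it as a small data record whose summary can be delivered either as on-screen text (the baseline condition) or through a speech synthesizer (the condition preferred in our results). This is a minimal, hypothetical Python illustration, not the study’s implementation: the RobotContext fields are assumed for the example, and the pyttsx3 engine merely stands in for any text-to-speech system.

```python
# Illustrative sketch of "robot context": operating conditions plus
# task-completion status, reported as screen text or as synthesized speech.
# The RobotContext fields and the choice of pyttsx3 are assumptions for
# this example, not the authors' code.

from dataclasses import dataclass

import pyttsx3  # offline TTS; any speech synthesizer could stand in here


@dataclass
class RobotContext:
    """Operating conditions and completion status of the current task."""
    robot_id: str
    battery_pct: int
    current_task: str
    task_progress_pct: int
    blocked: bool = False

    def summary(self) -> str:
        status = "blocked" if self.blocked else "running"
        return (f"{self.robot_id}: task '{self.current_task}' is {status}, "
                f"{self.task_progress_pct}% complete, battery at {self.battery_pct}%.")


def report_as_text(ctx: RobotContext) -> None:
    # Baseline condition: context shown as text on a screen.
    print(ctx.summary())


def report_as_speech(ctx: RobotContext) -> None:
    # Spoken condition: the same context delivered by a synthetic voice.
    engine = pyttsx3.init()
    engine.say(ctx.summary())
    engine.runAndWait()


if __name__ == "__main__":
    ctx = RobotContext("robot-1", battery_pct=72,
                       current_task="sort red blocks", task_progress_pct=40)
    report_as_text(ctx)
    report_as_speech(ctx)
```

Keeping both delivery paths behind the same summary string is a deliberate choice for such a comparison: it holds the informational content constant, so any observed preference reflects the modality (speech versus screen text) rather than differences in what is reported.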
