Speech and Gaze Conflicts in Collaborative Human-Robot Interactions

Henny Admoni, Christopher Datsikas, and Brian Scassellati
(henny@cs.yale.edu, christopher.datsikas@yale.edu, scaz@cs.yale.edu)
Department of Computer Science, Yale University
New Haven, Connecticut 06520 USA

Abstract

Gaze and speech are both important modes of communication for human-robot interactions. However, few studies to date have explored the effects of conflict in a robot's multi-modal communication. In this paper, we investigate how such speech-gaze conflicts affect performance on a cooperative referential task. Participants play a selection game with a robot, in which the robot instructs them to select one object from among a group of available objects. We vary whether the robot's gaze is congruent with its speech, incongruent with its speech, or absent, and we measure participants' response times to the robot's instructions. Results indicate that congruent gaze facilitates performance but that incongruent gaze does not hinder it. We repeat the study with a human actor instead of a robot to investigate whether human gaze has the same effect, and find the same results: in this type of activity, congruent gaze helps performance while incongruent gaze does not hurt it. We conclude that robot gaze may be a worthwhile investment in such situations, even when gaze behaviors may be unreliable.

Keywords: human-robot interaction; eye gaze; non-verbal communication

Introduction

In typical human interactions, eye gaze supports and augments spoken communication (Kleinke, 1986). People gaze almost exclusively at task-relevant information (Hayhoe & Ballard, 2005), and gaze is used to disambiguate statements about objects in the environment (Hanna & Brennan, 2007). Similar mechanisms are at play in human-robot interactions: task-relevant robot eye gaze can improve the efficiency of collaborative action (Boucher et al., 2012).

For example, imagine a human and a robot collaboratively constructing a birdhouse. The robot can use its eye gaze to clarify an ambiguous speech reference, saying "Please pass the green block" while looking at a particular green building block to distinguish it from other green blocks. This multi-modal communication makes the interaction more efficient by using multiple channels to convey information, requiring less investment in costly mechanisms like generating sufficiently descriptive speech, and improving the naturalness of the interaction (Huang & Thomaz, 2011).

But robots are not perfect, and sometimes speech and gaze cues will conflict. Sensor errors, hardware malfunctions, and software bugs can cause mismatches between a robot's gaze and its speech. In such cases, a human partner receives incorrect or contradictory information from the robot. The human might misinterpret the robot's speech or, at best, must hesitate to decide what the robot means, decreasing the collaboration's efficiency and increasing the human's cognitive load.

While a growing body of evidence shows that people can interpret robot gaze and speech, only a few studies to date have investigated the effects of speech-gaze conflicts. In this paper, we investigate how speech-gaze conflicts are handled by human partners in collaborative, embodied human-robot interactions. We focus on object selection tasks in which a robot provides instructions to a human, because these scenarios are central to collaborative action, and because misinterpreted communication in such scenarios can be costly.

We compare congruent gaze, in which the robot looks at the object it references in speech, and incongruent gaze, in which the robot looks at a different object, to a control condition in which the robot exhibits no gaze cues. To quantitatively measure the effect of speech-gaze conflicts, we record the time between when the robot begins its instructions and when participants select an object. Response time serves as an approximation of task efficiency: faster responses mean less overall time taken for the task. As a final manipulation, we also include a human agent condition, in which the robot is replaced by a person who performs the robot's role in the experiment. The human agent condition investigates whether robot gaze is any more or less influential on human behavior than human gaze.

The results of this study provide evidence of the effectiveness of gaze in collaborative human-robot interactions. As described below, we find that congruent gaze facilitates performance in both the robot and human conditions. Interestingly, we also find that incongruent gaze does not hinder performance in either the robot or the human condition. In other words, in this task, people recover quickly enough from speech-gaze conflicts that their performance is statistically no different from having no gaze at all. These results suggest that adding referential gaze may be a low-risk way to improve human performance in similar environments, even when the gaze system is unreliable.

Related Work

Directional eye gaze appears to be a special stimulus, evoking reflexive attention shifts that are robust to top-down modulation (Friesen, Ristic, & Kingstone, 2004). Functional MRI studies reveal significant overlap between the brain areas that process theory of mind and those that process eye gaze (Calder et al., 2002). In fact, observing someone signal the presence of an object with referential gaze elicits the same neural response as observing someone physically reach to grasp that object (Pierno et al., 2006), indicating that people use gaze as a powerful indicator of others' future behavior.

Where we look is closely coupled with what we say in human-human interactions. Objects or figures in the environment are typically fixated one second or less before they
References

[1] A. Young, et al. Reading the mind from eye gaze. Neuropsychologia, 2002.
[2] Luca Turella, et al. When Gaze Turns into Grasp. Journal of Cognitive Neuroscience, 2006.
[3] Sean Andrist, et al. Designing effective gaze mechanisms for virtual agents. CHI, 2012.
[4] Bilge Mutlu, et al. Robot behavior toolkit: Generating effective social behaviors for robots. 2012 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2012.
[5] Andrea Lockerd Thomaz, et al. Effects of responding to, initiating and ensuring joint attention in human-robot interaction. 2011 RO-MAN, 2011.
[6] Takayuki Kanda, et al. Nonverbal leakage in robots: Communication of intentions through seemingly unintentional behavior. 2009 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2009.
[7] Brian Scassellati, et al. The Benefits of Interactions with Physically Present Robots over Video-Displayed Agents. Int. J. Soc. Robotics, 2011.
[8] Marek P. Michalowski, et al. Keepon: A Playful Robot for Research, Therapy, and Entertainment. 2009.
[9] S. Brennan, et al. Speakers' eye gaze disambiguates referring expressions early during face-to-face conversation. 2007.
[10] Takayuki Kanda, et al. Footing in human-robot conversations: How robots might shape participant roles using gaze cues. 2009 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2009.
[11] Siddhartha S. Srinivasa, et al. Toward seamless human-robot handovers. Journal of Human-Robot Interaction, 2013.
[12] Bilge Mutlu, et al. A Storytelling Robot: Modeling and Evaluation of Human-like Gaze Behavior. 2006 6th IEEE-RAS International Conference on Humanoid Robots, 2006.
[13] D. Ballard, et al. Eye movements in natural behavior. Trends in Cognitive Sciences, 2005.
[14] C. Kleinke. Gaze and eye contact: A research review. Psychological Bulletin, 1986.
[15] M. Crocker, et al. Investigating joint attention mechanisms through spoken human–robot interaction. Cognition, 2011.
[16] Zenzi M. Griffin, et al. What the eyes say about speaking. Psychological Science, 2000.
[17] Matthias Scheutz, et al. Adaptive eye gaze patterns in interactions with human and artificial agents. TIIS, 2012.
[18] Alan Kingstone, et al. Attentional effects of counterpredictive gaze and arrow cues. Journal of Experimental Psychology: Human Perception and Performance, 2004.
[19] Peter Ford Dominey, et al. I Reach Faster When I See You Look: Gaze Effects in Human–Human and Human–Robot Face-to-Face Cooperation. Front. Neurorobot., 2012.