Human Gaze Following for Human-Robot Interaction

Gaze provides subtle, informative cues that support fluent interaction between people. Incorporating human gaze predictions can indicate how engaged a person is while interacting with a robot and allow the robot to infer a human's intentions or goals. We propose a novel approach to predict human gaze fixations relevant for human-robot interaction tasks, both referential and mutual gaze, in real time on a robot. We use a deep learning approach that tracks a person's gaze from the robot's perspective in real time, building on prior work that predicts the referential gaze of a person from a single 2D image. Our method uses an interpretable intermediate output of the network, a gaze heat map, and incorporates contextual task knowledge, such as the locations of relevant objects, to predict referential gaze. We find that gaze heat map statistics also capture differences between mutual and referential gaze conditions, which we use to predict whether a person is facing the robot's camera. We highlight the challenges of following a person's gaze on a robot in real time and show improved performance for both referential and mutual gaze prediction.
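
To make the two prediction steps concrete, the following is a minimal sketch, not the paper's implementation, of how a gaze heat map could be combined with known object locations to select a referential gaze target, and how simple heat map statistics could drive a mutual-gaze check. The function names, bounding-box format, statistics, and thresholds are illustrative assumptions rather than values reported in the paper.

import numpy as np

def referential_gaze_target(heatmap, object_boxes):
    """Pick the object whose bounding box accumulates the most gaze heat.

    heatmap: 2D array of gaze-location scores (e.g., from a gaze-following network).
    object_boxes: dict mapping object name -> (x0, y0, x1, y1) pixel coordinates.
    Returns the most likely referred-to object and its mean heat (assumed scoring rule).
    """
    scores = {}
    for name, (x0, y0, x1, y1) in object_boxes.items():
        region = heatmap[y0:y1, x0:x1]
        scores[name] = float(region.mean()) if region.size else 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]

def looks_mutual(heatmap, peak_threshold=0.4, spread_threshold=0.25):
    """Heuristic mutual-gaze check from heat map statistics.

    Illustrates the abstract's idea that heat map statistics differ between
    mutual and referential gaze; here a weak peak and a diffuse spread are
    used as a proxy for "facing the camera". Thresholds are placeholders.
    """
    h = heatmap / (heatmap.sum() + 1e-8)                    # normalize to a distribution
    peak = float(heatmap.max())
    # normalized entropy as a spread measure: 1.0 = uniform, 0.0 = a single peak
    entropy = float(-(h * np.log(h + 1e-12)).sum() / np.log(h.size))
    return peak < peak_threshold and entropy > spread_threshold

In practice such a heuristic would be tuned or replaced by a learned classifier over the heat map statistics; the sketch only shows how the heat map can serve both the object-selection and mutual-gaze decisions.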
