Real-time Human Gaze following for Human-Robot Interaction

Gaze provides subtle but informative cues that aid fluent interaction among people. Incorporating human gaze predictions can indicate how engaged a person is while interacting with a robot and allow the robot to anticipate a human's intentions or goals. We propose a novel approach to predict human gaze fixations, both referential and mutual gaze, in real time. Our deep learning approach tracks a human's gaze from the robot's perspective in real time on a GPU-enabled laptop. The approach builds on prior work that uses a deep network to predict the referential gaze of a person from a single 2D image. Our work takes an interpretable part of that network, a gaze heat map, and combines it with an object detector to predict which object is most likely the focus of the person's attention. We highlight the challenges of following a person's gaze, show improved performance for referential gaze with our approach, and describe an approach for predicting mutual gaze. We describe our ongoing work in this direction and outline future directions for evaluating the gaze-following system.

ACM Reference Format: Akanksha Saran, Srinjoy Majumdar, Andrea Thomaz, and Scott Niekum. 2018. Real-time Human Gaze following for Human-Robot Interaction. In Proceedings of ACM/IEEE International Conference on Human Robot Interaction (HRI'18). ACM, New York, NY, USA, 5 pages. https://doi.org/0.0
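
To make the heat-map-plus-detector idea concrete, here is a minimal sketch of one plausible way to pick the attended object: score each detected bounding box by the gaze heat map mass it covers and return the best-scoring label. This is an illustrative assumption, not necessarily the paper's exact scoring rule; the function name, the mean-value criterion, and the toy data below are hypothetical, and it assumes the heat map and the detector boxes share the same image coordinate frame.

```python
import numpy as np

def attended_object(gaze_heatmap, detections):
    """Pick the detected object most consistent with the gaze heat map.

    gaze_heatmap : 2D array (H x W) of per-pixel gaze attention scores.
    detections   : list of (label, (x1, y1, x2, y2)) boxes in heat-map coordinates.
    Returns the label whose box has the highest mean heat-map value, or None.
    """
    best_label, best_score = None, -np.inf
    for label, (x1, y1, x2, y2) in detections:
        region = gaze_heatmap[y1:y2, x1:x2]
        if region.size == 0:
            continue
        # Mean rather than sum, so large boxes are not automatically favored.
        score = region.mean()
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage: a synthetic heat map peaked inside the "mug" box.
heatmap = np.zeros((480, 640))
heatmap[200:260, 300:360] = 1.0
objects = [("mug", (290, 190, 370, 270)), ("bowl", (50, 50, 150, 150))]
print(attended_object(heatmap, objects))  # -> "mug"
```

In practice the boxes would come from a real-time detector and the heat map from the gaze network, with both resized to a common resolution before scoring.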
