GazeEMD: Detecting Visual Intention in Gaze-Based Human-Robot Interaction

In gaze-based Human-Robot Interaction (HRI), determining human visual intention is essential for interacting with robots. A typical scenario is that a human selects an object by gaze and a robotic manipulator then picks it up. In this work, we propose GazeEMD, an approach for detecting whether a human is looking at an object in HRI applications. We use Earth Mover’s Distance (EMD) to measure the similarity between the hypothetical gaze points on an object and the actual gaze points. The similarity score is then used to determine whether the human’s visual intention is on the object. We compare our approach with a fixation-based method and HitScan with a run length in the scenario of selecting daily objects by gaze. Our experimental results indicate that the GazeEMD approach has higher accuracy and is more robust to noise than the other approaches. Hence, users can reduce cognitive load by using our approach in real-world HRI scenarios.
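As an illustrative sketch only (not the authors' implementation), the core idea can be approximated in Python with SciPy's 1D `wasserstein_distance` applied per image axis. The uniform sampling of hypothetical gaze inside an object bounding box, the per-axis EMD approximation, and the decision threshold below are all assumed placeholders.

```python
# Minimal sketch of EMD-based gaze/object matching (assumptions, not the paper's exact method):
# - the full 2D transport distance is approximated by the sum of per-axis 1D EMDs,
# - hypothetical gaze is sampled uniformly inside the object's bounding box,
# - a fixed pixel threshold decides whether visual intention is on the object.
import numpy as np
from scipy.stats import wasserstein_distance


def gaze_emd(gaze_xy: np.ndarray, bbox: tuple, n_samples: int = 200) -> float:
    """Approximate EMD between actual gaze points and hypothetical gaze on an object.

    gaze_xy: (N, 2) array of gaze points in image coordinates.
    bbox: object bounding box (x_min, y_min, x_max, y_max), e.g. from an object detector.
    """
    rng = np.random.default_rng(0)
    x_min, y_min, x_max, y_max = bbox
    hypo_x = rng.uniform(x_min, x_max, n_samples)
    hypo_y = rng.uniform(y_min, y_max, n_samples)
    # Per-axis 1D EMD as a cheap stand-in for the full 2D Earth Mover's Distance.
    return (wasserstein_distance(gaze_xy[:, 0], hypo_x)
            + wasserstein_distance(gaze_xy[:, 1], hypo_y))


def looks_at_object(gaze_xy: np.ndarray, bbox: tuple, threshold: float = 50.0) -> bool:
    """Declare visual intention on the object when the EMD score falls below a threshold."""
    return gaze_emd(gaze_xy, bbox) < threshold
```

In practice the threshold would be tuned on labelled gaze recordings, analogous to how the paper evaluates GazeEMD against the fixation-based and HitScan baselines.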
