Exploiting Three-Dimensional Gaze Tracking for Action Recognition During Bimanual Manipulation to Enhance Human–Robot Collaboration

Human–robot collaboration could be advanced by facilitating the intuitive, gaze-based control of robots and by enabling robots to recognize human actions, infer human intent, and plan actions that support human goals. Traditionally, gaze tracking approaches to action recognition have relied upon computer vision-based analyses of two-dimensional egocentric camera videos. The objective of this study was to identify useful features that can be extracted from three-dimensional (3D) gaze behavior and used as inputs to machine learning algorithms for human action recognition. We investigated human gaze behavior and gaze–object interactions in 3D during the performance of a bimanual, instrumental activity of daily living: the preparation of a powdered drink. A marker-based motion capture system and binocular eye tracker were used to reconstruct 3D gaze vectors and their intersection with 3D point clouds of objects being manipulated. Statistical analyses of gaze fixation duration and saccade size suggested that some actions (pouring and stirring) may require more visual attention than others (reaching, picking up, setting down, and moving). 3D gaze saliency maps, generated with high spatial resolution for six subtasks, appeared to encode action-relevant information. The “gaze object sequence” was used to capture information about the identity of objects in concert with the temporal sequence in which the objects were visually regarded. Dynamic time warping barycentric averaging was used to create a population-based set of characteristic gaze object sequences that accounted for intra- and inter-subject variability. The gaze object sequence was used to demonstrate the feasibility of a simple action recognition algorithm that utilized a dynamic time warping Euclidean distance metric. Averaged over the six subtasks, the action recognition algorithm yielded an accuracy of 96.4%, precision of 89.5%, and recall of 89.2%. This level of performance suggests that the gaze object sequence is a promising feature for action recognition whose impact could be enhanced through the use of more sophisticated machine learning classifiers and algorithmic improvements for real-time implementation. Robots capable of robust, real-time recognition of human actions during manipulation tasks could be used to improve quality of life in the home and quality of work in industrial environments.
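
The classification idea described above can be illustrated with a minimal sketch (not the authors' implementation): each action is represented by a template gaze object sequence, assumed here to have been produced offline (e.g., by dynamic time warping barycentric averaging), and a query sequence is assigned the label of the nearest template under a dynamic time warping Euclidean distance. The integer object codes, template sequences, and function names (dtw_distance, classify) are illustrative assumptions, not details from the study.

```python
# Minimal sketch of DTW-based nearest-template classification of gaze object
# sequences. Object identities are encoded as integers (hypothetical codes).

import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with a Euclidean local cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(float(a[i - 1]) - float(b[j - 1]))
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def classify(query: np.ndarray, templates: dict) -> str:
    """Return the action label whose template gaze object sequence is closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


if __name__ == "__main__":
    # Hypothetical object codes: 0 = pitcher, 1 = cup, 2 = spoon, 3 = powder jar.
    templates = {
        "pour": np.array([0, 0, 1, 1, 1, 0]),
        "stir": np.array([2, 1, 1, 1, 2, 2]),
        "reach": np.array([3, 3, 0, 0]),
    }
    query = np.array([0, 1, 1, 1, 0, 0])  # unlabeled gaze object sequence
    print(classify(query, templates))     # -> "pour" in this toy example
```

In the study, per-subtask accuracy, precision, and recall would then be computed by comparing such predicted labels against ground-truth action annotations.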
