An Ego-Vision System for Hand Grasp Analysis

This paper presents an egocentric vision (ego-vision) system for hand grasp analysis in unstructured environments. Our goal is to automatically recognize hand grasp types and to discover the visual structure of hand grasps using a wearable camera. In the proposed system, free hand–object interactions are recorded from a first-person perspective. State-of-the-art computer vision techniques are used to detect hands and extract hand-based features. A new feature representation that incorporates hand tracking information is also proposed. Grasp classifiers are then trained to discriminate among the grasp types of a predefined grasp taxonomy. Based on the trained classifiers, visual structures of hand grasps are learned using an iterative grasp clustering method. In experiments, we evaluate grasp recognition performance in both laboratory and real-world scenarios, where the system achieves best classification accuracies of <inline-formula><tex-math notation="LaTeX">$92\%$</tex-math></inline-formula> and <inline-formula><tex-math notation="LaTeX">$59\%$</tex-math></inline-formula>, respectively. The experiments also verify that the system generalizes to different tasks and users. Analysis in a real-world scenario shows that intuitive visual grasp structures, consistent with expert-designed grasp taxonomies, can be learned automatically.
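The recognition stage described above (hand-based features fed to per-type grasp classifiers) can be illustrated with a toy sketch. This is not the paper's implementation: the actual system uses richer hand-tracking-aware features and trained grasp classifiers, whereas the sketch below assumes a coarse HOG-style orientation histogram as the hand descriptor and substitutes a nearest-centroid classifier for simplicity; all names (`hog_like_features`, `NearestCentroidGrasp`, the grasp labels) are illustrative.

```python
import numpy as np

def hog_like_features(patch, n_bins=9):
    """Coarse HOG-style descriptor: an L2-normalized histogram of
    unsigned gradient orientations, weighted by gradient magnitude.
    (Illustrative stand-in for the paper's hand-based features.)"""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

class NearestCentroidGrasp:
    """Toy grasp-type classifier: one feature centroid per grasp label,
    predicting the label of the nearest centroid."""
    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.centroids_ = {
            c: np.mean([x for x, t in zip(X, y) if t == c], axis=0)
            for c in self.labels_
        }
        return self

    def predict(self, x):
        return min(self.labels_,
                   key=lambda c: np.linalg.norm(x - self.centroids_[c]))

# Synthetic "hand patches": one varying along x, one along y, so their
# orientation histograms concentrate in different bins.
patch_x = np.tile(np.arange(16.0), (16, 1))
patch_y = np.arange(16.0).reshape(-1, 1) * np.ones((1, 16))
fx, fy = hog_like_features(patch_x), hog_like_features(patch_y)

clf = NearestCentroidGrasp().fit([fx, fy], ["power", "precision"])
```

In the real system this classification step is only the front end; its outputs then drive the iterative grasp clustering that discovers visual grasp structures.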
