Hand parsing for fine-grained recognition of human grasps in monocular images

We propose a novel method for fine-grained recognition of human hand grasp types from a single monocular image, allowing computational systems to better understand human hand use. In particular, we focus on recognizing challenging grasp categories that differ only by subtle variations in finger configuration. While much of the prior work on understanding human hand grasps has relied on manual detection of grasps in video, this is the first work to automate the analysis process for fine-grained grasp classification. Instead of attempting to fit a parametric model of the hand, we propose a hand parsing framework that leverages data-driven learning to generate a pixel-wise segmentation of the hand into finger and palm regions. The proposed approach uses appearance-based cues such as finger texture and hand shape to accurately determine hand parts. We then build on the hand parsing result to compute high-level grasp features and learn a supervised fine-grained grasp classifier. To validate our approach, we introduce a grasp dataset recorded with a wearable camera in which the hand and its parts have been manually segmented with pixel-wise accuracy. Our results show that the proposed automatic hand parsing technique improves grasp classification accuracy by more than 30 percentage points over a state-of-the-art grasp recognition technique.
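
The abstract does not specify the learning models or the exact grasp features, so the following Python sketch is only a plausible instantiation of the described pipeline, not the authors' implementation: a hypothetical random forest serves as the data-driven pixel-wise hand-part labeller, per-part area fractions stand in for the high-level grasp features, and a linear SVM acts as the supervised grasp classifier.

```python
"""Sketch of a parse-then-classify grasp pipeline (assumed components,
not the paper's actual models or features)."""
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# Assumed label set for the hand parsing stage.
HAND_PARTS = ["background", "palm", "thumb", "index", "middle", "ring", "little"]

def parse_hand(pixel_features, part_labeller):
    """Pixel-wise hand parsing: assign each pixel a part label from
    appearance-based cues (e.g., texture and shape descriptors).
    pixel_features: array of shape (H, W, D) of per-pixel descriptors."""
    h, w, d = pixel_features.shape
    labels = part_labeller.predict(pixel_features.reshape(-1, d))
    return labels.reshape(h, w)

def grasp_features(part_map):
    """High-level grasp features built on the parsing result; here simply
    the normalized area of each hand part (a stand-in for the paper's
    features, which the abstract does not detail)."""
    counts = np.bincount(part_map.ravel(), minlength=len(HAND_PARTS))
    return counts / max(counts.sum(), 1)

# Hypothetical training and inference, given per-pixel descriptor/label
# arrays (X_pixels, y_pixels), training images, and grasp labels:
#
# part_labeller = RandomForestClassifier(n_estimators=50).fit(X_pixels, y_pixels)
# grasp_clf = LinearSVC().fit(
#     [grasp_features(parse_hand(img, part_labeller)) for img in train_imgs],
#     train_grasp_labels)
# pred = grasp_clf.predict([grasp_features(parse_hand(test_img, part_labeller))])
```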
