Understanding Hand-Object Manipulation with Grasp Types and Object Attributes

Our goal is to automate the understanding of natural hand-object manipulation by developing computer vision-based techniques. Our hypothesis is that modeling the grasp types of hands and the attributes of manipulated objects is necessary for accurately recognizing manipulation actions. Specifically, we focus on recognizing hand grasp types, object attributes, and actions from a single image within a unified model. First, we explore the contextual relationship between grasp types and object attributes, and show how that context can be used to boost the recognition of both. Second, we propose to model actions with grasp types and object attributes, based on the hypothesis that the two contain complementary information for characterizing different actions. The proposed action model outperforms traditional appearance-based models, which are not designed to take semantic constraints such as grasp types or object attributes into account. Experimental results on public egocentric activity datasets strongly support our hypothesis.
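To make the two modeling ideas concrete, the sketch below shows one plausible realization, not the paper's actual implementation: grasp-type scores are first refined by an attribute-conditioned co-occurrence prior (the contextual relationship), and the refined grasp and attribute probabilities are then concatenated as a semantic feature for an action classifier, with an appearance-only classifier as the baseline. All dimensions, variable names, and the linear-SVM choice are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of fusing grasp-type
# and object-attribute predictions for action recognition.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical per-image classifier outputs, e.g. from CNNs:
#   grasp_probs     : P(grasp type | image), e.g. 17 grasp classes
#   attr_probs      : P(object attribute | image), e.g. 6 attributes
#   appearance_feat : raw appearance descriptor (baseline input)
n_imgs, n_grasps, n_attrs, n_actions = 200, 17, 6, 10
grasp_probs = rng.random((n_imgs, n_grasps))
attr_probs = rng.random((n_imgs, n_attrs))
appearance_feat = rng.random((n_imgs, 128))
action_labels = rng.integers(0, n_actions, size=n_imgs)

# Contextual refinement (assumed form): re-weight grasp scores by an
# attribute-conditioned co-occurrence table, standing in for the learned
# grasp/attribute context; here the table is random for illustration.
cooccur = rng.random((n_attrs, n_grasps))      # ~ P(grasp | attribute)
context_prior = attr_probs @ cooccur           # expected grasp distribution
refined_grasp = grasp_probs * context_prior    # element-wise fusion
refined_grasp /= refined_grasp.sum(axis=1, keepdims=True)

# Action model: grasp types and attributes as complementary semantic cues.
semantic_feat = np.hstack([refined_grasp, attr_probs])
action_clf = LinearSVC().fit(semantic_feat, action_labels)

# Appearance-only baseline for comparison.
baseline_clf = LinearSVC().fit(appearance_feat, action_labels)
```

In this reading, the co-occurrence step is what lets attribute evidence (e.g. a rigid, small object) sharpen the grasp-type posterior, while the concatenated semantic feature is what carries the complementary grasp/attribute information into the action classifier.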
