Learning data-efficient hierarchical features for robotic graspable object recognition

Robotic graspable-object recognition is a crucial ingredient in many autonomous manipulation applications, yet identifying complex image features from limited data remains largely unsolved. In this paper, we combine the advantages of two feature-representation approaches, kernel descriptors and deep neural networks, in a novel hierarchical feature-learning framework for robotic graspable-object recognition. The framework recovers sparse, compressible features from a limited number of training examples. First, we design multiple kernel descriptors over the raw RGB-D images to adequately capture the discriminative structure of each object. The extracted representations are then fed to a four-layer deep neural network that produces more representative features for the final graspability decision. The resulting network generalizes well despite limited training data. Extensive experiments validate the proposed method, showing state-of-the-art performance on the graspable-object discrimination task under limited data.
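The two-stage pipeline the abstract describes (hand-designed kernel descriptors feeding a four-layer network) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the simplified gradient-orientation descriptor, the layer widths, and all parameter values below are assumptions chosen only to make the structure concrete.

```python
import numpy as np

def gradient_kernel_descriptor(patch, n_bins=8):
    """Simplified gradient descriptor: a soft orientation histogram with
    Gaussian match-kernel weighting, loosely inspired by kernel descriptors
    (NOT the paper's exact feature design)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)                       # orientation in [-pi, pi]
    centers = np.linspace(-np.pi, np.pi, n_bins, endpoint=False)
    # Weight each pixel's orientation against every bin center (soft binning).
    w = np.exp(-0.5 * ((ori[..., None] - centers) / 0.5) ** 2)
    desc = (w * mag[..., None]).sum(axis=(0, 1))
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc

def relu(x):
    return np.maximum(0.0, x)

class FourLayerNet:
    """Minimal four-layer MLP mapping descriptor features to a graspability
    score. Layer widths here are illustrative assumptions."""
    def __init__(self, dims=(8, 32, 16, 8, 1), seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a),
                        np.zeros(b))
                       for a, b in zip(dims[:-1], dims[1:])]

    def forward(self, x):
        for i, (W, b) in enumerate(self.layers):
            x = x @ W + b
            if i < len(self.layers) - 1:
                x = relu(x)
        return 1.0 / (1.0 + np.exp(-x))            # graspability probability

# Toy "depth patch": intensity varies along rows only.
patch = np.outer(np.arange(16), np.ones(16))
feat = gradient_kernel_descriptor(patch)           # stage 1: kernel descriptor
prob = FourLayerNet().forward(feat[None, :])       # stage 2: deep network
print(prob.shape)                                  # (1, 1)
```

In the actual framework the descriptor stage would produce multiple complementary kernel descriptors (e.g. over gradients, color, and depth), concatenated before the network; the single toy descriptor here only shows where each stage sits.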
