Hand-Crafted Features or Machine Learnt Features? Together They Improve RGB-D Object Recognition

RGB-D object recognition is an important research topic in computer version, and seeking a robust image representation is the most important sub problem for RGB-D object recognition. On the one hand, the recently emerging deep learning methods, which learns image representations automatically by capturing the data structure, have demonstrated the impressive performance for object recognition. On the other hand, the previously commonly used hand-crafted features also encodes the prior knowledge about the data. By realizing that the hand-crafted features and machine learnt features actually characterize the different aspects of image data, rather than only using one type of feature, we propose to jointly use the machine learnt features and hand-crafted features for RGB-D object recognition. Specifically, we use the Convolution Neural Networks (CNNs) to extract the machine learnt representation, and use Locality-constrained Linear Coding (LLC) based spatial pyramid matching for hand-crafted features. We evaluated our proposed approach on three publicly available RGB-D datasets. Experimental results show that our method achieves the best performance under all the cases, which demonstrates the effectiveness of our method.

[1]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[2]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, ICCV Workshops.

[3]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Dieter Fox,et al.  Depth kernel descriptors for object recognition , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[5]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[7]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[8]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[9]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[10]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[11]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[12]  J. Goldberg,et al.  SIFT: Design and analysis of a fault-tolerant computer for aircraft control , 1978, Proceedings of the IEEE.

[13]  Ah Chung Tsoi,et al.  Face recognition: a convolutional neural-network approach , 1997, IEEE Trans. Neural Networks.

[14]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[16]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[17]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[18]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[21]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Dieter Fox,et al.  Object recognition with hierarchical kernel descriptors , 2011, CVPR 2011.

[25]  Dieter Fox,et al.  Unsupervised Feature Learning for RGB-D Based Object Recognition , 2012, ISER.

[26]  Heinrich H. Bülthoff,et al.  Going into depth: Evaluating 2D and 3D cues for object classification on a new, large-scale object dataset , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[27]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[28]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[29]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Liang-Tien Chia,et al.  Kernel Sparse Representation for Image Classification and Face Recognition , 2010, ECCV.

[31]  Ting Wang,et al.  Kernel Sparse Representation-Based Classifier , 2012, IEEE Transactions on Signal Processing.

[32]  Dong Xu,et al.  Recognizing RGB Images by Learning from RGB-D Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.